Some Fun Data on Regional Economic Activity

Intro + Background

This is going to be a quick look at two kind of cool data sets that seem really interesting and useful for evaluating the economic impacts of Covid. I haven’t quite figured out exactly how to best use them in this capacity yet…but I wanted to give you all a quick look at them. The data I’m talking about are:

California Taxable Sales from the California Department of Tax and Fee Administration
Paycheck Protection Program data from the U.S. Small Business Administration

I plan to keep this post pretty simple. Mainly, I just want to explore these data sets a little. I’m going to filter out a lot of the complexity and just focus in on what the two data sets have to say about the Retail and Food/Beverage Service Sector of the Santa Cruz County (CA) local economy. Here’s a quick outline for the rest of this post:

First, I collect the PPP data for all businesses operating in Santa Cruz County, CA
I collect the California Taxable Sales by City data.
I filter the PPP data to include businesses defined by NAICs Codes associated with Retail and Food Service Establishments
I filter the California Taxable Sales data to include just observations from Santa Cruz County
Then, I plot the California Taxable Sales data for Santa Cruz County from 2009 - 2020 which provides a quick and dirty look at how Covid-19 impacted Retail and Food Services Transactions in Santa Cruz County
Finally, I plot the PPP data for Retail and Food Services businesses. This gives a quick look at how much “bailout money” was given to Retail and Food Services firm in Santa Cruz County. This is a rough approximation of how much business lost because of the Covid-19 pandemic was “replaced” by government money.

Dependencies

Here are the R libraries that this module depends on:

library(dplyr)
library(ggplot2)
library(DT)
library(ggthemes)
library(zoo)
library(scales)
library(lubridate)
library(formattable)
library(kableExtra)

There are also some data dependencies but I will discuss those in the next section.

Data

Source Data

I have two primary sources of data for this Vignette:

About the Data

These data are pretty interesting and folks interested in using them for things more serious than recreational data mining should do their own due diligence. Here’s a quick summary:

The PPP program was created to help businesses retain and pay workers during the Covid-19 pandemic. It was a form of economic stimulus that was offered to businesses on the condition that at least 75% of the payment amount go directly to payroll. The PPP data available from the SBA contain around 50 different fields that provide information on the nature of these payments. Each row in each .csv is a payment. The columns contain information like name and address of the business, amount of PPP money received, and other helpful stuff (like NAICS Code for each recipient).
The California Taxable Sales Data that I am using are reported quarterly for each major city in California. The values of most interest are reported in two columns:

Retail and Food Services Transactions, and
All Outlets Transactions

Data Access

The PPP data are available from the SBA as a series of .csv files. I have downloaded these, saved them locally, and created some R functions to:

read each .csv into the workspace
bind the data objects together into a single data frame
clean the data frame a little (mostly standardizing string fields for convenience)

I’ll try to crank out a follow-up post to this one with more reproducible content on working with these PPP data.

# this part is a little cumbersome...in my PPP-EIDL Project I've created a linked set of functions to make this look nice...
#   but it's hard to run these from within my GitHub Blog so we're going old-skool here:
  public150plus <- read.csv("C:/Users/aaron.mamula/Desktop/R-Projects/PPP-EIDL-Analysis/source-data/02-01-21 Paycheck Protection Program Data/public_150k_plus.csv",
                            stringsAsFactors = F)
  public_up_to_150k_1 <- read.csv("C:/Users/aaron.mamula/Desktop/R-Projects/PPP-EIDL-Analysis/source-data/02-01-21 Paycheck Protection Program Data/public_up_to_150k_1.csv",
                                  stringsAsFactors = F)
  public_up_to_150k_2 <- read.csv("C:/Users/aaron.mamula/Desktop/R-Projects/PPP-EIDL-Analysis/source-data/02-01-21 Paycheck Protection Program Data/public_up_to_150k_2.csv",
                                  stringsAsFactors = F)
  public_up_to_150k_3<- read.csv("C:/Users/aaron.mamula/Desktop/R-Projects/PPP-EIDL-Analysis/source-data/02-01-21 Paycheck Protection Program Data/public_up_to_150k_3.csv",
                                 stringsAsFactors = F)
  public_up_to_150k_4 <- read.csv("C:/Users/aaron.mamula/Desktop/R-Projects/PPP-EIDL-Analysis/source-data/02-01-21 Paycheck Protection Program Data/public_up_to_150k_4.csv",
                                  stringsAsFactors = F)
  public_up_to_150k_5 <- read.csv("C:/Users/aaron.mamula/Desktop/R-Projects/PPP-EIDL-Analysis/source-data/02-01-21 Paycheck Protection Program Data/public_up_to_150k_5.csv",
                                  stringsAsFactors = F)
  public_up_to_150k_6 <- read.csv("C:/Users/aaron.mamula/Desktop/R-Projects/PPP-EIDL-Analysis/source-data/02-01-21 Paycheck Protection Program Data/public_up_to_150k_6.csv",
                                  stringsAsFactors = F)
  
  
  ppp.df <- rbind(public150plus,
                  public_up_to_150k_1,
                  public_up_to_150k_2,
                  public_up_to_150k_3,
                  public_up_to_150k_4,
                  public_up_to_150k_5,
                  public_up_to_150k_6)

These data are quite large (mainly because they are very wide) so I’m going to filter them to include just Santa Cruz County for this illustration:

# the PPP data have a BorrowerCounty field and a ProjectCountyName field. These two values are generally the same but 
# can be different. I'm going to argue that, for this application, the ProjectCountyName is the more important of the two 
#  because this should be a better indicator of where the money was distributed. 

ppp.df <- ppp.df %>% filter(ProjectCountyName=="SANTA CRUZ" & BorrowerState=="CA")

The California Taxable Sales Data by City are accessible through an API maintained by the California Department of Tax and Fee Information. The GET url can be parameterized in a number of ways but for my current purposes I just want all available data:

It’s worth noting here that the object ca.taxable is a list object. It is the JSON object that is returned from my API request, which contains some meta-data about the API request itself. I extract the values of interest from this list object by collecting the values list element.

ca.taxable <- jsonlite::fromJSON("https://cdtfa.ca.gov/dataportal/api/odata/Taxable_Sales_by_City?%24count=true")
ca.taxable <- ca.taxable$value

Data Exploration

I don’t want to munge around with either of these data sets too much here. They are pretty rich and one could probably do a lot of cool stuff with them. Here, I just want to give everyone a peek at the data with some emphasis on stuff that’s relevant for the illustration at hand:

PPP Data

Here’s a quick look at some of the more interesting fields in the PPP Data for Santa Cruz County, CA:

knitr::kable(ppp.df %>% ungroup() %>% filter(row_number()<=10) %>%
                select(BorrowerName,BorrowerAddress,BorrowerCity,BorrowerZip,InitialApprovalAmount,JobsReported,FranchiseName,
                       ServicingLenderName, NAICSCode) %>% arrange(-JobsReported)) %>%
       kable_styling(full_width=F)

BorrowerName	BorrowerAddress	BorrowerCity	BorrowerZip	InitialApprovalAmount	JobsReported	ServicingLenderName	NAICSCode
THRESHOLD ENTERPRISES LTD	23 Janis Way	Scotts Valley	95066-3506	5439600	500	PNC Bank, National Association	325411
AMERI KLEEN	119 BEACH ST	WATSONVILLE	95076-4557	3490200	500	1st Capital Bank	561720
ALTA VISTA FARMS LP	136 MARSH LN	WATSONVILLE	95076-2224	3680400	481	1st Capital Bank	111334
ENCOMPASS COMMUNITY SERVICES	380 Encinal St. Suite 200	SANTA CRUZ	95060-2101	4786545	450	Santa Cruz County Bank	621112
SALUD PARA LA GENTE	195 Aviation Way, Suite 200	WATSONVILLE	95076-2053	5218785	366	Santa Cruz County Bank	624190
BAY PHOTO, INC.	920 DISC DR	SCOTTS VALLEY	95066-4544	3599477	355	Santa Cruz Community CU	323111
S. MARTINELLI & COMPANY	735 Beach Street	WATSONVILLE	95076	3894618	326	Santa Cruz County Bank	311411
SANTA CRUZ SEASIDE COMPANY	400 Beach Street	Santa Cruz	95060-5416	6236226	243	Santa Cruz County Bank	713110
UNIVERSAL AUDIO, INC	4585 Scotts Valley Drive	SCOTTS VALLEY	95066-4517	4088800	201	Santa Cruz County Bank	334310
NEW TEACHER CENTER	1205 Pacific Avenue, Suite 301	SANTA CRUZ	95060-3914	3071357	133	Santa Cruz County Bank	923130

CA Taxable Sales

Next, let’s have a quick look at the Taxable Sales Data.

ca.taxable$RetailandFoodServicesTaxableTransactions <- currency(ca.taxable$RetailandFoodServicesTaxableTransactions,digits=0L)
knitr::kable(ca.taxable %>% filter(County %in% c("Santa Cruz","SANTA CRUZ")) %>% filter(row_number() <= 10)) %>% 
   kable_styling(full_width=F)

CalendarYear	Quarter	QuarterMonthFrom	QuarterMonthTo	County	City	RetailandFoodServicesTaxableTransactions	TotalAllOutletsTaxableTransactions	DisclosureFlag	PerCapita
2009	A	1	12	Santa Cruz	Capitola	$322,595,000	355427000	NA	NA
2009	A	1	12	Santa Cruz	Santa Cruz	$589,761,000	718859000	NA	NA
2009	A	1	12	Santa Cruz	Santa Cruz County	$1,956,754,000	2638469000	NA	NA
2009	A	1	12	Santa Cruz	Scotts Valley	$117,995,000	147933000	NA	NA
2009	A	1	12	Santa Cruz	Watsonville	$381,538,000	495137000	NA	NA
2009	Q1	1	3	Santa Cruz	Capitola	$69,397,000	76375000	NA	NA
2009	Q1	1	3	Santa Cruz	Santa Cruz	$128,250,000	156001000	NA	NA
2009	Q1	1	3	Santa Cruz	Santa Cruz County	$426,714,000	584519000	NA	NA
2009	Q1	1	3	Santa Cruz	Scotts Valley	$24,822,000	32275000	NA	NA
2009	Q1	1	3	Santa Cruz	Watsonville	$84,602,000	110666000	NA	NA

So there are two kind of weird things about the taxable sales data:

The data are mostly quarterly…but they have an annual summary row as well which is identified by Quarter = A.
The geographic unit of resolution is a little funky. There are observations for each major city in Santa Cruz County but there are also observations where the City field = “SANTA CRUZ COUNTY”.

These issues are pretty minor and easily dealt with.

Food Service Analysis - County

For starters, I’m going to isolate a set of NAICS Codes with certian 4-digit prefixes.

7223, 7224, 7225 are (as far as I can tell) are the main NAICS prefixes associated with Food Service Establisments, Restaurants, and Bars.
4451, 4452, 4453, and 4471 are grocery and food stores, beer/wine/liquor stores, and gas stations
4481, 4482, 4483 - clothing stores, jewelry stores, shoe stores
4511 - Sporting Goods
4512 - Book Stores
4522 _ Dept Stores
4531 - Florists
4539 - Miscellaneous stores like pet stores, art dealers, and tobacco stores

I can’t say for sure if these are all the NAICS Codes that should match up with the receipts in the RetailandFoodServicesTaxableTransactions column from the ca.taxable data frame, but it seems like a good approximation at least.

# extract the first 4 digits for each NAICS Code
retail <- ppp.df %>% ungroup() %>% mutate(NAICS4dig=as.numeric(substring(as.character(NAICSCode),1,4))) %>%
                filter(NAICS4dig %in% c(7223,7224,7225,4451,4452,4453,4471,4481,4482,4483,4511,4512,4522,4531,4539))
retail$InitialApprovalAmount <- currency(retail$InitialApprovalAmount,digits=0L)
knitr::kable(retail %>% select(BorrowerName,BorrowerCity,InitialApprovalAmount,JobsReported,ProjectCountyName) %>%
           arrange(-InitialApprovalAmount) %>% filter(row_number() <= 20)) %>% kable_styling(full_width=F)

BorrowerName	BorrowerCity	InitialApprovalAmount	JobsReported	ProjectCountyName
RICHARD MICHAEL HARRISON	SOQUEL	$9,112,277	31	SANTA CRUZ
J.J.’S SALOON AND SOCIAL CLUB, LLC	SOQUEL	$2,165,312	7	SANTA CRUZ
SEA EAGLE LP.	Santa Cruz	$1,717,410	164	SANTA CRUZ
CAPITOLA GAYLE’S, INC.	CAPITOLA	$1,345,000	80	SANTA CRUZ
SEA EAGLE, LP	SANTA CRUZ	$1,160,382	202	SANTA CRUZ
DELUXE FOODS OF APTOS, INC.	APTOS	$903,187	79	SANTA CRUZ
CULINARY ENTERPRISES, INC.	CAPITOLA	$885,855	136	SANTA CRUZ
WHITING’S FOOD CONCESSIONS INC	Santa Cruz	$860,850	100	SANTA CRUZ
FLEW THE COOP INC	LA SELVA BEACH	$816,400	194	SANTA CRUZ
STAGNARO BROTHERS SEAFOOD INC	Santa Cruz	$738,467	103	SANTA CRUZ
WILD ROOTS MARKET, INC	BEN LOMOND	$731,370	110	SANTA CRUZ
MAS MAC INC	Watsonville	$710,540	210	SANTA CRUZ
SUN SHOPS	SANTA CRUZ	$645,470	45	SANTA CRUZ
LILLIAN’S ITALIAN KITCHEN INCORPORATED	Santa Cruz	$631,000	44	SANTA CRUZ
WEST END TAP & KITCHEN LP	Santa Cruz	$597,900	88	SANTA CRUZ
HULA’S SANTA CRUZ LLC	Santa Cruz	$574,651	53	SANTA CRUZ
SHOPPER’S CORNER INC	SANTA CRUZ	$549,434	81	SANTA CRUZ
WHITING’S FOOD CONCESSIONS, INC	SANTA CRUZ	$545,000	148	SANTA CRUZ
EL PALOMAR, INC	SANTA CRUZ	$526,565	82	SANTA CRUZ
J & J CORP.	Santa Cruz	$501,900	70	SANTA CRUZ

Next, I’m going create a really simple visual of the time-series of taxable sales in Santa Cruz County for just the Retail and Food Services Transactions:

# first I need to create a date from the quarterly observations...
# I'm also going to rename these really long columns
sc.taxable <- ca.taxable %>% filter(Quarter!="A") %>% 
                  mutate(County=toupper(County),City=toupper(City)) %>%
                filter(County=="SANTA CRUZ") %>% 
                mutate(date=as.Date(paste(CalendarYear,"-",QuarterMonthFrom,"-","1",sep=""),format="%Y-%m-%d")) %>%
                mutate(Q=as.yearqtr(paste(CalendarYear,Quarter,sep="")),
                       RetailFood=RetailandFoodServicesTaxableTransactions) %>%
                select(-RetailandFoodServicesTaxableTransactions)
               

ggplot(subset(sc.taxable,City=="SANTA CRUZ COUNTY"),aes(x=Q,y=RetailFood/1000000)) +
  geom_point() + geom_line() + theme_bw() + ylab("$s (millions)") + xlab("") + scale_x_yearqtr(format = "%YQ%q") +
  theme(axis.text.x=element_text(angle=45)) +
  ggtitle("Santa Cruz County Retail & Food Service Transactions")

Even if all you know about Santa Cruz County, California is that it’s pretty close to the ocean, it’s probably not shocking to you to see pronounced seasonality in the quarterly Retail and Food Service Transaction Data series. I’ll admit that I was somewhat surprised to see a pretty linear upward trend in Taxable Sales in Santa Cruz that’s been at play for the last 10 years.

If one just “eye-balls” the data, it looks like there is often a pretty big jump in Retail and Food Service transactions between Q1 and Q2, then progressively smaller increases between the remaining quarters. I’m not certain on the timeline of events but I kinda think the most restrictive elements of the County-wide shelter in place order (non-essential businesses needing to close) went into effect around March of 2020. So it’s not super surprising to see the Q2 2020 value pretty far below the Q2 values for other recent years

In fact, since 2009, Q2 Retail and Food Service Transactions in Santa Cruz County have been growing by around 3-4% per year (compared with same quarter previous year). 2020 is the first year since 2009 that Q2 Retail and Food Service Transactions declined (again compared to same quarter previous year).

# filter for Q2 values and clean up labels for display
q3 <- sc.taxable %>% filter(Quarter=="Q2" & City=="SANTA CRUZ COUNTY") %>% select(date,Q,County,City,RetailFood) %>%
       arrange(date) %>%
         mutate(PCT_CNG=(RetailFood-lag(RetailFood))/lag(RetailFood),
                year=year(date),
                quarter="Q2") %>% select(-date,-Q,-City)

q3$RetailFood <- currency(q3$RetailFood,digits=0L)
q3$PCT_CNG <- percent(q3$PCT_CNG,digits=2)
knitr::kable(q3, caption="Quarter 2 Taxable Sales and Percent Change from Previous Quarter in the Retail and Food/Beverage Sector")   %>% kable_styling(full_width=F)

Quarter 2 Taxable Sales and Percent Change from Previous Quarter in the Retail and Food/Beverage Sector
County	RetailFood	PCT_CNG	year	quarter
SANTA CRUZ	$489,492,000	NA	2009	Q2
SANTA CRUZ	$521,519,000	6.54%	2010	Q2
SANTA CRUZ	$567,292,000	8.78%	2011	Q2
SANTA CRUZ	$594,107,000	4.73%	2012	Q2
SANTA CRUZ	$639,874,000	7.70%	2013	Q2
SANTA CRUZ	$669,644,368	4.65%	2014	Q2
SANTA CRUZ	$681,535,151	1.78%	2015	Q2
SANTA CRUZ	$697,889,702	2.40%	2016	Q2
SANTA CRUZ	$735,575,775	5.40%	2017	Q2
SANTA CRUZ	$755,090,015	2.65%	2018	Q2
SANTA CRUZ	$782,691,045	3.66%	2019	Q2
SANTA CRUZ	$672,220,452	-14.11%	2020	Q2

The next step in this very simple “data story” is to aggregate the PPP funding received by Retail and Food Service Entities in Santa Cruz County.

retail.agg <- retail %>% group_by(NAICS4dig) %>% summarise(Jobs=sum(JobsReported,na.rm=T),
                                             Dollars=sum(InitialApprovalAmount,na.rm=T)) %>%
                arrange(-Dollars)

## `summarise()` ungrouping output (override with `.groups` argument)

# This next part is pretty cumbersome...but it's gonna make the plot look way nicer
retail.agg <- retail.agg %>% 
                mutate(BusType=ifelse(NAICS4dig %in% c(7225,7224,7223),"Food Service/Bars",
                                  ifelse(NAICS4dig %in% c(4451,4452,4453),"Grocery/Food/Liquor Store",
                                  ifelse(NAICS4dig %in% c(4481,4482,4483,4512,4531),"Retail (Clothes,Books,Shoes,etc)","Other"))))

ggplot(retail.agg,aes(x=BusType,y=Dollars/1000000)) + geom_bar(stat="identity") + 
  theme(axis.text.x=element_text(angle=45)) + ylab("$ (millions)") + xlab("") + theme_tufte()

So here are some related observations that we can make from a quick look at these two data set:

Quarterly Taxable Sales for the Retail and Food/Beverage Services Sector in Santa Cruz County were down about 11% in Q2 of 2020 when compared against Q2 of 2019
In dollar terms this was a reduction of about $110 million
PPP payments to Santa Cruz County businesses in this sector totaled approximately $72.4 million.

These observations are not the result of careful analysis but rather quick and dirty data mining. They are not made for the purposes of supporting any conclusions about the “economic impacts of Covid on Santa Cruz County’s Retail and Food Services Sector.” They are offered as illustrations of the type of information that can be extracted from joining PPP data with California Taxable Sales Data. I think it’s not too hard to see how, with a little additional rigor, these data sets could generate some important insights about the economic/fiscal impacts of Covid-19 on local economies.

Bonus Content

The California Taxable Sales Data are available at the city level. In the prior sections I just used the county aggregate but it might be fun for some people to separate out the city level contributions. Here’s a quick picture of how county level taxable sales for Santa Cruz County are distributed among the various cities:

#aggretate taxable sales by year and city within Santa Cruz County...also reorder factor levels
#     so that the plot looks nicer
cities.annual <- sc.taxable %>% filter(City!="SANTA CRUZ COUNTY") %>% group_by(City,CalendarYear) %>%
              summarise(RetailFood=sum(RetailFood)) %>% arrange(CalendarYear,-RetailFood) %>%
               mutate(City=factor(City, levels=c("SCOTTS VALLEY","CAPITOLA","WATSONVILLE","SANTA CRUZ")))

## `summarise()` regrouping output by 'City' (override with `.groups` argument)

ggplot(cities.annual,aes(x=CalendarYear,y=RetailFood/1000000,fill=City)) + geom_bar(stat="identity") + 
         theme_bw() + scale_fill_viridis_d() + xlab("") + ylab("$s (millions)") + 
            ggtitle("Retail and Food Service Taxable Sales in Santa Cruz County, CA")

We can then aggregate the PPP data by the ProjectCity field to get a city-level picture of PPP money in the Retail and Food Service Sector:

# use the "retail" data frame (which was derived from the source PPP data) and aggregate by city
retail.city.agg <- retail %>% mutate(ProjectCity=toupper(ProjectCity)) %>%
            group_by(ProjectCity) %>% summarise(InitialApprovalAmount=sum(InitialApprovalAmount,na.rm=T)) %>%
             arrange(-InitialApprovalAmount)

## `summarise()` ungrouping output (override with `.groups` argument)

#order factor levels
retail.city.agg <- retail.city.agg %>% mutate(ProjectCity=factor(ProjectCity,unique(retail.city.agg$ProjectCity)))

ggplot(retail.city.agg,aes(x=ProjectCity,y=InitialApprovalAmount/1000000)) + geom_bar(stat="identity") + 
      theme_bw() +
      theme(axis.text.x=element_text(angle=90)) + ylab("$s (millions)") + xlab("")

The first thing to notice here is that there are bunch of “cities” in the PPP data that don’t exist in the California Taxable Sales Data. I’m not going to do anything about that right now…but, again, if one were interested in using these data for more serious purposes, one would need to decide how to map cities consistently between the two data sets.

My preliminary hypothesis is that taxable sales receipts activity in places like Aptos, Ben Lomond, and Corralitos (places that are not specifically represented in the CA Taxable Sales Data) are probably embedded in the County Aggregate. That is, I’m guessing that the Santa Cruz County Aggregate in the Taxable Sales Data is larger than the sum of Santa Cruz, Watsonville, Capitola, and Scotts Valley..and the delta between those two things is probably the activity in these smaller geographies that are not listed as cities in the taxable sales data set.

That’s all I got for right now. Hit me up if you want these data and don’t want to mess with the SBA’s FTP site…I can drop them in a google drive or something.