How can we predict quarterly revenue for retail companies, leveraging our knowledge of such companies?
More specifically…
Consider time dimensions
What matters:
Last quarter?
Last year?
Other time frames?
Cyclicality
Time and OLS
Time matters a lot for retail
How to capture time effects?
Autoregression: our main focus today
Regress \(y_t\) on earlier value(s) of itself
Last quarter, last year, etc.
Controlling for time directly in the model
We can do this using something called a fixed effect
This absorbs the noise in a model due to a particular grouping in the data
Noise in data
Statistical noise is random error in the data
Many sources of noise:
Factors we neglected to included in the model
Error in measurement
Accounting measurement!
Unexpected events / shocks
Groups within the data that behave differently
Noise is OK, but the more we remove, the better!
Removing noise: Revisiting SG real estate revenue
Different companies may behave slightly differently
Control for this using a Fixed Effect (ISIN uniquely identifies companies)
forecast3.1<-lm(revt_lead~revt+act+che+lct+dp+ebit+factor(isin), data=df_clean[df_clean$fic=="SGP",])# n=15 to prevent outputting every fixed effectlibrary(broom)print(tidy(forecast3.1), n=15)
Fixed effects are used when the average of \(\hat{y}\) varies by some group in our data
In our problem, the average revenue of each firm is different
Fixed effects absorb this difference
Further reading:
Introductory Econometrics by Jeffrey M. Wooldridge
Data preparation
The data
From quarterly reports
Two sets of firms:
US “Hypermarkets & Super Centers” (GICS: 30101040)
US “Multiline Retail” (GICS: 255030)
Data from Compustat - Capital IQ > North America - Daily > Fundamentals Quarterly
Formalization
Question
How can we predict quarterly revenue for large retail companies?
Hypothesis (just the alternative ones)
Current quarter revenue helps predict next quarter revenue
3 quarters ago revenue helps predict next quarter revenue (Year-over-year)
Different quarters exhibit different patterns (seasonality)
A long-run autoregressive model helps predict next quarter revenue
Prediction
Use OLS for all the above, \(t\)-tests for coefficients
Hold out sample: 2018-2023
Variable generation
library(tidyverse)# As alwayslibrary(plotly)# interactive graphslibrary(lubridate)# import some sensible date functions# Generate quarter over quarter growth "revtq_gr"df<-df%>%group_by(gvkey)%>%mutate(revtq_gr=revtq/lag(revtq)-1)%>%ungroup()# Generate year-over-year growth "revtq_yoy"df<-df%>%group_by(gvkey)%>%mutate(revtq_yoy=revtq/lag(revtq, 4)-1)%>%ungroup()# Generate first difference "revtq_d"df<-df%>%group_by(gvkey)%>%mutate(revtq_d=revtq-lag(revtq))%>%ungroup()# Generate a proper date# Date was YYYY-MM-DD, can be converted from text to date easilydf$date<-ymd(df$datadate)# From lubridatedf$qtr<-quarter(df$date)# From lubridate
Use mutate for variables using lags
ymd() from lubridate is handy to convert any date listing year, then month, then day.
# Clean the data: Replace NaN, Inf, and -Inf with NAdf<-df%>%mutate(across(where(is.numeric), ~replace(., !is.finite(.), NA)))# Split into training and testing data# Training data: We'll use data released before 2018train<-filter(df, year(date)<2018)# Testing data: We'll use data released 2018 through 2023test<-filter(df, year(date)>=2018)
group= to set groups to list in the legend. Not needed if using the below though
color= to set color by some grouping variable. Put factor() around the variable if you want discrete groups, otherwise it will do a color scale (light to dark)
# Calculating various in and out of sample statisticsmodels<-list(mod1,mod2,mod3)model_names<-c("1 quarter", "1 and 4 quarters", "8 quarter")df_test<-data.frame(adj_r_sq=sapply(models, function(x)summary(x)[["adj.r.squared"]]), rmse_in=sapply(models, function(x)rmse(train$revtq, predict(x,train))), mae_in=sapply(models, function(x)mae(train$revtq, predict(x,train))), rmse_out=sapply(models, function(x)rmse(test$revtq, predict(x,test))), mae_out=sapply(models, function(x)mae(test$revtq, predict(x,test))))rownames(df_test)<-model_nameshtml_df(df_test)# Custom function using knitr and kableExtra