Understanding Sentiment Through Financial Context

With Franco Wong


November 2022


Dr. Richard M. Crowley

Slides: https://rmc.link/Rotman \(\cdot\) @prof_rmc


Motivation

Motivation

What is sentiment?

  • More explicitly: What does text sentiment measure in a financial context?
  • More pointedly: What is the theoretical construct underlying sentiment?
    • This is a tricky question:
      • Sentiment from a dictionary is applied in a bag-of-words fashion
      • Most words in financial sentiment are difficult to interpret in isolation

Sentiment words’ underlying context is the key to answering the above questions.


What do we mean by context?

Cambridge dictionary: “The text or speech that comes immediately before and after a particular phrase or piece of text and helps to explain its meaning”

  • What is the surrounding text around the word of interest?
    • E.g., is the text about accounting policies, operating performance, contracting, etc.?

A motivating example

  • If asked what a single sentiment word means in a document, a logical approach is to examine the text it came from

Word: Loss, LM sentiment: Negative

  • Actual extractions from 10-K MD&As:
    1. “[…]net loss was due to $4.7 million goodwill write off[…]”
      • Makes sense
    2. “[…]loss ratio decreased to 43% as result of segment’s adherence to underwriting guidelines to claims[…]”
      • Opposite…
    3. “[…]functional coffees dedicated to weight loss[…]”
      • Unrelated…
    4. “[…]loan loss provision totaled $4.6 million in 2008 compared[…]”
      • Loan loss provision is a frequent mishit

A methodological contribution of our study is asking this en masse to derive context

What we do

  • Method
    1. Identify any clause of text that appears to contain content (from a linguistic perspective)
    2. Identify all contexts in the text using an unsupervised approach
    3. Assign each identified clause to a context
  • Empirics
    1. How does dictionary sentiment depend on context?
    2. Do prior results using financial sentiment hold across contexts?
      • How stable is the context dependency of sentiment across DVs?
    3. Replicate results using other sentiment approaches

Why? To understand what financial sentiment captures and if it is empirically consistent

Main results

  1. Only a few key contexts drive each financial sentiment result
    • Aggregation to document-level sentiment adds a lot of noise
  2. Sentiment, at the context level, often contradicts prior results
    • Aggregation removes nuance from our understanding
  3. Different contexts drive prediction for different DVs
    • Sentiment captures different empirical constructs in different regressions
  4. The above results hold across 2 other financial sentiment dictionaries
    • Our results are not unique to the LM dictionary
  5. The above results hold using a neural network-based sentiment measure
    • Bag-of-words isn’t the problem – financial sentiment, as a construct, likely is

Punchline: Sentiment should be measured on fine-grained contexts, not full documents

  • In other words, a precise matching between the text used and the economic question examined is needed

Measuring context

The idea

  • Our goal is to replicate a natural approach that one would take to identify contexts by hand:
    1. Take a reference clause
    2. Look to see what the clause is about (the “context”)
    3. Assign the clauses into logical groupings of contexts
    4. After: Interpret sentiment of a clause within context

In order to better understand context and its link to sentiment, we will examine a broad set of contexts spanning all MD&A content

Illustration of extracting context

“The company’s earnings increased by 5% due to an improvement in operating efficiency.”

  • (company; has; earnings)
  • (company’s earnings; increased by; 5%)
  • (company’s earnings; increased due; improved operating efficiency)
  • (company’s earnings; increased due; operating efficiency)


Extracting context

Automating with Stanford Open IE

  • Open IE is an open information extraction algorithm
  • Generates triples of context of the form (subject; relation verb; object)
  • Multi-step algorithm:
    1. Creates the dependency parse tree
    2. Resolves any co-references (“it,” “her,” etc.)
    3. Determines clause boundaries (multinomial logistic model)
    4. Determines triples within each clause (linguistic patterns)

This nets 179,703,756 extractions

Cutting this down a bit

  • Some are superfluous as we saw earlier.
    • More likely to happen with longer sentences
  • Approach: Keep the shortest extractions such that…
    1. We cover as much of the sentence as possible without having nested extractions
    2. We don’t drop words from LM
    3. We don’t drop accounting content

This cuts out 73% \(\Rightarrow\) still have 48,576,229 extractions

“Accounting content”

Some shared words: collateral, specialist, hedge, debit, inventory

We preprocess the above to make them machine-usable

  1. Drop any words with fewer than 4 characters, e.g., “A”
  2. Remove stopwords from the dictionary
  3. Restrict to only single words (to match LM structure), manually review unmatched phrases
  4. Inflect words using the python word_forms library
    • E.g., given “periodicity,” also match “periodicities” and “periodic”
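The first three filtering steps above can be sketched in a few lines of Python. The `preprocess_dictionary` helper and its stopword list are hypothetical stand-ins, and the inflection step via the word_forms library is omitted:

```python
# Sketch of the dictionary preprocessing steps (hypothetical helper names;
# the actual pipeline also inflects words with the word_forms library).
STOPWORDS = {"the", "and", "for", "with"}  # stand-in stopword list

def preprocess_dictionary(terms):
    """Filter a raw accounting dictionary down to usable single words."""
    kept = set()
    for term in terms:
        words = term.lower().split()
        if len(words) != 1:       # step 3: restrict to single words
            continue
        word = words[0]
        if len(word) < 4:         # step 1: drop words under 4 characters
            continue
        if word in STOPWORDS:     # step 2: drop stopwords
            continue
        kept.add(word)
    return kept

print(preprocess_dictionary(["Hedge", "the", "debit", "loan loss provision", "A"]))
```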

Matching extractions together

  • Two step approach:
    1. Map all extractions to a 512 dimension vector space that represents underlying meaning using Universal Sentence Encoder (USE; Cer et al. 2018)
      • We mask out certain tokens that USE tends to focus on too much
        • Dates, times, dollar amounts, percentages, quantities, and ordinals
    2. Cluster within the 512-dim vector space with Mini-Batch K-means (Sculley 2010)
      • Optimize this with Tibshirani et al. 2001’s Gap statistic
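A minimal sketch of this two-step pipeline, assuming scikit-learn is available. Random vectors stand in for the 512-dimension USE embeddings (loading USE requires tensorflow_hub and a model download), and `mask_tokens` is a simplified, hypothetical version of the masking step:

```python
import re
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def mask_tokens(clause):
    """Replace dollar amounts and percentages with placeholder tokens,
    roughly as described above (the paper also masks dates, times, etc.)."""
    clause = re.sub(r"\$[\d,.]+(?:\s*(?:million|billion))?", "<MONEY>", clause)
    clause = re.sub(r"[\d.]+%", "<PERCENT>", clause)
    return clause

clauses = ["net loss was due to $4.7 million goodwill write off",
           "loss ratio decreased to 43% as a result of underwriting guidelines"]
masked = [mask_tokens(c) for c in clauses]

# Stand-in for Universal Sentence Encoder output (one 512-dim vector per clause)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(masked), 512))

# Step 2: cluster in the 512-dim space
km = MiniBatchKMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(embeddings)
```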

How does USE work?

  • Input: Words mapped to embeddings, word order
  • Processing: Transformer-based neural network (uses “attention”)
  • Output: A sentence embedding of 512 dimensions

USE abstracts away from word choice!

How does Mini-Batch K-means work?

  • Input: USE vectors
  • Algorithm: K-means (in principle)
  • Output: Cluster centers and convergence info (Inertia)

Why Mini-Batch?

  • Mini-Batch K-means is an online version of K-means
    • Online means we can iteratively add new data to update the model
  • I.e., instead of feeding 48+M observations at once, we can do 1M at a time
    • Much more memory efficient
      • K-means is a one shot algorithm, and thus processes all data at once
        • It would need multiple 48M x 512 matrices – ~300GB in total
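The online updating can be sketched with scikit-learn's `MiniBatchKMeans.partial_fit`, feeding one batch at a time rather than the full matrix; batch sizes here are toy-scale stand-ins for the 1M-row batches described above:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
km = MiniBatchKMeans(n_clusters=5, random_state=0)

# Each loop iteration mimics loading one batch of USE vectors from disk,
# so only one batch needs to be in memory at a time
for _ in range(10):
    batch = rng.normal(size=(1000, 512))  # stand-in for 1M real vectors
    km.partial_fit(batch)

print(km.cluster_centers_.shape)  # (5, 512)
```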

How does the Gap statistic work?

  • Let…
    • \(k\) be the number of clusters,
    • \(B\) the number of simulated samples
    • \(W_k\) be the K-Means inertia score on actual data
    • \(W_{k,r}^*\) be the K-Means inertia score for iteration \(r\) with synthetic data
    • \(\bar{l}\) be the average of the \(\text{log}\left(W_{k,r}^*\right)\)s

\[ \begin{aligned} Gap(k) &= \left(\frac{1}{B}\right) \sum_{r=1}^{B} \text{log}\left(W_{k,r}^*\right)-\text{log}\left(W_k\right),\text{ and}\\ s_k &= sd_k \sqrt{1+\frac{1}{B}},\text{ where }sd_k = \sqrt{\left(\frac{1}{B}\right)\sum_{r=1}^{B}\left\{\text{log}\left(W_{k,r}^*\right)-\bar{l}\right\}^2} \end{aligned} \]

  • Select the lowest \(k\) such that \(Gap(k) \ge Gap(k+1) - s_{k+1}\)

I.e., select the lowest \(k\) s.t. the log-scaled error removed by clustering on real data at \(k\) is no worse than 1 SD below the log-scaled error removed at \(k+1\)
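A minimal sketch of the Gap computation on toy 2-D data, assuming scikit-learn. Reference samples are drawn uniformly over the data's bounding box, one of the reference distributions suggested by Tibshirani et al. (2001):

```python
import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(X, k, B=10, seed=0):
    """Return Gap(k) and s_k for data X, using B uniform reference samples."""
    rng = np.random.default_rng(seed)
    # log(W_k): log inertia on the actual data
    log_Wk = np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)
    # log(W*_{k,r}): log inertia on uniform draws over X's bounding box
    log_Wk_star = np.array([
        np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(
            rng.uniform(X.min(axis=0), X.max(axis=0), size=X.shape)).inertia_)
        for _ in range(B)
    ])
    gap = log_Wk_star.mean() - log_Wk                # Gap(k)
    s_k = log_Wk_star.std() * np.sqrt(1 + 1 / B)     # simulation-error correction
    return gap, s_k

# Toy data: two tight, well-separated clusters, so Gap(2) should exceed Gap(1)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(10, 0.5, size=(50, 2))])
gap1, _ = gap_statistic(X, 1)
gap2, s2 = gap_statistic(X, 2)
# Selection rule: pick the lowest k with Gap(k) >= Gap(k+1) - s_{k+1}
```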

Gap simulation

13 is the lowest \(k\) s.t. the log-scaled error removed by clustering on real data at \(k\) is no worse than 1 SD below the log-scaled error removed at \(k+1\) (red circle)

Caveat from Tibshirani et al. (2001): the initial optimal \(k\) may be too small in more varied text; we use the first \(k\) that is optimal compared to \(k+1\) and \(k+2\): 137 (blue circle)

Context examples

Accounting

  • Accounting policies
    • Accounting assumptions
    • Revenue Recognition
    • Tax
  • Accounting standards
    • Accounting standards
    • New accounting standard
  • General and Balance sheet discussion
    • Cash flows
    • Deferred tax
  • Income statement discussion
    • Accounting losses
    • Depreciation and amortization

Business operations

  • Debt, Equity, and Investment
    • Financing
    • Loans
  • Expectations and future
    • Management expectations
    • Risk factor disclosures
  • Macroeconomics
    • Interest rates
    • Market risk
  • Operations
    • Growth
  • Structure
    • Subsidiaries

Context examples

Changes

  • Changes in sales
  • Changes in expenses
  • Changes in operating measures
  • Declines in value or performance
  • Increase in expenses
  • Increases in income or revenue

Ungrouped

  • Grammatical patterns
    • Company information with name
    • Modal weak statements
  • Timeframes
    • Dates
    • Reporting periods
  • Unrelated statements
    • Unrelated statements 1-6
  • Unrelated statements with specific words
    • “Company” + unrelated statements 1-4

Validation work

  1. Overlap of original extractions with accounting dictionaries:
    • 95.2% contain at least 1 word in Campbell Harvey’s dictionary
    • 84.8% contain at least 1 word in the NYSSCPA dictionary
  2. Intrusion task
    • Take 3 clauses from 1 context and an “intruder” from another
    • Can you tell which is the intruder?
      • E.g.:
        1. average market rate is in effect
        2. price swings are due to commodity costs
        3. net sales impact is in same store sales
        4. Volatility is in commodity prices
    • 4 RAs average 86% on the task; 500 questions each
  3. Regress MD&A sentiment on clusters conditional on sentiment
    • 82.3% (68.6%) of variation captured for negative (positive) sentiment

Empirical approach

Data

  • All 10-K and 10-K405 MD&A sections to build the text model
    • 107,596 MD&As
    • 48,576,229 extractions
  • Only MD&As meeting additional data requirements are used for empirical tests
    • 35,362 MD&As
    • 22,669,186 extractions
  • Loughran McDonald sentiment from their 10X File summaries file
  • MD&A LM sentiment based on the 10-K parser from Brown, Crowley and Elliott (2020) (BCE)
    • The BCE parser has Pearson correlations \(>80\%\) for full text sentiment measures with LM
  • Accounting data from Compustat
  • Stock data from CRSP
  • Material weaknesses from Audit Analytics

Empirics sketch

Three regression structures used throughout

  1. To examine how sentiment relates to context
    • \(Sentiment_{f,t} = \alpha + \sum_{i=1}^{137}\beta_i Context_{i,f,t} + \gamma \cdot Controls_{f,t} + \delta \cdot Industry~FE+\varepsilon\)
      • Run using a LASSO regression
  2. To replicate results from Loughran McDonald (2011)
    • \(DV_{f,t} = \alpha + \beta_0 Sentiment_{f,t} + \gamma \cdot Controls_{f,t} + \delta \cdot Industry~FE+\varepsilon\)
      • Run using a linear regression
  3. To partition the replication on context
    • \(DV_{f,t} = \alpha + \sum_{i=1}^{137}\beta_i Sentiment_{Context,i,f,t} + \gamma \cdot Controls_{f,t} + \delta \cdot Industry~FE+\varepsilon\)
      • Run using a LASSO regression

LASSO regression

  • Least Absolute Shrinkage and Selection Operator

LASSO replaces the standard OLS minimization problem of \(\min_{\beta,\gamma,\delta} \frac{1}{N} |\varepsilon|_2^2\) with:

\[ \min_{\beta,\gamma,\delta} \frac{1}{N} |\varepsilon|_2^2 + \lambda \sum_{b\in \{\beta, \gamma, \delta\}} \left|b\right|_1 \]

  • \(\lambda\) is a penalty term that needs to be optimized
    • We do this with 10-fold cross-validation
  • LASSO is the same as adding an \(L^1\) penalty to a regression!
    • Standard technique for dealing with overly high VIFs

Derive p-values using Post-LASSO estimator (Belloni and Chernozhukov 2013)
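A sketch of this step with scikit-learn, on synthetic data where only the first 3 of 50 regressors carry signal: `LassoCV` picks the penalty (\(\lambda\), scikit-learn's `alpha`) by 10-fold cross-validation, and the Post-LASSO re-fit is an unpenalized OLS on the selected regressors (the p-value machinery of Belloni and Chernozhukov 2013 is omitted):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

# Synthetic data: only the first 3 of 50 regressors matter
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
beta = np.zeros(50)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(scale=0.5, size=500)

# LASSO with the penalty chosen by 10-fold cross-validation
lasso = LassoCV(cv=10, random_state=0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # regressors the L1 penalty keeps

# Post-LASSO: re-estimate OLS on the selected set only
post = LinearRegression().fit(X[:, selected], y)
```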

Results

Sentiment regressed on context

92 (79) contexts drive negative (positive) sentiment

  • Some uniformly drive both positive and negative sentiment
    • E.g., “cautionary statements” and “reduction in accounts”
  • Some only drive negative sentiment
    • E.g., “accounting losses” and “risk factor disclosures”
  • Some only drive positive sentiment
    • E.g., “increases in performance” and “tax”
  • Some drive a lack of sentiment
    • E.g., “depreciation and amortization” and “credit facilities”

Because more coefficients’ signs match our intuition for negative sentiment, we argue that negative sentiment is more tied to context.

Filing period excess return

Prediction: Positive relation between sentiment and return

  • Expected signs: Negative for negative sentiment, positive for positive sentiment
  • Replication: Expected result for negative sentiment, null result for positive sentiment
  • Contexts: Mixed findings, both sentiments drive results in both directions
  • Double LASSO: Results are consistent

Filing period abnormal volume

Prediction: More sentiment (either), higher volume

  • Expected signs: Positive for both
  • Replication: Opposite result for negative sentiment, null result for positive sentiment
  • Contexts: Mixed findings, but mostly in line with predictions
  • Double LASSO: Results are consistent

Post-filing return volatility

Prediction: More sentiment (either), higher volatility

  • Expected signs: Positive for both
  • Replication: Expected result for negative sentiment, null result for positive sentiment
  • Contexts: Mixed findings, both sentiments drive results in both directions
  • Double LASSO: Results are consistent

Future material weakness

Prediction: Inverse relation between sentiment and Material weaknesses

  • Expected signs: Positive for negative sentiment, negative for positive sentiment
  • Replication: Null result for negative sentiment, expected result for positive sentiment
  • Contexts: Mixed findings, both sentiments drive results in both directions
  • Double LASSO: Results are consistent

Falsification test

Randomly assign each clause to one of 137 groups using a uniform distribution

  • Table 4 replication
    • Context drives 31.5% of the variation in negative sentiment
    • Context drives 10.8% of the variation in positive sentiment
  • Tables 5 through 8 replication
    • All falsification tests have fewer significant coefficients on the context measures than our main results
    • All falsification tests have lower adjusted \(R^2\) than our main results

Our main results are unlikely to be driven by disaggregation in general

  • Context is meaningful for sentiment

Construct validity of sentiment

Is sentiment a consistent construct? It doesn’t appear to be.

  • Negative sentiment:
    • No context always loads
    • “Discussion of accounting procedures” and “decreases in expenses or performance” load 3/4 of the time
    • 13 contexts significant only twice
    • 35 contexts significant only once
  • Positive sentiment:
    • No context always loads
    • “decrease” + unrelated statements loads 3/4 of the time
    • 4 contexts significant only twice
    • 43 contexts significant only once

This appears to violate how we approach sentiment empirically

  • Aggregation is likely a problem

Other sentiment measures

Results are the same with the Henry (2008) and Harvard General Inquirer dictionaries

  • The problem we document is not due to the LM dictionary’s construction

Results are the same using FinBERT

  • This means that bag-of-words isn’t the source of the problem
  • It also means that the problem source likely isn’t classification accuracy

Aggregation is very likely to be the source of the problem

Simulating aggregation: Negative sentiment

Simulating aggregation: Positive sentiment

Conclusion

Wrap-up

  • What do we do?
    1. We build a measure of context within annual report MD&As
    2. We regress sentiment on context
    3. We replicate sentiment results by partitioning sentiment by context
  • What do we find?
    1. Sentiment relies more on some contexts than others
    2. Context matters when regressing on sentiment
      • Some contexts behave as expected for sentiment, many others do not!
    3. The regression DV matters
      • Sentiment results are driven by different contexts for different DVs

Takeaways

  1. Sentiment, at the document level, is not a consistent construct.
  2. Sentiment should be analyzed on fine-grained contexts.
    • At a level where there is a clear economic link

Thanks!


Richard M. Crowley
Singapore Management University
https://rmc.link/
@prof_rmc

Packages used for these slides

  • dplyr
  • ggplot
  • gridExtra
  • kableExtra
  • knitr
  • revealjs

Other tables

Table 1

Table 2, Panels A and B

Table 2, Panels C and D

Table 3

What contexts are high in both sentiments?

What contexts skew towards negative sentiment?

What contexts skew towards positive sentiment?

What contexts are low in both sentiments?

Full Table 5
