Sketch

Overview of Twitter data
ML for categorizing tweets by content
- Application: Labeling ESG and financial information
ML for comparing tweet content
- Application: Comparing CEO and CFO tweets to their company’s tweets
ML for user characteristics
- Application: Emotion surrounding COVID-19 in the US

What does Twitter provide?

The text of the message

Enrichments through pictures, video, polls, links, hashtags, cashtags, and user mentions
Language categorization
Is it a quote or retweet?
Number of likes
Number of retweets

User information

A unique ID + their chosen username
Profile picture
Self-reported description and URL
Self-reported user location
Verification status
Followers
Following

How can you find tweets?

Twitter has a well-developed API
Can download:
- The most recent 3,200 tweets of a user
- Any tweets for a search term over the past 7 days
  - A word, e.g. “Bitcoin”
  - A phrase, e.g., “to the moon”
  - A hashtag, e.g., “#HODL”
  - A cashtag, e.g., “$DOGE”
With an academic account:
- Any tweets for a search term since Twitter started, capped at 10M/month
  - Very useful to build historical time series
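A minimal sketch of building search requests for the four query types above. It assumes the Twitter API v2 endpoint paths `/2/tweets/search/recent` (past 7 days) and `/2/tweets/search/all` (academic full archive) and their `query` and `max_results` parameters as documented around 2021. The API has changed since, so treat the URLs as assumptions. The helper `build_search_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed Twitter API v2 endpoints (as of 2021); verify against current docs.
RECENT_SEARCH = "https://api.twitter.com/2/tweets/search/recent"  # past 7 days
FULL_ARCHIVE = "https://api.twitter.com/2/tweets/search/all"      # academic access


def build_search_url(term: str, academic: bool = False, max_results: int = 100) -> str:
    """Hypothetical helper: URL for a word, phrase, #hashtag, or $cashtag search."""
    # Multi-word phrases are wrapped in double quotes to require an exact match.
    if " " in term and not term.startswith('"'):
        term = f'"{term}"'
    base = FULL_ARCHIVE if academic else RECENT_SEARCH
    # urlencode percent-escapes '#', '$', and '"' so they survive in the URL.
    return base + "?" + urlencode({"query": term, "max_results": max_results})


for t in ["Bitcoin", "to the moon", "#HODL", "$DOGE"]:
    print(build_search_url(t))
```

A real request would also need an authentication token. The only difference between the 7-day and full-archive searches in this sketch is the endpoint.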

Raw data vs what you see

{
  "truncated": false,
  "text": "GS reports 2014 net rev of $34.53bn, net earnings of $8.48bn, &amp; 11.2% ROE; 4Q net rev of $7.69bn, net earnings of $2.17bn and 11.1% ROE",
  "is_quote_status": false,
  "id": 556067748196139008,
  "favorite_count": 11,
  "source": "<a href=\"http://www.spredfast.com\" rel=\"nofollow\">Spredfast app</a>",
  "retweeted": false,
  "entities": {
    "symbols": [],
    "user_mentions": [],
    "hashtags": [],
    "urls": []
  },
  "retweet_count": 31,
  "id_str": "556067748196139008",
  "favorited": false,
  "user": {
    "follow_request_sent": false,
    "has_extended_profile": false,
    "profile_use_background_image": true,
    "default_profile_image": false,
    "id": 253167239,
    "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/378800000018727168/b1841e59295b5a69abc238a705e3b030.jpeg",
    "verified": true,
    "translator_type": "none",
    "profile_text_color": "333333",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/465954359583322112/mvHVOgH8_normal.jpeg",
    "profile_sidebar_fill_color": "DDEEF6",
    "entities": {
      "url": {
        "urls": [
          {
            "url": "http://t.co/IORXQSIgTV",
            "indices": [
              0,
              22
            ],
            "expanded_url": "http://www.goldmansachs.com",
            "display_url": "goldmansachs.com"
          }
        ]
      },
      "description": {
        "urls": []
      }
    },
    "followers_count": 604559,
    "profile_sidebar_border_color": "FFFFFF",
    "id_str": "253167239",
    "profile_background_color": "7399C6",
    "listed_count": 4986,
    "is_translation_enabled": false,
    "utc_offset": -14400,
    "statuses_count": 7527,
    "description": "Official Goldman Sachs Twitter account. Follow us for the latest in global and local economic progress, firm news, and thought leadership content.",
    "friends_count": 104,
    "location": "",
    "profile_link_color": "7399C6",
    "profile_image_url": "http://pbs.twimg.com/profile_images/465954359583322112/mvHVOgH8_normal.jpeg",
    "following": false,
    "geo_enabled": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/253167239/1440625536",
    "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/378800000018727168/b1841e59295b5a69abc238a705e3b030.jpeg",
    "screen_name": "GoldmanSachs",
    "lang": "en",
    "profile_background_tile": false,
    "favourites_count": 5,
    "name": "Goldman Sachs",
    "notifications": false,
    "url": "http://t.co/IORXQSIgTV",
    "created_at": "Wed Feb 16 17:59:09 +0000 2011",
    "contributors_enabled": false,
    "time_zone": "Eastern Time (US & Canada)",
    "protected": false,
    "default_profile": false,
    "is_translator": false
  },
  "lang": "en",
  "created_at": "Fri Jan 16 12:37:37 +0000 2015"
}
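The raw object above is JSON, so you can pull out the fields listed earlier with Python's standard library. This is a minimal sketch that assumes tweets in the v1.1 layout shown (keys such as `text`, `retweet_count`, `favorite_count`, and the nested `user` object). Note that `text` arrives HTML-escaped (`&amp;`), and `html.unescape` reverses that. The function name `summarize_tweet` is hypothetical, and `sample` is a trimmed copy of the record above.

```python
import html
import json


def summarize_tweet(raw: str) -> dict:
    """Hypothetical helper: flatten the commonly used fields of one raw tweet."""
    tw = json.loads(raw)
    user = tw["user"]
    return {
        "id": tw["id_str"],                  # string form avoids precision loss outside Python
        "text": html.unescape(tw["text"]),   # '&amp;' -> '&'
        "likes": tw["favorite_count"],
        "retweets": tw["retweet_count"],
        "is_quote": tw["is_quote_status"],
        "cashtags": [s["text"] for s in tw["entities"]["symbols"]],
        "hashtags": [h["text"] for h in tw["entities"]["hashtags"]],
        "screen_name": user["screen_name"],
        "verified": user["verified"],
        "followers": user["followers_count"],
        "following": user["friends_count"],  # v1.1 calls "following" friends_count
        "lang": tw["lang"],
    }


sample = """{"id_str": "556067748196139008", "text": "GS reports ... &amp; 11.2% ROE",
  "favorite_count": 11, "retweet_count": 31, "is_quote_status": false, "lang": "en",
  "entities": {"symbols": [], "hashtags": []},
  "user": {"screen_name": "GoldmanSachs", "verified": true,
           "followers_count": 604559, "friends_count": 104}}"""
print(summarize_tweet(sample))
```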

Categorizing tweets by content

How can we classify general text?

Latent Dirichlet Allocation (LDA)
One of the most popular methods under the field of topic modeling
LDA is a Bayesian method of assessing the content of a document
LDA assumes each document is a mixture of topics, with the topic proportions following a Dirichlet prior for each document
- Each topic is a distribution over words, which also has a Dirichlet prior

More details from the creator

How does it work?

Reads all the documents
- Counts each word within each document, with every word mapped to an ID shared across all documents
Uses variation in words within and across documents to infer topics
- E.g., by using a Gibbs sampler to simulate the underlying distributions
  - An MCMC method
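You can sketch the two steps above (word counts keyed to a shared vocabulary ID, then Gibbs sampling) as a small collapsed Gibbs sampler for LDA. This is a teaching sketch in NumPy, not the implementation used in the papers. The priors `alpha` and `beta`, the iteration count, and the toy documents are arbitrary assumptions.

```python
import numpy as np


def build_vocab(docs):
    """Map every word to an integer ID shared across all documents."""
    vocab, corpus = {}, []
    for doc in docs:
        corpus.append([vocab.setdefault(w, len(vocab)) for w in doc.lower().split()])
    return vocab, corpus


def lda_gibbs(corpus, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler; returns doc-topic (theta) and topic-word (phi) estimates."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(corpus), K))  # words in document d assigned to topic k
    nkw = np.zeros((K, V))            # times word w is assigned to topic k
    nk = np.zeros(K)                  # total words assigned to topic k
    z = []                            # current topic of every token
    for d, doc in enumerate(corpus):  # random initial assignments
        zd = rng.integers(K, size=len(doc))
        for w, k in zip(doc, zd):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1        # take this token out of the counts
                nkw[k, w] -= 1
                nk[k] -= 1
                # p(topic = k | all other assignments), up to a constant
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k           # put it back under the sampled topic
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return theta, phi


docs = ["solar energy power gas energy", "gas oil energy power solar",
        "volunteers community school kids donate", "community kids volunteers local school"]
vocab, corpus = build_vocab(docs)
theta, phi = lda_gibbs(corpus, V=len(vocab), K=2)
print(np.round(theta, 2))  # each row: one document's topic proportions
```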

Unsupervised method: It will figure out topics by itself

It boils down to a system where generating a document follows a couple of rules:
1. Topics in a document follow a multinomial/categorical distribution
2. Words in a topic follow a multinomial/categorical distribution
Use words’ co-occurrence within and across documents to back out topics in a Bayesian manner

Caveat: Need to specify the number of topics ex ante
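Simulating a corpus makes the two generative rules concrete, along with the ex ante choice of the number of topics `K`. This is a minimal sketch that assumes symmetric Dirichlet priors with arbitrary concentration values. Words are represented by integer IDs rather than strings.

```python
import numpy as np


def generate_corpus(n_docs, doc_len, K, V, alpha=1.0, beta=0.5, seed=0):
    """Simulate LDA's generative story with K topics over a V-word vocabulary."""
    rng = np.random.default_rng(seed)
    # Each topic is a categorical distribution over words, drawn from Dirichlet(beta).
    phi = rng.dirichlet(np.full(V, beta), size=K)
    docs = []
    for _ in range(n_docs):
        theta = rng.dirichlet(np.full(K, alpha))                 # this document's topic mix
        topics = rng.choice(K, size=doc_len, p=theta)            # rule 1: a topic per word slot
        words = [int(rng.choice(V, p=phi[k])) for k in topics]   # rule 2: a word from that topic
        docs.append(words)
    return phi, docs


phi, docs = generate_corpus(n_docs=3, doc_len=8, K=2, V=10)
print(docs)  # word IDs; LDA inference tries to run this process in reverse
```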

What about for Twitter posts?

Twitter-LDA

Leverages assumptions about word co-occurrence (as in LDA)
Adds in user information to examine word usage across users
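Our reading of Zhao et al. (2011) is that Twitter-LDA changes LDA in two ways:
- Each tweet gets a single topic, drawn from its author's topic mix.
- Each word is either a topic word or a shared "background" word, chosen by a coin flip.

The sketch below simulates that generative story. The parameter values are arbitrary assumptions, and this is not the estimation code used in the papers.

```python
import numpy as np


def generate_twitter_lda(tweets_per_user, n_users, tweet_len, K, V,
                         alpha=1.0, beta=0.5, pi_topic=0.7, seed=0):
    """Simulate Twitter-LDA: one topic per tweet, users carry the topic mix."""
    rng = np.random.default_rng(seed)
    phi = rng.dirichlet(np.full(V, beta), size=K)   # topic-word distributions
    phi_bg = rng.dirichlet(np.full(V, beta))        # background-word distribution
    corpus = []                                     # (user, topic, word IDs) per tweet
    for u in range(n_users):
        theta_u = rng.dirichlet(np.full(K, alpha))  # user-level topic mix
        for _ in range(tweets_per_user):
            z = int(rng.choice(K, p=theta_u))       # a single topic for the whole tweet
            words = []
            for _ in range(tweet_len):
                # Bernoulli switch: topic word with prob. pi_topic, else background word
                dist = phi[z] if rng.random() < pi_topic else phi_bg
                words.append(int(rng.choice(V, p=dist)))
            corpus.append((u, z, words))
    return corpus


for user, topic, words in generate_twitter_lda(2, 2, 6, K=3, V=12):
    print(user, topic, words)
```

The single-topic-per-tweet assumption suits short texts: a 280-character post rarely mixes many themes.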

Context: Corporate tweets

Collection of all tweets since 2011 by select firms
- Based on inclusion in the S&P 1500 between 1 January 2012 and 30 September 2016
Monitoring 1,433 firms
- 1,350 have publicly available tweets
- 1,262 have tweeted about CSR
- 997 have sufficient data
  - Non-financial companies covered by the MSCI ESG STATS, Compustat, and CRSP datasets

The question

Question: Are firms using Twitter to greenwash?

Disseminating actual CSR activities

Greenwashing

Implementation: Is the # of CSR tweets negatively associated with firms’ CSR scores?

Machine learning classification

Classify using Twitter-LDA
- See reference papers for usage
Identify 100 topics
- Find 2 topics related to CSR
  - Sustainability and natural resources
  - Community service
- Find 1 topic related to financial information
- 97 other topics
  - Many related to marketing and customer service

Topics

Number	Topic	Top_words
27	CSR	water, gas, energy, oil, ceo, industry, food, today, world, global, video, read, #monsanto, technology, #energy, #sustainability, production, solutions, great, booth
40	CSR	support, proud, employees, community, today, great, day, team, food, helping, school, work, kids, local, donate, volunteers, program, join, learn, event
47	Financial	trading, markets, cboe, energy, growth, global, week, options, vix, volatility, stocks, report, economic, update, futures, analysis, investors, rate, fed, today
9	Customer Support	team, contact, hear, issue, dm, support, issues, working, assistance, assist
22	Healthcare	health, care, learn, patients, data, #healthit, healthcare, #healthcare, clinical
25	Stock markets	bell, #nasdaq, opening, ring, closing, #nyse, today, nyse, sale, rings
51	Analytics	data, customer, business, #bigdata, digital, learn, #digital, experience, #analytics, blog
100	Energy	energy, power, learn, home, save, gas, solar, customers, electric, check
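Twitter-LDA assigns each tweet a single topic, so turning topics into tweet labels is a lookup. This minimal sketch uses the topic numbers in the table above (27 and 40 → CSR, 47 → Financial, everything else → Other). The `(firm, topic)` input format and the firm tickers are hypothetical.

```python
from collections import Counter

# Topic numbers from the table above; all other topics are treated as "Other".
TOPIC_LABELS = {27: "CSR", 40: "CSR", 47: "Financial"}


def count_by_label(tweets):
    """tweets: iterable of (firm, topic_id) pairs -> Counter keyed by (firm, label)."""
    return Counter((firm, TOPIC_LABELS.get(topic, "Other")) for firm, topic in tweets)


# Hypothetical Twitter-LDA output: (firm, assigned topic) per tweet.
assigned = [("AAA", 27), ("AAA", 40), ("AAA", 9), ("BBB", 47), ("BBB", 100)]
counts = count_by_label(assigned)
print(counts[("AAA", "CSR")], counts[("BBB", "Financial")])  # 2 1
```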

Results

\[ CSRtweets = \alpha + \beta_1 Lag(CSR) + \gamma Controls + \varepsilon \]

Expect:
- $\beta_1 > 0$ if good corporate citizens
- $\beta_1 < 0$ if greenwashing
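The following minimal OLS sketch shows how the sign of $\beta_1$ is read off. It runs on simulated data with greenwashing built in (true $\beta_1 = -0.5$). The variable names, the single control (`size`), and the data-generating values are assumptions for illustration only. They are not the paper's data, controls, or estimates.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
lag_csr = rng.normal(size=n)    # simulated prior-year CSR score (standardized)
size = rng.normal(size=n)       # simulated control, e.g. firm size
eps = rng.normal(size=n)
# Simulated outcome with greenwashing built in: beta_1 = -0.5
csr_tweets = 1.0 - 0.5 * lag_csr + 0.3 * size + eps

X = np.column_stack([np.ones(n), lag_csr, size])   # columns for alpha, beta_1, gamma
coef, *_ = np.linalg.lstsq(X, csr_tweets, rcond=None)
resid = csr_tweets - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # classical OLS standard errors
print(f"beta_1 = {coef[1]:.3f} (t = {coef[1] / se[1]:.1f})")  # negative => greenwashing pattern
```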

Results support the greenwashing story

Comparing tweet content

Measuring content similarity

Difficulty: Tweets are short, so overlap in word choice isn’t a reliable measure of similarity

Solution

Universal sentence encoder (USE, Cer et al. 2018)
- Determines meaning of text based on all words in the text
- A measure of meaning, not word choice
- Neural network based (Deep Averaging Network)

How does USE work?

Input: words and bigrams mapped to embeddings
Processing: Averaging + a 4-layer neural network (called a Deep Averaging Network)
Output: A sentence embedding of 512 dimensions

USE abstracts away from word choice!
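A toy Deep Averaging Network can mimic the three steps above:
1. Average the word and bigram vectors.
2. Pass the average through four feed-forward layers.
3. Output a 512-dimensional embedding.

This is a structural sketch only. The weights and token vectors are random, whereas the real USE learns them from training data. The 64-dimensional token vectors and the `tanh` activation are our assumptions. In practice you would load the pretrained encoder (e.g., from TensorFlow Hub) rather than build one.

```python
import numpy as np

TOKEN_DIM, OUT_DIM, N_LAYERS = 64, 512, 4   # toy token size; 512-d output as in USE

rng = np.random.default_rng(0)
_token_vectors = {}


def token_vec(token):
    """Hypothetical embedding lookup: a fixed random vector per word or bigram."""
    if token not in _token_vectors:
        _token_vectors[token] = rng.normal(size=TOKEN_DIM)
    return _token_vectors[token]


# Random (untrained) weights for the four feed-forward layers.
_dims = [TOKEN_DIM] + [OUT_DIM] * N_LAYERS
_weights = [rng.normal(scale=1 / np.sqrt(a), size=(a, b)) for a, b in zip(_dims[:-1], _dims[1:])]


def dan_encode(text):
    words = text.lower().split()
    units = words + [f"{a} {b}" for a, b in zip(words, words[1:])]  # words + bigrams
    h = np.mean([token_vec(u) for u in units], axis=0)              # averaging step
    for W in _weights:
        h = np.tanh(h @ W)                                          # deep feed-forward layers
    return h / np.linalg.norm(h)                                    # unit-length sentence embedding


emb = dan_encode("GS reports 2014 net revenues")
print(emb.shape)  # (512,)
```

Averaging first is what makes a DAN cheap: the cost of the deep layers does not grow with tweet length.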

Context: Executives versus firms

The question

Question: Why do executive tweets impact stock prices?

Trust

Investors trust CEOs more than firms on social media (Elliott et al. 2018)

New information

Market may react only to new disclosure content

Implementation: Does the market respond more strongly to executives’ tweets with content that is more similar to their firms’ tweets?

Testing the mechanism

If an executive’s tweets have the same content as their firm’s prior tweets, any reaction to the tweet…
- Should not be due to new information
- Should be due to trust in the information coming from the CEO

We construct a measure of content similarity to address this
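One way to turn 512-dimensional sentence embeddings into a similarity score takes two steps:
1. Compute the cosine similarity between each executive tweet and each of the firm's earlier tweets.
2. For each executive tweet, keep its highest similarity.

This aggregation rule is a hypothetical illustration, and the papers' exact construction may differ. The random vectors stand in for real USE embeddings.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between rows of a (m x d) and rows of b (n x d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def similarity_to_prior_firm_tweets(exec_emb, firm_emb):
    """Hypothetical measure: each exec tweet's max cosine similarity to prior firm tweets."""
    return cosine_matrix(exec_emb, firm_emb).max(axis=1)


rng = np.random.default_rng(1)
firm = rng.normal(size=(20, 512))                          # stand-ins for 20 prior firm tweets
execs = np.vstack([firm[3] + 0.1 * rng.normal(size=512),   # near-copy of a firm tweet
                   rng.normal(size=512)])                  # unrelated content
sims = similarity_to_prior_firm_tweets(execs, firm)
print(np.round(sims, 2))  # first is high, second is close to 0
```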

\[ \scriptsize \begin{align*} &\left|MM\ CAR_{(+1)}\right|\\ &\quad= \alpha + \beta_1 Exec\ tweet_{t,e} + \beta_2 Exec\ tweet_{t,e} \times Similarity_{t,f,e}\\ &\quad+ \beta_3 Firm\ tweet_{t,f} + \beta_4 Firm\ tweet_{t,f} \times Similarity_{t,f,e}\\ &\quad+ \Gamma \cdot Controls_{t,f,e} + FE + \varepsilon_{t,f,e} \end{align*} \]

A positive coefficient on $\beta_2$ would support the trust story
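The dependent variable is the absolute market-model cumulative abnormal return. This minimal sketch assumes two things: the market model is fit by OLS on an estimation window before the event, and abnormal returns are summed over the event window. The window lengths and the exact definition of the $(+1)$ window used in the paper are assumptions here. The returns below are constructed so the answer is known.

```python
import numpy as np


def abs_mm_car(stock_est, mkt_est, stock_evt, mkt_evt):
    """|CAR| from a market model r_i = a + b * r_m fit on the estimation window."""
    b, a = np.polyfit(mkt_est, stock_est, deg=1)      # slope, intercept
    abnormal = np.asarray(stock_evt) - (a + b * np.asarray(mkt_evt))
    return abs(abnormal.sum())


rng = np.random.default_rng(0)
mkt_est = rng.normal(0, 0.01, size=120)             # 120 pre-event trading days (assumed)
stock_est = 0.001 + 1.2 * mkt_est                   # exact market model, no noise
mkt_evt = np.array([0.004, -0.002])                 # event day 0 and day +1
stock_evt = 0.001 + 1.2 * mkt_evt + np.array([-0.02, -0.01])  # abnormal: -2% then -1%
print(round(abs_mm_car(stock_est, mkt_est, stock_evt, mkt_evt), 6))  # 0.03
```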

Stock reaction mechanism

Main effect of executive tweets is subsumed
Effect comes from executive tweets that are similar to firm tweets
This effect seems to encourage reaction to firm tweets as well

Consistent with the effect coming from trust; inconsistent with an information story

Robust to other definitions of the similarity measure

Why trust?

More followers $\Rightarrow$ CEO may be psychologically closer ✔
More personal tweets $\Rightarrow$ psychologically closer ✔
Institutional investors less affected by trust 🗴

VARIABLES	↓Followers	↑Followers	↓Personal	↑Personal	↓Inst	↑Inst
Exec fin tweets	0.007	-0.015	-0.012	-0.018**	0.017	-0.020*
	(0.43)	(-1.36)	(-0.49)	(-2.05)	(0.86)	(-1.96)
Exec second sim x Exec fin tweets	-0.016	0.040*	0.031	0.045**	-0.037	0.049**
	(-0.43)	(1.74)	(0.058)	(2.54)	(-0.95)	(2.31)
Firm fin tweets	-0.005	-0.005**	-0.005*	-0.006***	-0.001	-0.009***
	(-1.45)	(-2.49)	(-1.79)	(-2.80)	(-0.43)	(-2.74)
Exec second sim x Firm fin tweets	0.006*	0.005***	0.006**	0.006***	0.001	0.011***
	(1.73)	(2.65)	(2.00)	(2.87)	(0.49)	(3.13)

Measuring Emotion

Done using Twitter Emotion Recognition from Colnerič and Demšar (2020)

Timing of the data

Data by language

Language usage in the US

Fear across the US, 2020 Mar to Oct


May 27: Coronavirus deaths in the U.S. passed 100,000.
Oct 2: US President tests positive for COVID-19
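A state-by-day fear series like the one mapped here comes from aggregating tweet-level emotion scores. This minimal sketch assumes each tweet has already been scored by an emotion classifier (such as the one from Colnerič and Demšar) and tagged with a US state and a date. The record layout and numbers are hypothetical, and a real pipeline would also need to geolocate tweets, for example from self-reported locations.

```python
from collections import defaultdict


def mean_emotion_by_state_day(scored_tweets, emotion="fear"):
    """Average one emotion's score over all tweets in each (state, date) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tw in scored_tweets:
        key = (tw["state"], tw["date"])
        sums[key] += tw["scores"][emotion]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical classifier output: per-tweet emotion probabilities.
tweets = [
    {"state": "NY", "date": "2020-05-27", "scores": {"fear": 0.8, "joy": 0.1}},
    {"state": "NY", "date": "2020-05-27", "scores": {"fear": 0.4, "joy": 0.3}},
    {"state": "CA", "date": "2020-05-27", "scores": {"fear": 0.2, "joy": 0.6}},
]
print(mean_emotion_by_state_day(tweets))
```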

Singapore

Thanks!

Richard M. Crowley
Singapore Management University
https://rmc.link/
@prof_rmc

References

Twitter-LDA:
- Zhao, Wayne Xin, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan, and Xiaoming Li. 2011. “Comparing Twitter and traditional media using topic models.” In European Conference on Information Retrieval, pp. 338-349. Springer, Berlin, Heidelberg.
- Crowley, Richard, Wenli Huang, and Hai Lu. 2020 (1). “Discretionary Dissemination on Twitter.” SSRN Scholarly Paper ID 3105847. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=3105847.
- Crowley, Richard, Wenli Huang, and Hai Lu. 2020 (2). “Executive Tweets.” Working paper.
- Crowley, Richard, Wenli Huang, Hai Lu, and Wei Luo. 2019. “Do Firms Manage Their CSR Reputation? Evidence from Twitter.” Working paper, Singapore Management University.
USE:
- Cer, Daniel, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, et al. 2018. “Universal sentence encoder.” arXiv preprint arXiv:1803.11175.
- Crowley, Richard, Wenli Huang, and Hai Lu. 2020 (2). “Executive Tweets.” Working paper.
Twitter Emotion Recognition:
- Colnerič, Niko, and Janez Demšar. 2020. “Emotion recognition on Twitter: Comparative study and training a unison model.” IEEE Transactions on Affective Computing 11 (3): 433-446.

Packages used for these slides

ggplot2
kableExtra
knitr
revealjs

Leveraging Social Media for Social Science Applications

May 2021

Dr. Richard M. Crowley
rcrowley@smu.edu.sg
https://rmc.link/ \(\cdot\) @prof_rmc
Slides: rmc.link/PGR

