Leveraging Social Media for Social Science Applications


May 2021


Dr. Richard M. Crowley

https://rmc.link/ \(\cdot\) @prof_rmc
Slides: rmc.link/PGR

Sketch

  1. Overview of Twitter data
  2. ML for categorizing tweets by content
    • Application: Labeling ESG and financial information
  3. ML for comparing tweet content
    • Application: Comparing CEO and CFO tweets to their company’s tweets
  4. ML for user characteristics
    • Application: Emotion surrounding COVID-19 in the US

Twitter Data

What does Twitter provide?

The text of the message

  • Enrichments through pictures, video, polls, links, hashtags, cashtags, and user mentions
  • Language categorization
  • Is it a quote or retweet?
  • Number of likes
  • Number of retweets

User information

  • A unique ID + their chosen username
  • Profile picture
  • Self-reported description and URL
  • Self-reported user location
  • Verification status
  • Followers
  • Following

How can you find tweets?

  • Twitter has a well-developed API
  • Can download:
    • The most recent 3,200 tweets of a user
    • Any tweets for a search term over the past 7 days
      • A word, e.g., “Bitcoin”
      • A phrase, e.g., “to the moon”
      • A hashtag, e.g., “#HODL”
      • A cashtag, e.g., “$DOGE”
  • With an academic account:
    • Any tweets for a search term since Twitter started, capped at 10M/month
      • Very useful to build historical time series

Raw data vs what you see

{
  "truncated": false,
  "text": "GS reports 2014 net rev of $34.53bn, net earnings of $8.48bn, & 11.2% ROE; 4Q net rev of $7.69bn, net earnings of $2.17bn and 11.1% ROE",
  "is_quote_status": false,
  "id": 556067748196139008,
  "favorite_count": 11,
  "source": "<a href=\"http://www.spredfast.com\" rel=\"nofollow\">Spredfast app</a>",
  "retweeted": false,
  "entities": {
    "symbols": [],
    "user_mentions": [],
    "hashtags": [],
    "urls": []
  },
  "retweet_count": 31,
  "id_str": "556067748196139008",
  "favorited": false,
  "user": {
    "follow_request_sent": false,
    "has_extended_profile": false,
    "profile_use_background_image": true,
    "default_profile_image": false,
    "id": 253167239,
    "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/378800000018727168/b1841e59295b5a69abc238a705e3b030.jpeg",
    "verified": true,
    "translator_type": "none",
    "profile_text_color": "333333",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/465954359583322112/mvHVOgH8_normal.jpeg",
    "profile_sidebar_fill_color": "DDEEF6",
    "entities": {
      "url": {
        "urls": [
          {
            "url": "http://t.co/IORXQSIgTV",
            "indices": [
              0,
              22
            ],
            "expanded_url": "http://www.goldmansachs.com",
            "display_url": "goldmansachs.com"
          }
        ]
      },
      "description": {
        "urls": []
      }
    },
    "followers_count": 604559,
    "profile_sidebar_border_color": "FFFFFF",
    "id_str": "253167239",
    "profile_background_color": "7399C6",
    "listed_count": 4986,
    "is_translation_enabled": false,
    "utc_offset": -14400,
    "statuses_count": 7527,
    "description": "Official Goldman Sachs Twitter account. Follow us for the latest in global and local economic progress, firm news, and thought leadership content.",
    "friends_count": 104,
    "location": "",
    "profile_link_color": "7399C6",
    "profile_image_url": "http://pbs.twimg.com/profile_images/465954359583322112/mvHVOgH8_normal.jpeg",
    "following": false,
    "geo_enabled": false,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/253167239/1440625536",
    "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/378800000018727168/b1841e59295b5a69abc238a705e3b030.jpeg",
    "screen_name": "GoldmanSachs",
    "lang": "en",
    "profile_background_tile": false,
    "favourites_count": 5,
    "name": "Goldman Sachs",
    "notifications": false,
    "url": "http://t.co/IORXQSIgTV",
    "created_at": "Wed Feb 16 17:59:09 +0000 2011",
    "contributors_enabled": false,
    "time_zone": "Eastern Time (US & Canada)",
    "protected": false,
    "default_profile": false,
    "is_translator": false
  },
  "lang": "en",
  "created_at": "Fri Jan 16 12:37:37 +0000 2015"
}
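
The fields in the payload above can be pulled out with any JSON parser. A minimal Python sketch, assuming a tweet arrives as a JSON string in the layout shown; the `flatten_tweet` helper and the abbreviated `raw` payload are illustrative and not part of any Twitter library:

```python
import json
from datetime import datetime

# Abbreviated copy of the payload shown above (only the fields used here).
raw = """
{
  "text": "GS reports 2014 net rev of $34.53bn, net earnings of $8.48bn, & 11.2% ROE; 4Q net rev of $7.69bn, net earnings of $2.17bn and 11.1% ROE",
  "id_str": "556067748196139008",
  "favorite_count": 11,
  "retweet_count": 31,
  "is_quote_status": false,
  "lang": "en",
  "created_at": "Fri Jan 16 12:37:37 +0000 2015",
  "entities": {"symbols": [], "user_mentions": [], "hashtags": [], "urls": []},
  "user": {"screen_name": "GoldmanSachs", "followers_count": 604559,
           "friends_count": 104, "verified": true}
}
"""

# Timestamp layout used in the payload, e.g. "Fri Jan 16 12:37:37 +0000 2015".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def flatten_tweet(tweet: dict) -> dict:
    """Return one flat record per tweet (hypothetical helper)."""
    user = tweet["user"]
    return {
        "tweet_id": tweet["id_str"],  # the string ID avoids integer precision loss
        "created_at": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
        "text": tweet["text"],
        "likes": tweet["favorite_count"],
        "retweets": tweet["retweet_count"],
        "n_hashtags": len(tweet["entities"]["hashtags"]),
        "n_cashtags": len(tweet["entities"]["symbols"]),  # "symbols" holds cashtags
        "screen_name": user["screen_name"],
        "followers": user["followers_count"],
        "following": user["friends_count"],  # "friends" are accounts the user follows
        "verified": user["verified"],
    }


record = flatten_tweet(json.loads(raw))
print(record["screen_name"], record["created_at"].date(),
      record["likes"], record["retweets"])
```

Flattening to one record per tweet makes it straightforward to stack many tweets into a table for later analysis.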

Categorizing tweets by content

How can we classify general text?

  • Latent Dirichlet Allocation (LDA)
  • One of the most popular methods in the field of topic modeling
  • LDA is a Bayesian method of assessing the content of a document
  • LDA assumes that each document contains a mix of topics, and that each document’s topic mix follows a Dirichlet prior
    • Words within topics also have a Dirichlet prior

More details from the creator

How does it work?

  1. Reads all the documents
    • Counts each word within each document, with every word tied to an ID that is shared across all documents
  2. Uses variation in words within and across documents to infer topics
    • By using a Gibbs sampler to simulate the underlying distributions
      • An MCMC method
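
The two steps above can be sketched as a small collapsed Gibbs sampler. This is a toy illustration in plain Python rather than the implementation used in any of the referenced papers; the function name `lda_gibbs`, the hyperparameter values `alpha` and `eta`, and the four example documents are all assumptions made for the sketch:

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, eta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on tokenized documents (toy sketch)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}  # one ID per word, shared by all docs
    V = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                     # tokens assigned to topic k
    assign = []                                      # current topic of every token

    # Step 1: read the documents and give every token a random starting topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    # Step 2: repeatedly resample each token's topic given all other tokens.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + eta) / (topic_total[t] + V * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Report the three most frequent words per topic.
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(n_topics)
    ]
    return doc_topic, top_words


docs = [
    "energy water sustainability energy".split(),
    "water energy solar sustainability".split(),
    "volunteers community donate school".split(),
    "community school volunteers kids".split(),
]
doc_topic, top_words = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(top_words):
    print(k, words)
```

Note that the number of topics (`n_topics`) has to be supplied up front, and that the sampler returns numbered topics only; interpreting them is left to the researcher.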

Unsupervised method: It will figure out topics by itself

  • It boils down to a system where generating a document follows a couple of rules:
    1. Topics in a document follow a multinomial/categorical distribution
    2. Words in a topic follow a multinomial/categorical distribution
  • Use words’ covariance within and across documents to back out topics in a Bayesian manner
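
In standard notation, the generative rules above can be written as follows. The symbols (\(\theta_d\), \(\phi_k\), \(z_{d,n}\), \(w_{d,n}\), \(\alpha\), \(\eta\)) follow common textbook usage and are not taken from these slides:

\[
\begin{aligned}
\theta_d &\sim \text{Dirichlet}(\alpha) && \text{topic mix of document } d\\
\phi_k &\sim \text{Dirichlet}(\eta) && \text{word distribution of topic } k\\
z_{d,n} \mid \theta_d &\sim \text{Categorical}(\theta_d) && \text{topic of the } n\text{th word in document } d\\
w_{d,n} \mid z_{d,n} &\sim \text{Categorical}(\phi_{z_{d,n}}) && \text{the observed word}
\end{aligned}
\]

Only the words \(w_{d,n}\) are observed; inference works backwards from them to the topic mixes \(\theta_d\) and the topic-word distributions \(\phi_k\).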

Caveat: Need to specify the number of topics ex ante

What about for Twitter posts?

Twitter-LDA

  • Leverages assumptions about word co-occurrence (as in LDA)
  • Adds in user information to examine word usage across users
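
As described in Zhao et al. (2011), Twitter-LDA gives each user their own topic distribution, assigns a single topic to each tweet, and lets each word come either from that topic or from a shared background distribution. The sketch below simulates that generative story in Python; the vocabularies, user names, and probabilities are hypothetical values chosen for illustration, and fitting the model to real tweets is a separate inference step that is not shown:

```python
import random

rng = random.Random(42)

# Hypothetical toy vocabularies; a fitted model learns these from data.
topic_words = {
    "csr": ["community", "volunteers", "donate", "sustainability"],
    "financial": ["earnings", "markets", "investors", "growth"],
}
background_words = ["today", "great", "learn", "check"]

# Each user has their own distribution over topics (the user-level part).
user_topic_probs = {
    "firm_a": {"csr": 0.8, "financial": 0.2},
    "firm_b": {"csr": 0.1, "financial": 0.9},
}
P_TOPIC_WORD = 0.7  # chance a word comes from the tweet's topic, not the background


def generate_tweet(user, n_words=6):
    """Draw one synthetic tweet: a single topic per tweet, then its words."""
    probs = user_topic_probs[user]
    topic = rng.choices(list(probs), weights=list(probs.values()))[0]
    words = []
    for _ in range(n_words):
        if rng.random() < P_TOPIC_WORD:
            words.append(rng.choice(topic_words[topic]))
        else:
            words.append(rng.choice(background_words))
    return topic, words


for user in user_topic_probs:
    topic, words = generate_tweet(user)
    print(user, topic, " ".join(words))
```

The one-topic-per-tweet restriction suits short texts, where there are too few words to estimate a separate topic mix for each tweet.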

Context: Corporate tweets

  • Collection of all Tweets since 2011 by select firms
    • Based on inclusion in the S&P 1500 between 1 January 2012 and 30 September 2016
  • Monitoring 1,433 firms
    • 1,350 have publicly available tweets
    • 1,262 have tweeted about CSR
    • 997 have sufficient data
      • Non-financial companies in MSI ESG STATS, Compustat, CRSP (datasets)

The question

Question: Are firms using Twitter to greenwash?

Disseminating actual CSR activities

Greenwashing

Implementation: Is the number of CSR tweets negatively associated with firms’ CSR scores?

Machine learning classification

  • Classify using Twitter-LDA
    • See reference papers for usage
  • Identify 100 topics
    • Find 2 topics related to CSR
      • Sustainability and natural resources
      • Community service
    • Find 1 topic related to financial information
    • 97 other topics
      • Much of it related to marketing and customer service

Topics

| Number | Topic | Top words |
|---|---|---|
| 27 | CSR | water, gas, energy, oil, ceo, industry, food, today, world, global, video, read, #monsanto, technology, #energy, #sustainability, production, solutions, great, booth |
| 40 | CSR | support, proud, employees, community, today, great, day, team, food, helping, school, work, kids, local, donate, volunteers, program, join, learn, event |
| 47 | Financial | trading, markets, cboe, energy, growth, global, week, options, vix, volatility, stocks, report, economic, update, futures, analysis, investors, rate, fed, today |
| 9 | Customer Support | team, contact, hear, issue, dm, support, issues, working, assistance, assist |
| 22 | Healthcare | health, care, learn, patients, data, #healthit, healthcare, #healthcare, clinical |
| 25 | Stock markets | bell, #nasdaq, opening, ring, closing, #nyse, today, nyse, sale, rings |
| 51 | Analytics | data, customer, business, #bigdata, digital, learn, #digital, experience, #analytics, blog |
| 100 | Energy | energy, power, learn, home, save, gas, solar, customers, electric, check |
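
Once each tweet carries a topic number, building firm-level counts is a simple aggregation. A minimal sketch in Python, using the topic numbers listed above (27 and 40 for CSR, 47 for financial information); the firm names and the labeled tweets are hypothetical:

```python
from collections import Counter, defaultdict

# Topic numbers from the topic list; every other topic is treated as "Other".
TOPIC_CATEGORY = {27: "CSR", 40: "CSR", 47: "Financial"}

# Hypothetical (firm, topic) pairs, one per tweet.
labeled_tweets = [
    ("FirmA", 27), ("FirmA", 40), ("FirmA", 9),
    ("FirmB", 47), ("FirmB", 25), ("FirmB", 40),
]

# Count tweets per firm and category.
counts = defaultdict(Counter)
for firm, topic in labeled_tweets:
    counts[firm][TOPIC_CATEGORY.get(topic, "Other")] += 1

for firm, by_category in sorted(counts.items()):
    print(firm, dict(by_category))
```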

Results

\[ CSRtweets = \alpha + \beta_1 Lag(CSR) + \gamma Controls + \varepsilon \]

  • Expect:
    • \(\beta_1 > 0\) if good corporate citizens
    • \(\beta_1 < 0\) if greenwashing


  • Results support the greenwashing story

Comparing tweet content

Measuring content similarity

Difficulty: Tweets are short, so word choice isn’t a reliable measure

Solution

  • Universal sentence encoder (USE, Cer et al. 2018)
    • Determines meaning of text based on all words in the text
    • A measure of meaning, not word choice
    • Neural network based (Deep Averaging Network)

How does USE work?

  • Input: words and bigrams mapped to embeddings
  • Processing: Averaging + a 4-layer neural network (called a Deep Averaging Network)
  • Output: A sentence embedding of 512 dimensions

USE abstracts away from word choice!
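
The data flow of a Deep Averaging Network can be sketched in a few lines of Python. This toy uses random, untrained weights, so its output does not capture meaning the way the trained USE model does; it only shows the three stages (embed words and bigrams, average, pass through feed-forward layers). The 16-dimension size, the `tanh` activation, and all function names are assumptions made for the sketch:

```python
import math
import random

EMBED_DIM = 16  # toy size; USE's output has 512 dimensions
N_LAYERS = 4    # mirrors the 4-layer network described above


def token_embedding(token):
    """Deterministic pseudo-random vector per token (stand-in for learned embeddings)."""
    rng = random.Random(token)
    return [rng.gauss(0, 1) for _ in range(EMBED_DIM)]


def make_layers(seed=0):
    """Random dense layers; a real model learns these weights from data."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0, 1 / math.sqrt(EMBED_DIM)) for _ in range(EMBED_DIM)]
         for _ in range(EMBED_DIM)]
        for _ in range(N_LAYERS)
    ]


LAYERS = make_layers()


def encode(text):
    """Average unigram and bigram embeddings, then apply the feed-forward layers."""
    words = text.lower().split()
    tokens = words + [f"{a}_{b}" for a, b in zip(words, words[1:])]
    vectors = [token_embedding(t) for t in tokens]
    h = [sum(col) / len(vectors) for col in zip(*vectors)]  # the averaging step
    for weights in LAYERS:
        h = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]
    return h


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


a = encode("net earnings rose this quarter")
b = encode("net earnings rose this quarter")
c = encode("join our volunteers at the school")
print(len(a), round(cosine(a, b), 3), round(cosine(a, c), 3))
```

Because every text maps to a vector of the same length, two tweets can be compared with cosine similarity even when they share no words.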

Context: Executives versus firms

The question

Question: Why do executive tweets impact stock prices?

Trust

  • Investors trust CEOs more than firms on social media (Elliott et al. 2018)

New information

  • Market may react only to new disclosure content

Implementation: Does the market respond more strongly to executives’ tweets with content that is more similar to their firms’ tweets?

Testing the mechanism

  • If an executive’s tweets have the same content as their firm’s prior tweets, any reaction to the tweet…
    • Should not be due to new information
    • Should be due to trust of the information coming from the CEO

We construct a measure of content similarity to address this
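
One way such a measure could be built from sentence embeddings is sketched below in Python. The slides do not spell out the exact construction, so this is an assumption for illustration only: it takes the highest cosine similarity between an executive's tweet and the firm's prior tweets, and the 3-dimensional vectors are hypothetical stand-ins for 512-dimensional USE embeddings:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_to_firm(exec_vec, firm_vecs):
    """Highest cosine similarity between an executive tweet and prior firm tweets.

    Taking the maximum is an assumption made for illustration.
    """
    return max(cosine(exec_vec, f) for f in firm_vecs)


# Hypothetical 3-dimensional embeddings (real USE embeddings have 512 dimensions).
firm_prior_tweets = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
exec_stale = [0.6, 0.8, 0.0]  # same content as a prior firm tweet
exec_novel = [0.0, 0.0, 1.0]  # unrelated to anything the firm posted

print(similarity_to_firm(exec_stale, firm_prior_tweets))
print(similarity_to_firm(exec_novel, firm_prior_tweets))
```

A high value flags an executive tweet whose content the firm has already posted, which is the case where any market reaction is hard to attribute to new information.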

\[
\begin{aligned}
&\left|MM\ CAR_{(+1)}\right|\\
&\quad= \alpha + \beta_1 Exec\ tweet_{t,e} + \beta_2 Exec\ tweet_{t,e} \times Similarity_{t,f,e}\\
&\quad+ \beta_3 Firm\ tweet_{t,f} + \beta_4 Firm\ tweet_{t,f} \times Similarity_{t,f,e}\\
&\quad+ \Gamma \cdot Controls_{t,f,e} + FE + \varepsilon_{t,f,e}
\end{aligned}
\]

A positive \(\beta_2\) would support the trust story

Stock reaction mechanism

  • Main effect of executive tweets is subsumed
  • Effect comes from executive tweets that are similar to firm tweets
  • This effect seems to encourage reaction to firm tweets as well

Consistent with effect coming from trust; inconsistent with an information story

  • Robust to other definitions of the similarity measure

Why trust?

  • More followers \(\Rightarrow\) CEO may be psychologically closer
  • More personal tweets \(\Rightarrow\) psychologically closer
  • Institutional investors less affected by trust 🗴
| Variables | ↓Followers | ↑Followers | ↓Personal | ↑Personal | ↓Inst | ↑Inst |
|---|---|---|---|---|---|---|
| Exec fin tweets | 0.007 | -0.015 | -0.012 | -0.018** | 0.017 | -0.020* |
| | (0.43) | (-1.36) | (-0.49) | (-2.05) | (0.86) | (-1.96) |
| Exec second sim x Exec fin tweets | -0.016 | 0.040* | 0.031 | 0.045** | -0.037 | 0.049** |
| | (-0.43) | (1.74) | (0.058) | (2.54) | (-0.95) | (2.31) |
| Firm fin tweets | -0.005 | -0.005** | -0.005* | -0.006*** | -0.001 | -0.009*** |
| | (-1.45) | (-2.49) | (-1.79) | (-2.80) | (-0.43) | (-2.74) |
| Exec second sim x Firm fin tweets | 0.006* | 0.005*** | 0.006** | 0.006*** | 0.001 | 0.011*** |
| | (1.73) | (2.65) | (2.00) | (2.87) | (0.49) | (3.13) |

Measuring emotion of populations

Measuring Emotion

Done using Twitter Emotion Recognition from Colnerič and Demšar (2020)

Timing of the data

Data by language

Language usage in the US

Fear across the US, 2020 Mar to Oct

  • May 27: Coronavirus deaths in the US pass 100,000
  • Oct 2: The US President tests positive for COVID-19

Singapore

Wrap-up

What we covered

  • Classifying the content of large numbers of tweets in an automated fashion
  • Comparing the underlying meaning of tweets to determine the extent to which information is stale
  • Leveraging social media to gauge a population’s emotional reaction to an event

Thanks!


Richard M. Crowley
Singapore Management University
https://rmc.link/
@prof_rmc

References

  • Twitter-LDA:
    • Zhao, Wayne Xin, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan, and Xiaoming Li. “Comparing twitter and traditional media using topic models.” In European conference on information retrieval, pp. 338-349. Springer, Berlin, Heidelberg, 2011.
    • Crowley, Richard, Wenli Huang, and Hai Lu. 2020 (1). “Discretionary Dissemination on Twitter.” SSRN Scholarly Paper ID 3105847. Rochester, NY: Social Science Research Network. https://papers.ssrn.com/abstract=3105847.
    • Crowley, Richard, Wenli Huang, and Hai Lu. 2020 (2). “Executive Tweets.” Working paper.
    • Crowley, Richard, Wenli Huang, Hai Lu, and Wei Luo. 2019. “Do Firms Manage Their CSR Reputation? Evidence from Twitter.” Working paper, Singapore Management University.
  • USE:
    • Cer, Daniel, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant et al. “Universal sentence encoder.” arXiv preprint arXiv:1803.11175 (2018).
    • Crowley, Richard, Wenli Huang, and Hai Lu. 2020 (2). “Executive Tweets.” Working paper.
  • Twitter Emotion Recognition:
    • Colnerič, Niko, and Janez Demšar. “Emotion recognition on twitter: Comparative study and training a unison model.” IEEE transactions on affective computing 11, no. 3 (2020): 433-446.

Packages used for these slides

  • ggplot
  • kableExtra
  • knitr
  • revealjs