R for Journalists

How to Perform Sentiment Analysis on Tweets

Posted on December 23, 2019 (updated December 22, 2020) by Rob
Learn

In November I wrote a post about how to analyse users’ Twitter followers. We looked at whether Arsenal and Tottenham Hotspur footballers, rivals on the pitch, followed one another on social media.

Here is another peek at the plot:

[Figure: arsenal-spurs]

What we haven’t done so far is use the Twitter API to analyse tweets sent on the platform. We’ll do that in this post on the festive topic of Christmas. Additionally we’ll delve into sentiment analysis to weigh up whether people are feeling positive about the season or whether they are channelling their inner Scrooge.

Full code:

library(tidyverse)
library(tidytext)
library(rtweet)

# log in: replace the ### placeholders with your own app's keys
create_token(
  app = "your_app",
  consumer_key = "###",
  consumer_secret = "###",
  access_token = "###",
  access_secret = "###")

# tweet analysis
search_term <- '#ChristmasEveEve'
by <- 'hour'

tweets <- search_tweets(
  search_term, n = 10000, retryonratelimit = TRUE
)

# keep only tweets sent on 23 December 2019
tweets$date <- substr(tweets$created_at, 1, 10)
tweets <- tweets[tweets$date == '2019-12-23',]

# plot tweet volume over time
rtweet::ts_plot(tweets, by = by, trim = 1) + geom_point() + theme_minimal() +
  labs(title = paste0('Tweets mentioning "', search_term, '" by ', by),
       x = 'Date', y = 'Count', caption = 'Source: Twitter API')

# save
ggsave('tweet_volume.png', last_plot())

# sentiment: one row per word, selecting columns by name rather than position
sentiment <- tweets[, c('created_at', 'screen_name', 'text')] %>%
  unnest_tokens(output = 'word', input = 'text')

# add sentiment dataset
sentiment_dataset <- get_sentiments("afinn")
sentiment_dataset <- arrange(sentiment_dataset, -value)

# merge: words that are not in the lexicon are dropped
sentiment <- merge(sentiment, sentiment_dataset, by = 'word')

# clean
sentiment$word <- NULL
sentiment$screen_name <- NULL

# time: round each timestamp to the nearest hour
sentiment$hour <- format(round(sentiment$created_at, "hours"), format = "%H:%M")

# pivot: mean sentiment score per hour
pivot <- sentiment %>%
  group_by(hour) %>%
  summarise(sentiment = mean(value))

# plot (the first hour is left out)
ggplot(pivot[-1,], aes(x = hour, y = sentiment)) + geom_line(group = 1) + geom_point() + theme_minimal() +
  labs(title = paste0('Average sentiment of tweets mentioning "', search_term, '"'),
       subtitle = paste0(pivot$hour[2], ' - ', pivot$hour[nrow(pivot)], ' on ', format(sentiment$created_at[1], '%d %B %Y')),
       x = 'Hour', y = 'Sentiment', caption = 'Source: Twitter API')
ggsave('tweet_sentiment.png', last_plot(), width = 8, height = 6)

Step 1: Authenticate and log in to the Twitter API

As in the last post that used the Twitter API, you’ll need to create an app and authorise it if you haven’t already. Here is the documentation for how to do this, or you can read my previous post. Remember not to share your keys with anyone, unless you trust that person with your account!

library(tidyverse)
library(tidytext)
library(rtweet)

create_token(
  app = "your_app_name",
  consumer_key = "###",
  consumer_secret = "###",
  access_token = "###",
  access_secret = "###")
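
One way to keep keys out of a script you might share is to read them from environment variables. The sketch below is only an illustration, not part of the original workflow: the helper read_twitter_keys() and the variable names (TWITTER_CONSUMER_KEY and so on) are hypothetical, and it uses nothing beyond base R.

```r
# Hypothetical helper: collect Twitter credentials from environment variables
# so they never appear in the script itself. The variable names are assumptions.
read_twitter_keys <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )

  # unset = NA lets us tell "not set" apart from a real value
  keys <- Sys.getenv(vars, unset = NA)
  names(keys) <- names(vars)

  # Stop early with a readable message if anything is missing or empty
  missing <- names(keys)[is.na(keys) | keys == ""]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }

  as.list(keys)
}
```

With the variables set (for example in a .Renviron file), the returned list could supply the arguments shown above, such as consumer_key = keys$consumer_key, in place of the "###" placeholders.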

Step 2: Gather some tweets

As it’s nearly Christmas, I thought I’d gather some festive tweets for this post. At the time of writing it is 23 December, otherwise known as ‘Christmas Eve Eve’. We’ll choose that.

I set a variable called by to 'hour' for the plot that will come later. I’m also filtering out tweets sent before Christmas Eve Eve. Otherwise you get a plot that looks like this:

[Figure: xmas_eve_eve_v1, tweet volume before filtering by date]

Tweets about 23 December shoot up in volume on 23 December! Groundbreaking analysis that is not.

search_term <- '#ChristmasEveEve'
by <- 'hour'

tweets <- search_tweets(
  search_term, n = 10000, retryonratelimit = TRUE
)

# keep only tweets sent on 23 December 2019
tweets$date <- substr(tweets$created_at, 1, 10)
tweets <- tweets[tweets$date == '2019-12-23',]
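
The same filter can be written with real dates instead of string slicing. This is a minimal sketch on made-up timestamps, assuming created_at is a POSIXct column stored in UTC; toy_tweets and target_day are hypothetical names used only for illustration.

```r
# Made-up tweets standing in for the result of search_tweets()
toy_tweets <- data.frame(
  created_at = as.POSIXct(
    c("2019-12-22 23:59:30", "2019-12-23 00:00:05", "2019-12-23 18:45:00"),
    tz = "UTC"
  ),
  text = c("too early", "just after midnight", "evening tweet"),
  stringsAsFactors = FALSE
)

target_day <- as.Date("2019-12-23")

# Convert each timestamp to a calendar date in UTC, then keep the target day
toy_tweets$date <- as.Date(toy_tweets$created_at, tz = "UTC")
on_day <- toy_tweets[toy_tweets$date == target_day, ]
```

Comparing Date objects makes the time zone explicit, which matters if you want the day to be measured in local time rather than UTC.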

Step 3: Plot a chart of tweets by volume

The rtweet package that we are using contains a function called ts_plot. It uses ggplot to build a line chart of tweets over time. You can pass by as an argument to specify days, hours or minutes, which is very useful for searches with differing volumes of tweets. In this case we have already set by to hour. The trim argument drops that number of observations from the beginning and end of the time series, so trim = 1 removes the first and last hour.

rtweet::ts_plot(tweets, by = by, trim = 1) + geom_point() +
  theme_minimal() +
  labs(title = paste0('Tweets mentioning "', search_term, '" by ', by),
       x = 'Date', y = 'Count', caption = 'Source: Twitter API')

#save
ggsave('tweet_volume.png', last_plot())

[Figure: tweet_volume.png]

That is looking much better.
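
To see roughly what an hourly volume chart is built from, here is a base R sketch that bins made-up timestamps by hour and then drops the first and last bins, mimicking trim = 1. It is not rtweet's internal code; the timestamps and the names hour_bin, counts and trimmed are assumptions for illustration.

```r
# Made-up timestamps standing in for tweets$created_at (assumed POSIXct, UTC)
created_at <- as.POSIXct(
  c("2019-12-23 06:55:00", "2019-12-23 07:05:00", "2019-12-23 07:40:00",
    "2019-12-23 08:15:00", "2019-12-23 08:20:00", "2019-12-23 08:59:00",
    "2019-12-23 09:01:00"),
  tz = "UTC"
)

# Floor each timestamp to the start of its hour and count tweets per bin
hour_bin <- format(trunc(created_at, units = "hours"), "%H:%M")
counts <- as.data.frame(table(hour = hour_bin), stringsAsFactors = FALSE)
names(counts) <- c("hour", "tweets")

# Mimic trim = 1: drop the first and last bins
trimmed <- counts[-c(1, nrow(counts)), ]
```

Here the 06:00 and 09:00 bins each hold a single tweet, and trimming leaves only the 07:00 and 08:00 bins.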

Step 4: Sentiment analysis

Sentiment analysis is the science of measuring the positivity or negativity of words. The aim is to get an overall sense of the tone of some text by analysing the connotations of the words that make it up. The tidytext package has several dictionaries of words scored by their positive or negative connotations. Comparing each individual word against the lexicon allows us to build up a rough picture of whether the statements are positive or negative in tone.

Here is an example, from the lexicon we’ll be using:

> sentiment_dataset <- get_sentiments("afinn")
> sentiment_dataset <- arrange(sentiment_dataset, -value)
> sentiment_dataset[1:10,]
# A tibble: 10 x 2
   word         value
   <chr>        <dbl>
 1 breathtaking     5
 2 hurrah           5
 3 outstanding      5
 4 superb           5
 5 thrilled         5
 6 amazing          4
 7 awesome          4
 8 brilliant        4
 9 ecstatic         4
10 euphoric         4

These positive words are given scores of four or five. Negative words like ‘hatred’, ‘poison’ or ‘terrible’ are given negative scores, which go as low as minus five.

This is just one way to measure sentiment – there are other classifications you can use instead.

The first step is to split the tweets up into individual words:

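# select the columns by name rather than position, then split text into words
sentiment <- tweets[, c('created_at', 'screen_name', 'text')] %>%
  unnest_tokens(output = 'word', input = 'text')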
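
For a feel of what this produces, the base R sketch below splits two made-up tweets into one lower-case word per row. It only approximates what unnest_tokens does (the real tokeniser is more careful), and toy_tweets and toy_words are hypothetical names.

```r
# Two made-up tweets standing in for the real data
toy_tweets <- data.frame(
  created_at = as.POSIXct(
    c("2019-12-23 07:10:00", "2019-12-23 08:40:00"), tz = "UTC"
  ),
  text = c("Feeling thrilled, it is #ChristmasEveEve!",
           "Terrible queues at the shops"),
  stringsAsFactors = FALSE
)

# Lower-case the text and split on anything that is not a letter or apostrophe
words_per_tweet <- strsplit(tolower(toy_tweets$text), "[^a-z']+")
words_per_tweet <- lapply(words_per_tweet, function(w) w[w != ""])

# One row per word, repeating each tweet's timestamp for every word it contains
toy_words <- data.frame(
  created_at = rep(toy_tweets$created_at, lengths(words_per_tweet)),
  word = unlist(words_per_tweet),
  stringsAsFactors = FALSE
)
```

Note that the hashtag loses its # and becomes the single word christmaseveeve, which no sentiment lexicon is likely to contain.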

Step 5: Merge the Twitter data with the sentiment scores

This gives us a data frame with each word in each tweet on its own row. Now we will merge it with the lexicon. By default merge keeps only rows that match in both data frames, so any word that isn’t in the lexicon is dropped, including hashtags, @ handles and any other unscored words.

After a bit more cleaning we are almost ready to plot.

sentiment_dataset <- get_sentiments("afinn")
sentiment_dataset <- arrange(sentiment_dataset, -value)

#merge: words that are not in the lexicon are dropped
sentiment <- merge(sentiment, sentiment_dataset, by = 'word')

#clean
sentiment$word <- NULL
sentiment$screen_name <- NULL

#get nearest hour of time for plot
sentiment$hour <- format(round(sentiment$created_at, "hours"), format = "%H:%M")
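
The merge and the rounding can be tried on a tiny made-up example. In this sketch toy_words and toy_lexicon are hypothetical, and the scores are illustrative stand-ins rather than values looked up from the real lexicon.

```r
# Made-up words with the timestamp of the tweet they came from
toy_words <- data.frame(
  created_at = as.POSIXct(
    c("2019-12-23 07:10:00", "2019-12-23 07:10:00",
      "2019-12-23 08:40:00", "2019-12-23 08:40:00"),
    tz = "UTC"
  ),
  word = c("feeling", "thrilled", "terrible", "queues"),
  stringsAsFactors = FALSE
)

# Illustrative scores only
toy_lexicon <- data.frame(
  word = c("thrilled", "terrible"),
  value = c(5, -3),
  stringsAsFactors = FALSE
)

# merge() keeps only words present in both data frames (an inner join)
scored <- merge(toy_words, toy_lexicon, by = "word")

# Round each timestamp to the nearest hour, as in the step above
scored$hour <- format(round(scored$created_at, "hours"), format = "%H:%M")
```

Only the two scored words survive the merge. The 07:10 tweet rounds down to 07:00, while the 08:40 tweet rounds up to 09:00.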

Step 6: Pivot and plot

In the previous step we rounded the timestamp of each tweet to the nearest hour. The data frame looked like this:

> str(sentiment)

'data.frame': 16280 obs. of  3 variables:
 $ created_at: POSIXct, format: "2019-12-23 16:30:59" "2019-12-23 15:21:00" "2019-12-23 14:28:16" "2019-12-23 15:14:28" ...
 $ value     : num  -2 2 1 -3 -3 -3 -3 -2 -2 1 ...
 $ hour      : chr  "17:00" "15:00" "14:00" "15:00" ...

Each record has its sentiment value, a timestamp and the nearest hour. The word itself and the name of whoever tweeted it are no longer needed, so we dropped them.

The final step before plotting is to summarise the data frame into a table with the mean net sentiment score for each hour. Initially I summed the sentiment values together for each hour but then I found that it tallied closely with the volume of tweets, which made it of little use.

pivot <- sentiment %>%
  group_by(hour) %>%
  summarise(sentiment = mean(value))

#plot (the first hour is left out)
ggplot(pivot[-1,], aes(x = hour, y = sentiment)) + geom_line(group = 1) + geom_point() + theme_minimal() +
  labs(title = paste0('Average sentiment of tweets mentioning "', search_term, '"'),
       subtitle = paste0(pivot$hour[2], ' - ', pivot$hour[nrow(pivot)], ' on ', format(sentiment$created_at[1], '%d %B %Y')),
       x = 'Hour', y = 'Sentiment', caption = 'Source: Twitter API')

[Figure: tweet_sentiment]
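
The reason for averaging rather than summing can be seen with a handful of made-up scores. In this sketch (the data frame scored and its values are invented for illustration) the second hour has twice as many scored words as the first, but the same typical tone.

```r
# Invented scores: 2 scored words at 07:00, 4 scored words at 08:00
scored <- data.frame(
  hour  = c("07:00", "07:00", "08:00", "08:00", "08:00", "08:00"),
  value = c(3, 1, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Summing rewards busy hours: the total doubles simply because volume doubled
totals <- aggregate(value ~ hour, data = scored, FUN = sum)

# Averaging removes the volume effect and leaves the typical tone per hour
averages <- aggregate(value ~ hour, data = scored, FUN = mean)
```

The totals are 4 and 8, while both averages are 2, so only the mean shows that the tone did not change between the two hours.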

Analysis

We can see that sentiment peaked sharply around 07:00 as people began to realise Christmas was the day after next. The rest of the day it steadily fell away with occasional slight rebounds.

Conclusion

Sentiment analysis is an inexact but useful science. Inevitably the analysis takes words out of context and interprets them at face value. Irony, sarcasm and quotations are beyond the scope of this type of project and are extremely difficult to factor into any machine-led textual analysis.
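
A tiny example makes the face-value problem concrete. The scorer below is a deliberately naive sketch: score_text() and toy_lexicon are hypothetical, and the scores are illustrative.

```r
# Illustrative scores only
toy_lexicon <- c(good = 3, terrible = -3)

# Naive scorer: add up the scores of any words found in the lexicon
score_text <- function(text, lexicon) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  sum(lexicon[words[words %in% names(lexicon)]])
}

score_text("The service was good", toy_lexicon)      # 3
score_text("The service was not good", toy_lexicon)  # also 3: "not" is ignored
```

Both sentences get the same score because each word is judged on its own, which is exactly how negation, irony and sarcasm slip through.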

However, with these caveats in mind you can still derive value from sentiment analysis if you have a rich and large corpus. The best I’ve seen is Julia Silge’s wonderful work on sentiment in Jane Austen’s novels. If nothing else, you can prove, using R, that it really is the most wonderful time of year.

On that note, have a happy Christmas.

Tags: sentiment analysis twitter

Copyright © 2025 R for Journalists.
