Introduction
George Orwell’s work came into the public domain in the United Kingdom in 2021.
This means that people in Britain can work with and reproduce the works of one of our country’s most famous writers without worrying too much about copyright. As always though – check the restrictions where you live before starting any work.
I decided to do some textual analysis of Orwell’s classic novel Nineteen Eighty-Four to see what I could find.
I've made a start on sentiment and textual analysis on this blog before.
As before, I used Julia Silge’s work on Jane Austen as a model for this piece. If you haven’t already read her textual analysis of Austen I recommend checking it out before you start. I’m also going to make heavy use of the tidytext package that she and others have created.
Spoilers for Nineteen Eighty-Four follow.
Go to my GitHub page for the full code that accompanies this post.
Preparing the text
The source text for Nineteen Eighty-Four can be found here on Project Gutenberg. I removed the introductory notes and the Newspeak appendix before copying the text into a text file.
Load the text file and turn it into a large character vector:
#load the packages used throughout this post
library(dplyr)
library(tidytext)
library(ggplot2)
#load source data
txt <- read.delim('path/to/1984.txt', header = FALSE)
#collapse the data frame into one huge character vector
cvec <- txt %>% as.character()
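Note that read.delim() parses the novel as if it were a delimited table, which is why some clean-up of quotes and stray characters is needed later. A simpler alternative, sketched here using only base R, is readLines():

```r
#alternative: read the novel one line at a time...
raw_lines <- readLines('path/to/1984.txt', warn = FALSE)
#...then collapse the lines into a single string
cvec <- paste(raw_lines, collapse = ' ')
```

Either route gives you a character vector ready to be split into chapters.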
You can subdivide Nineteen Eighty-Four into its parts, chapters, sentences and words, in descending order of size.
I've decided to use chapters to break up the book. The text analysis happens at the word level, but we must not forget to link each word with the chapter in which it is found.
Each chapter begins with the word Chapter followed by its number, which means we can use a regular expression to match those headings and split the complete character vector on them.
Our regular expression is Chapter [0-9]+ (the + matches one or more digits, so double-digit chapter numbers are captured in full rather than leaving stray digits at the start of a chapter's text).
chapters <- cvec %>% strsplit('Chapter [0-9]+') %>% as.data.frame()
#tidy up df
names(chapters) <- 'chapter'
chapters$chapter <- gsub('"','',chapters$chapter)
chapters$chapter <- gsub(',','',chapters$chapter)
chapters$chapter <- gsub('c(','',chapters$chapter, fixed = TRUE)
chapters$chapter <- trimws(chapters$chapter, which = 'both')
Next up we do some further tidying of the data: we remove the 'parts' from the data frame and renumber the chapters. The book has three parts, and the chapter numbers restart at each part.
I thought about keeping the parts and using facet_wrap() to split the plot into parts one, two and three. It does work, but it was messing up my annotations, and in the end I concluded that keeping the parts doesn't really add to the analysis, so I removed them:
#remove parts and reorder chapter numbers
chapters$chapter <- gsub(' PART TWO','',chapters$chapter)
chapters$chapter <- gsub(' PART THREE','',chapters$chapter)
chapters <- chapters[-1,] %>% as.data.frame()
names(chapters) <- 'chapter'
chapters$chapter_no <- seq(1,23,1)
At this point we have a data frame with one chapter per row. As I said earlier, we now need to split the chapters into words while retaining each word's association with its chapter.
It took me a while to figure out how to do this, but eventually I built this function:
#accumulator for the word-per-chapter data frame
chapter_words <- data.frame()
#tokenise one chapter and append its words to the accumulator
chapter_and_word <- function(x, y) {
  x <- data.frame(text = x, chapter = y) %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words, by = 'word')
  chapter_words <<- rbind(chapter_words, x)
}
#apply the function to every chapter/chapter-number pair
mapply(chapters$chapter, FUN = chapter_and_word, y = chapters$chapter_no)
This uses tidytext::unnest_tokens() to split each chapter into words, removes common stop words with anti_join(stop_words), and holds on to each word's chapter number.
I used the superassignment operator <<- to save the data generated inside the function to a data frame outside the apply loop.
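If you'd rather avoid the global assignment, the same data frame can be built without <<- by mapping over the chapters and binding the results in one go. This is a sketch of an equivalent approach using base R's Map() and dplyr's bind_rows():

```r
library(dplyr)
library(tidytext)

#tokenise each chapter alongside its chapter number, then bind all the rows
chapter_words <- Map(
  function(text, chapter_no) {
    data.frame(text = text, chapter = chapter_no) %>%
      unnest_tokens(word, text) %>%
      anti_join(stop_words, by = 'word')
  },
  chapters$chapter,
  chapters$chapter_no
) %>%
  bind_rows()
```

The result is the same, but no state is shared between calls.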
We now have a data frame of every word in Nineteen Eighty-Four, minus common stop words such as 'the' and 'of', with each word paired with its chapter number.
Sentiment analysis
The tidytext package has a function get_sentiments() that returns a lexicon of words graded by sentiment. There are several lexicons to choose from. I have gone with AFINN because it grades words by degrees of positivity and negativity, from -5 to +5.
Now we merge our data frame with the AFINN lexicon and give every word not found in the lexicon a neutral sentiment of 0:
#merge with sentiment
#use afinn
afinn <- get_sentiments('afinn')
#keep position of words in text
chapter_words$position <- row.names(chapter_words) %>% as.numeric()
#merge with sentiment
chapter_words <- merge(chapter_words,afinn, by = 'word', all.x = TRUE)
chapter_words <- arrange(chapter_words, position)
chapter_words$value[is.na(chapter_words$value)] <- 0
The last step of this section is to sum the sentiment values within each chapter. Words with negative sentiment will be weighed against those with positive connotations to give a score for each chapter!
#create table of sentiment by chapter
chapter_sentiment <- chapter_words %>%
group_by(chapter) %>% summarise(sentiment = sum(value))
Plot
#capitalise x and y axes
names(chapter_sentiment) <- c('Chapter','Sentiment')
#plot
ggplot() + geom_line(data = chapter_sentiment, aes(x = Chapter, y = Sentiment), size = 1.2) +
geom_point(data = chapter_sentiment, aes(x = Chapter, y = Sentiment), size = 2.8) +
annotate("text", x = 12, y = 50, label = "Winston finds \nMr Charrington's shop") +
annotate("text", x = 15, y = -320, label = "The book reveals the true \nnature of the world") +
annotate("text", x = 22, y = -300, label = "Winston is tortured\n in the Ministry of Love") +
annotate("segment", x = 15.5, xend = 16.8, y = -350, yend = -350, size=1.3, arrow=arrow()) +
annotate("segment", x = 21, xend = 19.2, y = -330, yend = -330, size=1.3, arrow=arrow()) +
scale_x_continuous(breaks= seq(1,23,1), labels = seq(1,23,1)) +
scale_y_continuous(limits= c(-400,100)) +
labs(title = 'Sentiment in Nineteen Eighty-Four', subtitle = 'Higher scores mean more positive chapters of the novel', caption = 'Source: Project Gutenberg') +
theme_minimal()
The plot is a fairly basic line and point plot at its base. I used several ggplot annotations to highlight particular high or low points in the book, similar to how Julia Silge annotated Jane Austen’s novels.
Analysis
Nineteen Eighty-Four is not renowned for being a cheery read and so it proves in this sentiment analysis. Almost every chapter has a net negative score, meaning that the words with negative connotations outweigh those with positive ones.
I thought the chapters where Winston is tortured in the Ministry of Love would be the most depressing, but in fact it is the chapter where Winston finally gets the chance to read the book-within-a-book, The Theory and Practice of Oligarchical Collectivism, that is the least positive.
In this chapter the true horror of the world Winston and Julia live in is revealed to them. The world is divided into three superstates: Oceania, Eurasia and Eastasia. These vast states engage in a war designed to produce an endless stalemate to keep their respective populations in poverty and in a perpetual state of fear and hostility to the enemy.
I found it fitting that this chapter was the most depressing one. The true nature of Oceania and the world is told in even grimmer tones than the torture scenes to follow.
This chapter has a parallel with the story of Adam and Eve in the Bible. The serpent convinces Eve to eat from the forbidden Tree of Knowledge. She and Adam both eat, gain self-awareness and are immediately banished from the Garden of Eden by God.
In Nineteen Eighty-Four Winston and Julia read the forbidden book in their seemingly private room above Mr Charrington’s shop. They gain awareness of the true nature of the society they live in, and are immediately arrested by the Thought Police. They realise, too late, that they have been tricked by Charrington and O’Brien (the snakes) into reading the book.
The cockney accent had disappeared; Winston suddenly realized whose voice it was that he had heard a few moments ago on the telescreen. Mr Charrington was still wearing his old velvet jacket, but his hair, which had been almost white, had turned black. Also he was not wearing his spectacles. He gave Winston a single sharp glance, as though verifying his identity, and then paid no more attention to him. He was still recognizable, but he was not the same person any longer. His body had straightened, and seemed to have grown bigger. His face had undergone only tiny changes that had nevertheless worked a complete transformation. The black eyebrows were less bushy, the wrinkles were gone, the whole lines of the face seemed to have altered; even the nose seemed shorter. It was the alert, cold face of a man of about five-and-thirty. It occurred to Winston that for the first time in his life he was looking, with knowledge, at a member of the Thought Police.
Conclusion
That was a sentiment analysis of the chapters of Nineteen Eighty-Four.
For further work you could look at the frequency of words like doublethink that Orwell coined in the book or analyse the relationships between the characters.
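As a starting point for that follow-up, here is a sketch that counts Orwell's coinages per chapter using the chapter_words data frame built earlier (the word list is illustrative, not exhaustive):

```r
library(dplyr)

#count occurrences of selected Newspeak coinages in each chapter
newspeak <- c('doublethink', 'newspeak', 'thoughtcrime', 'unperson')
coined_counts <- chapter_words %>%
  filter(word %in% newspeak) %>%
  count(chapter, word, sort = TRUE)
```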
But it was all right, everything was all right, the struggle was finished. He had won the victory over himself. He loved Big Brother.