Americans are voting today to choose their 45th president.
Come January, Barack Obama will leave the White House after eight years as President of the United States.
So how will he be remembered?
Historians are already trying to place Obama in the rankings of US presidents.
I took this list of 18 different surveys from Wikipedia.
It shows the rankings of each US president from one (best) to 44 (worst).
This was my first proper attempt at plotting data in R, and I used a box plot to do it.
[skip to the analysis if you’re not interested in the technical stuff]
What’s a box plot?
A box plot is a way of plotting data to show their ranges.
It’s useful when you want to show variation in your data.
Our dataset is essentially 18 variations on the same theme. For something as subjective as rating all 44 presidents spanning more than 200 years, it’s unlikely that any two historians, journalists or academics are going to award precisely the same marks to all of them.
The opinion of any one group of experts may not be that interesting. But the average and range of all 18 opinions of a president will be. This is what the boxplot shows nicely.
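To make that concrete, here is a minimal sketch in base R. The two sets of rankings are invented purely for illustration (they are not taken from the Wikipedia table), and `spread` is a hypothetical helper name. It shows the numbers a box plot summarises: the median and how far the opinions range around it.

```r
# Hypothetical rankings from six surveys, invented purely for illustration
consensus <- c(2, 3, 2, 3, 2, 3)       # the surveys broadly agree
contested <- c(6, 18, 9, 25, 12, 15)   # the surveys disagree

# Summarise a set of rankings: the median plus two measures of spread
spread <- function(x) {
  c(median = median(x), lowest = min(x), highest = max(x), iqr = IQR(x))
}

summary_table <- rbind(consensus = spread(consensus), contested = spread(contested))
print(summary_table)
```

The `iqr` column (the interquartile range) is the width of the middle 50 per cent of the rankings, so a larger value means a longer box.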
How to plot a box plot in R
I copied and pasted the data into a spreadsheet and removed the asterisks. I ignored the aggregate ranking.
pres <- read.csv("pres.csv")
Calling str on it shows that each survey is in a different column. We need to use gather to get them all in the same column before we can plot our box plot.
library(tidyr) # provides gather() and the %>% pipe
pres_gathered <- pres %>% gather(President, ranking, Schl..1948:APSA.2015)
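If it's hard to picture what gather does, this small sketch runs it on a toy data frame. The rankings are made up, only two of the survey columns are included, and the key column is called `survey` here (a name chosen for the example, not the one used above).

```r
library(tidyr)

# Toy data: two presidents and two surveys; the rankings are made up
toy <- data.frame(
  No. = c(1, 2),
  Political.party = c("Independent", "Federalist"),
  Schl..1948 = c(2, 9),
  APSA.2015 = c(1, 14)
)

# One row per president-survey pair; `survey` holds the old column names
toy_long <- gather(toy, survey, ranking, Schl..1948:APSA.2015)
print(toy_long)
```

Two wide rows become four long ones, which is the shape ggplot2 wants. Newer versions of tidyr recommend pivot_longer() for the same job, though gather still works.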
Next up, we need to add the missing values (NAs).
Missing data is represented in the Wikipedia table as a dash (-). R has read this as “â€“”, so we’ll use gsub to remove that and replace it with the standard NA:
pres_gathered$ranking <- gsub("â€“",NA,pres_gathered$ranking)
The last thing to do is to change our ranking variable from a character string to a number. Otherwise the plot won’t work.
pres_gathered$ranking <- as.numeric(pres_gathered$ranking)
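The same two cleaning steps can be tried out on a toy vector first. This is only a sketch: the values are invented, and it assumes the placeholder is exactly the same string in every cell, so it uses a plain comparison rather than gsub. The en dash is written as "\u2013" to sidestep the encoding problem.

```r
# Toy character vector; "\u2013" is an en dash standing in for a missing rank
ranking <- c("3", "\u2013", "17", "\u2013")

# Mark the dashes as missing, then convert what is left to numbers
ranking[ranking == "\u2013"] <- NA
ranking <- as.numeric(ranking)

print(ranking)
print(sum(is.na(ranking)))  # how many values are missing
```

Setting the NAs before calling as.numeric means the conversion doesn't have to guess what the dashes were.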
We’re ready to plot!
# create breaks
breakss <- c(1, 5, 10, 15, 20, 25, 30, 35, 40, 45)

library(ggplot2)

# plot boxplot
p <- ggplot(pres_gathered, aes(factor(No.), ranking, fill = factor(Political.party))) +
  geom_boxplot()

# change colours and add legend, titles and axes labels
p <- p + scale_fill_manual(values = c("#999999", "#00007f", "#999999", "#999999", "#990000", "#999999"),
                           guide = guide_legend(title = "Party"))
p <- p + ggtitle("Rankings of US presidents")
p <- p + labs(x = "Number", y = "Ranking")

# insert breaks
p <- p + scale_y_continuous(breaks = breakss)

# other styling
p <- p + theme(axis.title = element_text(size = 30),
               axis.text = element_text(size = 17),
               plot.title = element_text(size = 55),
               legend.key.size = unit(1, "cm"),
               legend.text = element_text(size = 20),
               legend.title = element_text(size = 20))
p
The aes (aesthetics) are the variables we are comparing, in this case the number of the president (from George Washington and number one through to Barack Obama at number 44) and their rankings. We are also colouring in political party to help tell the difference between Democrat and Republican presidents.
We’re using breaks to get around the problem of having a scale from zero that we had in the last post.
It makes sense to start from one and go up to 45 in intervals of five, as we’ve had 44 presidents.
ggplot2 removed the missing values (NAs) before drawing the plot, and printed a warning to say so.
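As a small aside on the breaks, the same vector can be built without typing every number. This is just a sketch of an alternative to the hand-written version:

```r
# Start at 1 (the best possible rank), then step in fives up to 45
breakss <- c(1, seq(5, 45, by = 5))
print(breakss)
```

This is handy if the top of the scale ever changes, as only the 45 needs editing.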
How to read a box plot
The boxes indicate the boundaries of the middle 50 per cent of the data, with the black line in the middle representing the (median) average.
So a longer box means more debate about a president’s merits whereas a shorter box means there’s more consensus about where they stand. A line with no box around it at all would indicate all 18 groups of scholars agreed precisely on a president’s rank.
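The lines extending out from the boxes (the whiskers) show how far the rest of the rankings stretch. The really far-out ones, the outliers, are represented as dots: by default ggplot2 draws a dot for any value that sits more than 1.5 times the length of the box beyond the box's edge.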
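To see those pieces as numbers rather than shapes, here is a rough sketch using boxplot.stats from base R. The rankings are hypothetical, with one survey deliberately far out of line. Base R works out the box edges (the hinges) slightly differently from ggplot2, so the edges may not match the plot exactly, but the idea is the same.

```r
# Hypothetical rankings from eight surveys; one survey is far out of line
rankings <- c(10, 12, 11, 13, 12, 14, 11, 30)

# Base R's box plot arithmetic, with the default whisker rule of 1.5 x box length
box_stats <- boxplot.stats(rankings)

print(box_stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(box_stats$out)    # values beyond the whiskers, which a box plot draws as dots
```

Here the ranking of 30 falls outside the whiskers, so it would appear as a lone dot rather than stretching the whisker up to meet it.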
Barack Obama is a middling president so far. We don’t have a lot of data on him because he’s still in office, but those who have offered their opinions so far tend to agree where he ranks.
His predecessor George W. Bush is lower down in the rankings.
The black line representing his median is quite far up in his box, indicating that more often than not scholars rate him towards the back of the list.
However, in fairness to him historians also haven’t had long to judge his time in the Oval Office, so we don’t have a lot of data on him.
George Washington, Abraham Lincoln and Franklin D. Roosevelt stand out
They are the ‘big three’ if you like (numbers one, 16 and 32. Perhaps president number 48 will be a visionary!) All the scholars rated them very highly and they stand out from the rest of the pack.
(It’s not easy to tell from the box plot which number corresponds to which president, unfortunately. Plotting it by presidents’ names made it unreadable. Any suggestions on how to improve this welcome.)
That’s two of the four faces on Mount Rushmore, to be joined by:
Teddy Roosevelt and Thomas Jefferson
The third and the 26th presidents are the other two presidents immortalised on Mount Rushmore, and historians seem to agree that their terms in the White House were fruitful ones.
Remember that the longer the box, in general the more disagreement there was among scholars.
They seem to be divided on Ronald Reagan, the 40th president, and on the architect of much of the US Constitution, fourth president James Madison.
Some tipped them for greatness, others were unconvinced.
It’s likely that if you asked ten different Americans for an opinion just of the presidents since World War II you would get a variety of responses, the same as if you went into a pub in Britain and asked whether Margaret Thatcher was a better prime minister than Tony Blair (I wouldn’t necessarily advise doing that).
This is why a box plot comes in handy. It doesn’t show you what any one person thinks, it shows the range of what they all think.
If you believe in the ‘wisdom of the crowds’, then if you ask enough people, the opinion should coalesce around the average.
In theory then, most informed people should be able to look at the plot and not completely disagree with the results.
Or perhaps everyone else is just wrong and you’re right…