What is R?
R is a free programming language that allows you to analyse and visualise large datasets.
If you use an environment like RStudio you can type in commands and R will perform them for you, like this:
5 + 5
[1] 10
R uses vectors, lists, matrices and data frames but most of R for Journalists will focus on data frames.
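To make those four structures a little more concrete, here is a minimal sketch of each one in base R. The variable names (values, record, grid, df) are my own choices for illustration, and the numbers are borrowed from the example data frame further down.

```r
# A vector holds values of a single type
values <- c(750, 886, 858, 1305)

# A list can mix types and lengths
record <- list(code = 1000,
               authority = "Not Identified",
               quarters = c("Q4", "Q3"))

# A matrix is a two-dimensional block of values of a single type
grid <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame is a table whose columns are equal-length vectors,
# and each column can have its own type
df <- data.frame(Quarter = c("Q4", "Q3", "Q2", "Q1"),
                 Value = values)

# Check what each object is
is.numeric(values)   # TRUE
is.list(record)      # TRUE
is.matrix(grid)      # TRUE
is.data.frame(df)    # TRUE
```

If you only remember one of these, make it the data frame, since that is where most of the work happens.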
A data frame is the closest thing in R to a standard spreadsheet with rows and columns. An example data frame looks like this:
  LA.code Local.Authority Quarter Year Sum.of.Value.Msk
1    1000  Not Identified      Q4 2003              750
2    1000  Not Identified      Q3 2003              886
3    1000  Not Identified      Q2 2003              858
4    1000  Not Identified      Q1 2003             1305
5    1000  Not Identified      Q4 2004              548
6    1000  Not Identified      Q3 2004              629
With data in this format you can query it, filter it, change it, add to it and visualise it in much the same way you can with a spreadsheet, only much more powerfully.
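As a rough sketch of what querying, filtering, changing and adding to a data frame can look like, the code below rebuilds the six example rows by hand and runs a few base R operations on them. The object names (example_df, rows_2003 and so on) and the Share column are assumptions of mine for illustration; in practice you would usually load the data from a file rather than typing it in.

```r
# Rebuild the six example rows as a data frame
example_df <- data.frame(
  LA.code = rep(1000, 6),
  Local.Authority = rep("Not Identified", 6),
  Quarter = c("Q4", "Q3", "Q2", "Q1", "Q4", "Q3"),
  Year = c(2003, 2003, 2003, 2003, 2004, 2004),
  Sum.of.Value.Msk = c(750, 886, 858, 1305, 548, 629),
  stringsAsFactors = FALSE
)

# Query it: look at the first rows and summarise one column
head(example_df, 3)
summary(example_df$Sum.of.Value.Msk)

# Filter it: keep only the rows for 2003
rows_2003 <- example_df[example_df$Year == 2003, ]

# Change it: sort from the largest value to the smallest
sorted_df <- example_df[order(-example_df$Sum.of.Value.Msk), ]

# Add to it: a new column with each row's share of the overall total
example_df$Share <- example_df$Sum.of.Value.Msk / sum(example_df$Sum.of.Value.Msk)

# Summarise it: total value per year, much like a pivot table
yearly_totals <- aggregate(Sum.of.Value.Msk ~ Year, data = example_df, FUN = sum)
print(yearly_totals)
```

Each of these steps is a single line, and the same lines work unchanged whether the data frame has six rows or six million.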
R comes with a core set of functions, known as base R, but programmers around the world have created free add-ons, called packages, that provide additional functions and visualisations.
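Here is a minimal sketch of how packages are typically listed, installed and loaded. The package ggplot2 is just an example I have picked, not one this post has introduced, and the install line is commented out because it needs an internet connection and only has to be run once.

```r
# List the packages that ship with R itself
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Install an add-on package from CRAN (run once, needs internet).
# "ggplot2" is an example choice for illustration.
# install.packages("ggplot2")

# Load the package for this session, if it is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}
```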
R comes with its own notation which I’ll explain in a different post.
Why should I care about R?
If your job involves handling lots of data, whether in journalism or not, you should consider using R for your data analysis.
R has two main strengths:
Handling large datasets
R has no problem handling datasets that are millions of rows long and dozens of columns wide. A dataset that big would be very hard to do anything useful with in a spreadsheet.
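To get a feel for that scale, here is a small sketch that builds a data frame with a million rows and then summarises and filters it. The data is simulated with random numbers purely for illustration, and the names (big, totals, big_2003_q1) are hypothetical.

```r
# Simulated data only: one million rows of made-up values
set.seed(42)
n_rows <- 1e6

big <- data.frame(
  Year = sample(2003:2006, n_rows, replace = TRUE),
  Quarter = sample(c("Q1", "Q2", "Q3", "Q4"), n_rows, replace = TRUE),
  Value = round(runif(n_rows, min = 0, max = 2000))
)

# Confirm the size: rows, then columns
dim(big)

# Total value per year across all one million rows
totals <- tapply(big$Value, big$Year, sum)
print(totals)

# Filter down to a single quarter of a single year
big_2003_q1 <- big[big$Year == 2003 & big$Quarter == "Q1", ]
nrow(big_2003_q1)
```

On an ordinary laptop this should run in a few seconds at most, although the exact timing depends on the machine.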
Visualisation
It can also produce beautiful visualisations.
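As a very small taste of plotting, the sketch below draws a bar chart of the four 2003 values from the example data frame using base R graphics. It is far plainer than what R is capable of, and the file name and colour are arbitrary choices of mine. The chart is saved as a PDF in a temporary folder so the code also runs outside RStudio; inside RStudio you could drop the pdf() and dev.off() lines and the chart would appear in the Plots pane.

```r
# The four 2003 values from the example data frame, in quarter order
values_2003 <- c(Q1 = 1305, Q2 = 858, Q3 = 886, Q4 = 750)

# Save the chart to a temporary PDF file (hypothetical file name)
out_file <- file.path(tempdir(), "values_2003.pdf")
pdf(out_file, width = 7, height = 5)

barplot(values_2003,
        main = "Sum.of.Value.Msk by quarter, 2003",
        xlab = "Quarter",
        ylab = "Sum.of.Value.Msk",
        col = "steelblue")

# Close the file so the chart is written to disk
invisible(dev.off())

# Where the chart was saved
print(out_file)
```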
But I use spreadsheets already. Do I really need R?
Perhaps you don’t. R is best for handling large datasets and visualisation.
It has a learning curve, so if you just look at the odd spreadsheet now and again R probably isn’t worth your time.
My eyes glaze over when looking at spreadsheets.
In that case, R probably isn’t for you. If you want to get started in data journalism there are easier routes into the field.
Here are some to get started with instead:
Getting Started with Data Journalism by Claire Miller, my colleague at Trinity Mirror
Data Journalism Handbook by Paul Bradshaw and James Ball, both former tutors of mine at City University, and others
Interhacktives, a project by students on the Interactive Journalism MA at City University, a course I took
Paul Bradshaw’s GitHub, which has a lot of resources for beginners
I know a bit about spreadsheets. This R thing sounds interesting!
Excellent, this site is for you.
Stick around for examples of what R can do, tips, walkthroughs and cheat sheets!