Skip to content

R for Journalists

Unlock the power of R

  • What Is R?
  • R for Rob
  • GitHub
  • Twitter
  • Etsy
  • Home
  • 2016
  • November
  • 5
  • Dodgy Road Signs, Traffic Lights and Markings: Part Two

Dodgy Road Signs, Traffic Lights and Markings: Part Two

Posted on November 5, 2016 By Rob
See

At the end of the last post, we had our data in a file named defective.

The data contains all the injuries caused by car accidents where a road sign, signal or marking was obscured or not working properly from 2010 to 2015 (4,672 in total).

To R or not to R?

We could continue to analyse the data in R but I decided to print it off using write.table and analyse it in Google Sheets instead.

This website is called R for Journalists, but I don’t have to use R just for the sake of it. I don’t think doing this would really have been any faster in R, so I decided not to use it this time.

A few more vlookups needed

If you remember from last time, the DfT uses numbers to replace names and phrases in the data. They publish a lookup sheet that we’ll use to add in the two key data points we are missing:

  1. The name of the local authority in which the accident took place
  2. The situation with the signs or markings (the special conditions at site)

For those of you unfamiliar with how to do vlookups in Google Sheets, this is how we do it:

  • Get the cheat sheet [will download on click]
  • Find and copy the Local Authority District (tab 6)
  • Add this in to our data as a new tab
  • Do the same with Special Conditions at Site
  • Perform the vlookups

The basic formula is this:

=VLOOKUP(M2,'Local Authority District'!A:B,2,FALSE)

This says: ‘take the value in M2 and look for it in Local Authority District. If you find it, give me the value in the second column next to it’.

Calling a pivot table on the data gives us our top ten local authorities with the most accidents where traffic lights, signs or markings were found to be defective or obscured:

top-10

And here are the top 10 by year:

by-year

Glasgow is top with 126 injuries in these kinds of accidents, ahead of Manchester in second with 92 and Liverpool in third with 89.

Oldham had a very good year in 2015 – there wasn’t a single injury from an accident with dodgy road signs or markings recorded in 2015 after 20 such injuries the previous year.

Things got significantly worse, according to the data, in Bradford and Glasgow.

Remember this doesn’t mean the signs or markings caused the accidents; just that they were recorded as faulty by the police.

These accidents are also the ones recorded by police – if you don’t call the cops, it doesn’t go in the data.

This data doesn’t tell the whole story though

You might be wondering, for instance, why London doesn’t appear in the top 10. Surely its huge size would put it number one?

London is not there because the city is split into its 32 boroughs plus the City of London.

If we were to add the totals from all 33 subdivisions of London, it would give us 477 which would indeed put London top.

There are two main caveats with the data:

  1. The data doesn’t account for the size of the road network
  2. It doesn’t account for how much traffic there is along the roads

We could make adjustments for these, but I didn’t in the stories I wrote based on this data.

The most newsworthy fact here is that these kinds of accidents and injuries are happening, not the rate at which they happen.

Sometimes you can try and cram in too much data into a story in order to explain and account for absolutely everything.

The resulting story will then be very ‘data-heavy’ and will likely bore our general interest readers.

In these situations, it’s often enough to step back, remind people of the pitfalls and caveats and let them reach their own conclusions.

Concluding thoughts:

R was very useful for handling the large datasets, merging them together and filtering the specific data we were interested in.

From here, we had a choice about whether to continue with R or whether to use an ordinary spreadsheet program. There wasn’t much difference between the two, but I chose the latter. You don’t have to use R all the time!

 

Picture is from Oast House Archive, reused under Creative Commons

Share this:

  • Click to share on X (Opens in new window) X
  • Click to share on Facebook (Opens in new window) Facebook

Related

Tags: accidents ddj journalism rstats traffic

Post navigation

❮ Previous Post: Seen Elsewhere: 4th November 2016
Next Post: How Much Money Has My Young Person’s Railcard Saved Me? ❯

Recent Posts

  • I’ve moved my blog over to Substack
  • How to plot a large rural area using Ordnance Survey data in R
  • Check the COVID-19 vaccination progress in your area
  • Let R tell you what to watch on Netflix
  • Sentiment analysis of Nineteen-Eighty-Four: how gloomy is George Orwell’s dystopian novel?

Archives

  • April 2022
  • April 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • February 2020
  • December 2019
  • November 2019
  • October 2019
  • April 2018
  • March 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016

Categories

  • Geospatial data
  • Landmark Atlas
  • Learn
  • See
  • Seen Elsewhere
  • Site
  • Uncategorized

Copyright © 2025 R for Journalists.

Theme: Oceanly by ScriptsTown

 

Loading Comments...