R for Journalists

The Losses in the Final Year of WW1

Posted on November 11, 2017 By Rob

Back in August 2014, around the 100th anniversary of the outbreak of the First World War, the Data Unit published our analysis of the Commonwealth War Graves Commission’s records of fallen soldiers, airmen, sailors and other servicemen and women who gave their lives during the next four years.

As the 100th anniversaries have come and gone, we have run separate analyses of Christmas 1914, Loos, Jutland, the Somme and Passchendaele.

Today is Armistice Day 2017, which means that in one year’s time we will reach the 100th anniversary of the end of the war.

But many more young men would be killed before peace was finally declared.

From Tableau to R

When I first started analysing the World War I data in 2014 I used Tableau.

Tableau is a good tool for analysing large datasets. It handles large volumes of data and, unlike R, does not have a steep learning curve.

Now, however, I use R to analyse the data.

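library(dplyr)

# Load the CSV; header = FALSE indicates the file has no header row
war <- read.csv("casualties_ww1_SUPER_NEW.csv", header = FALSE)

# Remove the unnecessary timestamp; fixed = TRUE treats the dots as literal characters
war$V8 <- gsub(" 00:00:00.000", "", war$V8, fixed = TRUE)

# Keep deaths from November 11, 1917 to November 11, 1918 inclusive.
# Dates in YYYY-MM-DD form sort correctly when compared as strings.
lastyear <- war[war$V8 >= "1917-11-11", ]
lastyear <- lastyear[lastyear$V8 <= "1918-11-11", ]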

To make the stories easier to understand, I selected only those who died between November 11, 1917 and November 11, 1918.
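The string comparison above works because the dates are in YYYY-MM-DD form. As an alternative, here is a minimal sketch that converts the column to R's Date class first, which makes the inclusive window explicit and quietly drops values that are not valid dates. The toy data frame and the helper name in_last_year are invented for illustration; only the V8 column name and the two cut-off dates come from the code above.

```r
# Toy stand-in for the real casualty records (values are invented)
war_toy <- data.frame(
  V8 = c("1917-11-10", "1917-11-11", "1918-03-21", "1918-11-11", "1918-11-12"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for dates inside the final year of the war
in_last_year <- function(x) {
  d <- as.Date(x, format = "%Y-%m-%d")
  # Unparseable values become NA and are treated as outside the window
  !is.na(d) & d >= as.Date("1917-11-11") & d <= as.Date("1918-11-11")
}

lastyear_toy <- war_toy[in_last_year(war_toy$V8), , drop = FALSE]
print(lastyear_toy$V8)
```

Both boundary dates are kept, matching the >= and <= comparisons in the original filter.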

Of course, the declaration of peace did not heal the wounds of those already injured, and more soldiers went on to die after November 11, 1918.

The next step is to merge in the ranks spreadsheet, which matches the numbered rank codes to their ranks.

ranks <- read.csv("Ranks.csv")
lastyear_ranks <- merge(lastyear, ranks, by.x = "V3", by.y = "rank", all.x = TRUE)
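With all.x = TRUE, merge() keeps every casualty record even when its rank code has no match in the ranks table, and fills the missing rank fields with NA. A minimal sketch on invented data shows the effect and one way to spot unmatched codes. The V3 and rank column names come from the code above; the rank_name column and all the values are assumptions, since the layout of Ranks.csv is not shown here.

```r
# Invented casualty rows: V3 holds the numeric rank code
lastyear_toy <- data.frame(
  V1 = c("A", "B", "C"),
  V3 = c(1, 2, 99),
  stringsAsFactors = FALSE
)

# Invented lookup table; rank_name is a hypothetical column name
ranks_toy <- data.frame(
  rank = c(1, 2),
  rank_name = c("Private", "Corporal"),
  stringsAsFactors = FALSE
)

# Left join: every row of lastyear_toy survives, matched or not
joined <- merge(lastyear_toy, ranks_toy, by.x = "V3", by.y = "rank", all.x = TRUE)

# Rank codes with no entry in the lookup table show up as NA
unmatched <- joined[is.na(joined$rank_name), ]
print(joined)
print(unmatched$V3)
```

Checking for NA after the merge is a quick way to find rank codes missing from the lookup spreadsheet.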

Most of the biographical details are stored as free text in one field.

To find the names of soldiers from a particular area, we have to search in this field.

For Bath, for example:

bath <- dplyr::filter(lastyear_ranks, grepl("Bath", V11))

This filter does a ‘contains’ search, picking up all cases where this word appears in the field.

The problem with this kind of search is that it includes all (case-sensitive) mentions of the word Bath.

That will pick up Bath Road, Bath Street, Bath Cottages, plus any word containing the word Bath such as the Somerset village of Bathealton.
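Part of this can be narrowed in the search itself. As a sketch, a regular expression with word boundaries (\b) drops words that merely contain Bath, such as Bathealton, but it still matches street names like Bath Road, so a manual check remains necessary. The sample strings below are invented; in the real data the text sits in column V11.

```r
# Invented examples of the free-text biographical field
details <- c(
  "Son of John Smith, of Bath.",
  "Husband of Mary Jones, of 12, Bath Road, Bristol.",
  "Native of Bathealton, Somerset.",
  "Born at bath."
)

# Plain 'contains' search: matches every case-sensitive occurrence of "Bath"
contains_bath <- grepl("Bath", details)

# Word-boundary search: "Bath" must stand alone as a whole word
whole_word_bath <- grepl("\\bBath\\b", details, perl = TRUE)

print(data.frame(details, contains_bath, whole_word_bath))
```

The lower-case "bath" is missed by both searches, which is the case-sensitive behaviour described above.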

The only way to clean the data is to go through it as thoroughly as I can removing any cases where these crop up.

You can either do this in R or you can export the data as a spreadsheet and go through it in OpenOffice. I prefer doing this step in OpenOffice.
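For the spreadsheet route, one possible approach is to write the filtered rows to a CSV file with a blank column for marking which records to keep, then read the edited file back into R. The sketch below uses invented rows and a temporary file; the keep column, the "y" marker and the file location are assumptions, not part of the original workflow.

```r
# Invented stand-in for the filtered Bath records
bath_toy <- data.frame(
  V11 = c("Son of John Smith, of Bath.", "Of 12, Bath Road, Bristol."),
  stringsAsFactors = FALSE
)

# Hypothetical review column, to be filled in by hand in the spreadsheet
bath_toy$keep <- ""

# Write to a temporary file here; use a real path in practice
out_file <- tempfile(fileext = ".csv")
write.csv(bath_toy, out_file, row.names = FALSE)

# After editing, read the file back and keep only the rows marked "y"
reviewed <- read.csv(out_file, stringsAsFactors = FALSE)
bath_clean <- reviewed[reviewed$keep %in% "y", ]
print(nrow(bath_clean))
```

Using %in% rather than == means rows left blank in the spreadsheet are simply excluded instead of producing NA rows.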

Here is the finished result, as published by the Bath Chronicle.

The sacrifices of those young men 100 years ago have stuck with me ever since I began looking at this data three years ago.
