R for Journalists

Internal Migration Part III: Plotting Age Groups

Posted on August 17, 2017 (updated August 18, 2017) by Rob

Introduction

In Part I we looked at net internal migration for each local authority: are more people arriving than leaving?

In Part II we looked at where people are moving from and to around the country.

Here in the final Part III we will look at the ages of people moving in and out.

Back up to speed

Here is the code to get you started:

#turn off scientific notation and read in both halves of the dataset
options(scipen = 999)
pt1 <- read.csv("Detailed_Estimates_2016_Dataset_1.csv")
pt2 <- read.csv("Detailed_Estimates_2016_Dataset_2.csv")

data <- rbind(pt1, pt2)

library(tidyr)
library(dplyr)

#moves between each pair of local authorities
first <- data %>%
  group_by(OutLA, InLA) %>%
  summarise(sum = sum(Moves))

#total arrivals per local authority
LA_totals_in <- data %>%
  group_by(InLA) %>%
  summarise(sum = sum(Moves))

#total departures per local authority
LA_totals_out <- data %>%
  group_by(OutLA) %>%
  summarise(sum = sum(Moves))

#net figure: arrivals (sum.x) minus departures (sum.y)
LA_totals <- merge(LA_totals_in, LA_totals_out, by.x = "InLA", by.y = "OutLA")
LA_totals$net <- LA_totals$sum.x - LA_totals$sum.y
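
To see what the merge and net calculation are doing, here is a minimal sketch on invented data. The three areas A, B and C and their counts are made up, and base R's aggregate() stands in for group_by() and summarise() so the example runs without any packages.

```r
# Toy data (invented for illustration): moves between three areas
moves <- data.frame(
  OutLA = c("A", "A", "B", "C"),
  InLA  = c("B", "C", "A", "A"),
  Moves = c(10, 5, 4, 8)
)

# Total arrivals and departures per area
totals_in  <- aggregate(Moves ~ InLA,  data = moves, FUN = sum)
totals_out <- aggregate(Moves ~ OutLA, data = moves, FUN = sum)

# Join on the area code; the two Moves columns get .x and .y suffixes
totals <- merge(totals_in, totals_out, by.x = "InLA", by.y = "OutLA")
totals$net <- totals$Moves.x - totals$Moves.y
totals
```

Area B gains six people while A and C each lose three. Because every move leaves one area and arrives in another, the net figures in this closed toy example sum to zero.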

We move on by grouping by age and by local authority:

out_age <- data %>%
  group_by(OutLA, Age) %>%
  summarise(sum = sum(Moves))

in_age <- data %>%
  group_by(InLA, Age) %>%
  summarise(sum = sum(Moves))

Just like last time, we need to use all = TRUE in our merge to keep any age and local authority combination that has moves in one direction but none in the other.

age_totals <- merge(in_age, out_age, by.x = c("Age","InLA"), by.y = c("Age","OutLA"), all = TRUE)

This is another way of replacing the NAs with zero:

age_totals[, 3:4][is.na(age_totals[, 3:4])] <- 0
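
A small sketch of what all = TRUE and the NA replacement above do, using invented data. The data frames in_toy and out_toy and their numbers are made up for illustration; area "B" has arrivals aged 20 but no departures aged 20.

```r
# Toy data (invented): "B" appears in the arrivals table only
in_toy  <- data.frame(InLA  = c("A", "B"), Age = c(20, 20), sum = c(7, 3))
out_toy <- data.frame(OutLA = "A",         Age = 20,        sum = 5)

# Default merge keeps only rows found in both tables
matched <- merge(in_toy, out_toy,
                 by.x = c("Age", "InLA"), by.y = c("Age", "OutLA"))

# all = TRUE keeps unmatched rows too, filling the gaps with NA
full <- merge(in_toy, out_toy,
              by.x = c("Age", "InLA"), by.y = c("Age", "OutLA"), all = TRUE)

nrow(matched)  # the row for "B" has been dropped
nrow(full)     # the row for "B" is kept, with NA in sum.y

# Replace the NAs in the two count columns with zero
full[, 3:4][is.na(full[, 3:4])] <- 0
full
```

Without all = TRUE, area "B" would vanish from the results entirely, even though three people moved there.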

#replace column names and add net figure
names(age_totals) <- c("Age","LA","In","Out")
age_totals$net <- age_totals$In - age_totals$Out

So far so good

We have an in, out and net figure for each single year of age in each local authority in England and Wales.

This is good, but people tend to think in age groups rather than single ages.

We can write a function to categorise these ages:

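age_groups <- function(x, y, a, b) {
  if (x >= a && x <= b) {
    #only label rows that have no group yet
    if (is.na(y)) {
      y <- paste(a, "-", b)
    }
  } else {
    y <- y
  }
  #return y explicitly so a row that is already labelled never comes back as NULL
  y
}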

Our function is a bit more complicated than the previous one in Part II. It takes four arguments: x, y, a and b.

x and y will be our columns, while a and b will be the lowest and highest ages in the group.

With the first if statement we are saying: “If x is greater than or equal to a and less than or equal to b, do…”

So far so good, we can paste something into y.

But there’s a problem. If we left it at that, the function would overwrite the old data each time we ran it. It would return null values for everything that didn’t fall between the parameters.

We get round this by adding the is.na(y). This is saying: “Only do the following if y is NA.”

If that’s the case, paste (i.e. concatenate) the two parameters with a dash (e.g. 0 - 17). Otherwise, just leave y as it is (y <- y) so we can keep the old values.
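
The point about null values can be seen in a short sketch. The function no_else below is a hypothetical stripped-down version, written only for this illustration, with no else branch and no is.na() check.

```r
# Hypothetical stripped-down version: no else branch, no is.na() check
no_else <- function(x, a, b) {
  if (x >= a && x <= b) {
    paste(a, "-", b)
  }
}

no_else(10, 0, 17)           # in range: gives the label "0 - 17"
is.null(no_else(40, 0, 17))  # out of range: the if yields NULL, so this is TRUE

# mapply cannot simplify a mix of strings and NULLs, so it hands back a list
res <- mapply(FUN = no_else, x = c(10, 40), a = 0, b = 17)
is.list(res)
```

A list column is awkward to group and summarise, which is why the full function always hands back a value for y.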

Then we use our old friend mapply to run it:

age_totals$groups <- NA

age_totals$groups <- mapply(FUN = age_groups, a = 0, b = 17, x = age_totals$Age, y = age_totals$groups)
age_totals$groups <- mapply(FUN = age_groups, a = 18, b = 34, x = age_totals$Age, y = age_totals$groups)
age_totals$groups <- mapply(FUN = age_groups, a = 35, b = 49, x = age_totals$Age, y = age_totals$groups)
age_totals$groups <- mapply(FUN = age_groups, a = 50, b = 64, x = age_totals$Age, y = age_totals$groups)
age_totals$groups <- mapply(FUN = age_groups, a = 65, b = 112, x = age_totals$Age, y = age_totals$groups)
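
As an alternative to the five mapply calls, base R's cut() can assign all the groups in one vectorised step. This is a different technique from the one used above, sketched here on an invented vector of ages rather than the real age_totals$Age column.

```r
# Toy ages (invented) standing in for age_totals$Age
ages <- c(0, 17, 18, 34, 35, 64, 65, 90)

# Each break is the lowest age in a group; right = FALSE makes
# the intervals closed on the left, e.g. [18, 35) covers ages 18 to 34
groups <- cut(
  ages,
  breaks = c(0, 18, 35, 50, 65, Inf),
  labels = c("0-17", "18-34", "35-49", "50-64", "65+"),
  right  = FALSE
)
as.character(groups)
```

cut() returns a factor whose levels are already in age order, and the labels can be written in whatever form the chart needs.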

Finally, we create a new, summarised data frame with the age groups:

age_groups_totals <- age_totals %>%
  group_by(LA, groups) %>%
  summarise(sum = sum(net))

Bringing in ggplot

We are going to step up our ggplot2 game a bit. Previously we have just used the standard font available in R.

Let’s change that to a new one using the extrafont package.

library(extrafont)
library(extrafontdb)
#font_import() needs to have been run once beforehand so extrafont knows about your system fonts
#warning, this took a while to load
loadfonts()
fonts()

library(ggplot2)
#add in labels to get round the awkward "65 - 112"
labels <- c("0-17","18-34","35-49","50-64","65+")

Let’s take Birmingham as an example:

#isolate Birmingham
birmingham_groups <- age_groups_totals[age_groups_totals$LA == "E08000025", ]

#plot chart
birmingham_plot <- ggplot(birmingham_groups, aes(x = groups, y = sum)) +
  geom_bar(stat = "identity", fill = "#D4AF37") +
  scale_x_discrete(labels = labels) +
  labs(title = "Net number of people moving to Birmingham per year",
       subtitle = "(Only counts people moving to and from other parts of the UK)",
       x = "Age groups", y = "Totals") +
  theme(plot.background = element_rect(fill = "#fdfdf8"),
        text = element_text(family = "Browallia New", color = "#2f2f2d"))
birmingham_plot <- birmingham_plot + expand_limits(y = c(-2000, 1000))

ggsave("birminghamGGSave.png",birmingham_plot, width = 4, height = 3)

I discovered ggsave() thanks to this post from Max Woolf. It really simplifies the plotting process. Previously I was endlessly formatting the plots and then using the snipping tool to take screenshots of them.

This accomplishes the same goal in much less code, automatically saving the chart as a PNG in your working directory at the width and height you ask for.

We can see that for Birmingham, everyone is leaving except young people, who are arriving.

But if you look at it by single year of age, it looks like it’s just students coming to Birmingham. Once people go past graduation age they begin to depart:
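
A minimal sketch of how a by-age chart like this might be drawn. The data frame age_totals_toy, its numbers and the area code "X00000000" are all invented for illustration; with the real data you would filter age_totals instead.

```r
library(ggplot2)

# Toy stand-in (invented numbers) for the age_totals data frame
age_totals_toy <- data.frame(
  Age = rep(17:22, times = 2),
  LA  = rep(c("E08000025", "X00000000"), each = 6),
  net = c(-20, 900, 400, 100, -300, -500, 5, -60, -30, -10, 20, 40)
)

# Keep one local authority and plot the net figure for each single year of age
birmingham_ages <- age_totals_toy[age_totals_toy$LA == "E08000025", ]
birmingham_age_plot <- ggplot(birmingham_ages, aes(x = Age, y = net)) +
  geom_col(fill = "#D4AF37") +
  labs(x = "Age", y = "Net moves")
```

geom_col() is shorthand for geom_bar(stat = "identity"), so the bars show the net values as they are rather than counts of rows.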

The overall pattern for different age groups is broadly similar for many other cities. Young people (students) arriving and everyone else leaving:

For rural areas, it’s the reverse. Families and older people are moving there while young people are leaving, presumably in search of jobs and excitement elsewhere. Here is the data for rural Shropshire:

And for rural County Durham:

Students appear to form the bulk of internal migration within Britain.

The ONS recognises this, so much so that they publish data on term-time and out-of-term population.

But it does seem that families and older people are leaving Britain’s major cities – and, importantly, people in their late 20s.

Rural areas on the other hand have mostly the opposite challenge – a large influx of people moving in while the young move out.
