Melting Drugs Data: Part One

foundry

Every year the Home Office, which is responsible for drugs policy, carries out an anonymous survey into use of illegal drugs in England and Wales.

It always throws up some interesting data. Regionally, people in London and South West England are the most likely in the country to own up to taking illegal drugs. Lifestyle is also a major factor: younger people who go out four times a month are more likely to be drug users (the illegal kind, anyway) than stay-at-home bookworms.

I cleaned up the regional data a bit and you can download it to follow along here.

Here is the structure and head of the data:

> str(data)
'data.frame': 84 obs. of 17 variables:
 $ Drug : chr "Amphetamines" "Amphetamines" "Amphetamines" "Amphetamines" ...
 $ Region : chr "England and Wales" "England" " North East " " North West " ...
 $ X2001.02: num 1.5 1.5 1.3 2.1 1.4 1.2 1.7 0.8 1.6 1.9 ...
 $ X2002.03: num 1.6 1.5 2.1 1.6 1.5 1.4 1.2 1.2 1.9 1.5 ...
 $ X2003.04: num 1.5 1.5 2.4 1.5 2.1 1.4 1.4 1.1 1.3 1.4 ...
 $ X2004.05: num 1.4 1.3 1.4 1.6 1.7 1.3 0.8 0.9 1.2 1.4 ...
 $ X2005.06: num 1.4 1.3 2.4 1.2 1.6 1.5 1.3 1.2 1.3 0.9 ...
 $ X2006.07: num 1.3 1.3 2.2 1.5 1 1.4 1.1 1.4 1.1 0.8 ...
 $ X2007.08: num 1 1 1.2 1.1 1 1 0.7 0.8 0.7 0.9 ...
 $ X2008.09: num 1.2 1.2 2 1.4 1.7 0.9 1.5 0.7 0.5 1.2 ...
 $ X2009.10: num 0.9 0.9 1.7 1.7 0.7 0.8 0.7 0.9 0.5 0.8 ...
 $ X2010.11: num 1 1 1.9 1.4 1.3 1 0.7 0.9 0.7 0.8 ...
 $ X2011.12: num 0.8 0.7 1.6 0.6 0.9 0.8 0.5 0.7 0.4 0.7 ...
 $ X2012.13: num 0.6 0.6 0.9 0.6 0.8 0.7 0.3 0.3 0.5 1 ...
 $ X2013.14: num 0.8 0.7 1 0.8 0.6 0.6 0.5 1.1 0.7 0.7 ...
 $ X2014.15: num 0.6 0.6 0.6 0.8 0.9 0.5 0.2 0.4 0.5 0.5 ...
 $ X2015.16: num 0.6 0.6 0.8 0.5 0.7 0.4 0.4 0.3 0.7 0.7 ...

> head(data)
          Drug                   Region X2001.02 X2002.03 X2003.04
1 Amphetamines        England and Wales      1.5      1.6      1.5
2 Amphetamines                  England      1.5      1.5      1.5
3 Amphetamines               North East      1.3      2.1      2.4
4 Amphetamines               North West      2.1      1.6      1.5
5 Amphetamines Yorkshire and the Humber      1.4      1.5      2.1
6 Amphetamines            East Midlands      1.2      1.4      1.4
  X2004.05 X2005.06 X2006.07 X2007.08 X2008.09 X2009.10 X2010.11 X2011.12
1      1.4      1.4      1.3      1.0      1.2      0.9      1.0      0.8
2      1.3      1.3      1.3      1.0      1.2      0.9      1.0      0.7
3      1.4      2.4      2.2      1.2      2.0      1.7      1.9      1.6
4      1.6      1.2      1.5      1.1      1.4      1.7      1.4      0.6
5      1.7      1.6      1.0      1.0      1.7      0.7      1.3      0.9
6      1.3      1.5      1.4      1.0      0.9      0.8      1.0      0.8
  X2012.13 X2013.14 X2014.15 X2015.16
1      0.6      0.8      0.6      0.6
2      0.6      0.7      0.6      0.6
3      0.9      1.0      0.6      0.8
4      0.6      0.8      0.8      0.5
5      0.8      0.6      0.9      0.7
6      0.7      0.6      0.5      0.4

This data presents a challenge.

The years are in different columns. If you remember from the last post on unemployment, our data was in a vertical time series.

In other words, the dates were stacked vertically in one column rather than across several different columns.

Remember our basic geom_line command, which goes like this:

ggplot([our_data], aes(x = [what goes on x axis],
y = [what goes on y axis])) + geom_line()

We want our y axis to be a single variable for this to work (if there are alternatives, please let me know in the comments).
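To make the template concrete, here is a minimal sketch on a toy data frame. The numbers, the column names (year, region, value) and the region labels are made up for illustration and do not come from the survey. Because the x axis is text rather than a date, the sketch also sets group so geom_line knows which points belong on the same line.

```r
library(ggplot2)

# Toy long-format data (made-up numbers, not from the survey)
toy_long <- data.frame(
  year   = rep(c("2001/02", "2002/03", "2003/04"), times = 2),
  region = rep(c("Region A", "Region B"), each = 3),
  value  = c(1.5, 1.6, 1.5, 1.3, 2.1, 2.4)
)

# One column supplies x, one supplies y;
# group tells geom_line which points to join into a line
p <- ggplot(toy_long, aes(x = year, y = value,
                          group = region, colour = region)) +
  geom_line()
p
```

All the y values sit in a single column, which is the shape our drugs data does not yet have.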

This is where reshape2 comes in.

Reshape2 is a package designed by Hadley Wickham. Sean Anderson has a detailed overview here.

It functions like a pivot table in a spreadsheet – you melt down data and then (if you want) cast it the way you want.

Melting our data is what we want to do here.

#make sure you have reshape2 installed in RStudio
library(reshape2)

mdata <- melt(data, id = c("Drug", "Region"))

Here we are saying: ‘Keep “Drug” and “Region” as they are and stack every other column into two new ones: variable (the old column name) and value (the number it held)’.
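If that is hard to picture, a toy example may help. This is a minimal sketch with made-up drug names, regions and numbers; only the column layout mirrors our data. It spells the argument out as id.vars, the full name that id is shorthand for, and then uses dcast to show that the melt can be reversed.

```r
library(reshape2)

# Toy wide data (made-up numbers): one column per survey year
toy_wide <- data.frame(
  Drug     = c("Drug A", "Drug A"),
  Region   = c("North", "South"),
  X2001.02 = c(1.5, 1.3),
  X2002.03 = c(1.6, 2.1)
)

# The id columns are kept as they are; every other column is stacked
# into a 'variable' column (old column name) and a 'value' column
toy_long <- melt(toy_wide, id.vars = c("Drug", "Region"))
toy_long

# Casting goes the other way: one column per level of 'variable' again
toy_back <- dcast(toy_long, Drug + Region ~ variable, value.var = "value")
toy_back
```

Two rows and two year columns become four rows, one for each combination of region and year.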

This is the result:

> str(mdata)
'data.frame': 1260 obs. of 4 variables:
 $ Drug : chr "Amphetamines" "Amphetamines" "Amphetamines" "Amphetamines" ...
 $ Region : chr "England and Wales" "England" " North East " " North West " ...
 $ variable: Factor w/ 15 levels "X2001.02","X2002.03",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ value : num 1.5 1.5 1.3 2.1 1.4 1.2 1.7 0.8 1.6 1.9 ...

Excellent! Our years are now in one column, named variable.

Here is the head of our data:

> head(mdata)
          Drug                   Region variable value
1 Amphetamines        England and Wales X2001.02   1.5
2 Amphetamines                  England X2001.02   1.5
3 Amphetamines               North East X2001.02   1.3
4 Amphetamines               North West X2001.02   2.1
5 Amphetamines Yorkshire and the Humber X2001.02   1.4
6 Amphetamines            East Midlands X2001.02   1.2

Each row now holds one drug, one region, one survey year (in the variable column) and one value.

While we are here, we can clean the data a bit more thoroughly.
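One optional tidy-up, sketched here on a small stand-in data frame rather than on mdata itself: the str output above shows stray spaces around some region names (" North East "), and the year labels carry an X prefix, presumably added by R when the file was read in. The Year column name and the "2001/02" display format are my own assumptions, so adjust them to taste.

```r
# Stand-in for the melted data (made-up rows with the same column types)
toy <- data.frame(
  Region   = c(" North East ", "England"),
  variable = factor(c("X2001.02", "X2002.03")),
  value    = c(1.3, 1.5),
  stringsAsFactors = FALSE
)

# "X2001.02" -> "2001/02": drop the leading X and swap the dot for a slash
toy$Year <- sub("^X([0-9]{4})\\.([0-9]{2})$", "\\1/\\2",
                as.character(toy$variable))

# Remove the stray spaces around region names
toy$Region <- trimws(toy$Region)

toy
```

The same two lines should work on mdata, assuming its columns are named as in the str output above.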

We have two types of drugs that aren’t very interesting: ‘Any drug’ and ‘Any Class A drug’. Let’s remove them.

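#remove unwanted drug types
#trimws() guards against stray spaces around the names
kept_drugs <- grep("^(Any Class A drug|Any drug)$", trimws(mdata$Drug), invert = TRUE)
mdata <- mdata[kept_drugs, ]

This regular expression uses the | (OR) operator inside a group anchored by ^ and $, so it matches rows where the drug is exactly one name OR the other. Setting invert = TRUE makes grep return the rows that do not match (i.e. everything except this OR that). Note that ^ and | only behave this way outside square brackets; inside [ ] they describe a set of single characters, not whole words.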
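For exact names like these, a set-membership test is an alternative that avoids regular expressions altogether. A minimal sketch on a made-up vector standing in for mdata$Drug:

```r
# Made-up drug names standing in for mdata$Drug
drugs <- c("Any drug", "Any Class A drug", "Cannabis", "Amphetamines")

# TRUE for the names we keep: anything that is not one of the
# two aggregate categories
keep <- !(drugs %in% c("Any drug", "Any Class A drug"))

drugs[keep]
```

On the real data the equivalent would be mdata[!(mdata$Drug %in% c("Any drug", "Any Class A drug")), ], assuming the names in the file are spelled exactly like this.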

We now have our data in the correct format!

Next up in Part Two we will take a look at the drug habits of different regions.
