Like other departments the Ministry of Justice publishes spending data worth more than £25,000. It does this both for itself and for its executive agencies like the National Offender Management Service (NOMS) which runs the prison system.
It does this monthly. A typical spreadsheet looks like this in R:
dec <- read.csv("noms-spend-december-2016.csv") str(dec) 'data.frame': 14006 obs. of 9 variables: $ Department.Family : Factor w/ 2 levels "","MOJ": 2 2 2 2 2 2 2 2 2 2 ... $ Entity : Factor w/ 2 levels "","NOMS": 2 2 2 2 2 2 2 2 2 2 ... $ Payment.Date : Factor w/ 23 levels "","01/12/2016",..: 21 21 16 18 11 22 11 11 19 3 ... $ Expense.Type : Factor w/ 84 levels "","Ad-Hoc Major Maintenance and Capital Works - more than £10k",..: 35 35 38 19 19 19 19 19 82 2 ... $ Expense.Area : Factor w/ 203 levels ""," Northumberland",..: 57 57 31 141 17 3 3 143 110 57 ... $ Supplier : Factor w/ 227 levels "","ACTES TRUST",..: 38 38 49 25 10 62 62 137 8 7 ... $ System.Transaction.Number: Factor w/ 724 levels "","119","12882010",..: 573 574 460 151 382 572 362 363 34 180 ... $ Total.Transparency.Amount: num -184605 -184605 -149355 -128730 -101171 ... $ Plain.Eng : Factor w/ 66 levels ""," IT maintenance & support",..: 35 35 21 44 44 44 44 44 66 36 ...
Helpfully each transaction has a unique ID; this will be useful later.
Buried in this data are some substantial sums going on compensation.
For instance, there was a sum paid out of £25,600 for compensation to a prisoner or prisoners at HMP Northumberland.
It doesn’t say what happened to merit this payout. I want to find out. That will be the focus of my Freedom of Information request.
I don’t just want to do this for December. I want to ask about the entire year 2016.
So I downloaded all the files and wrote this code:
jan <- read.csv("noms-jan-2016.csv", stringsAsFactors = FALSE) feb <- read.csv("noms-feb16.csv", stringsAsFactors = FALSE) mar <- read.csv("noms-spend-march-2016.csv", stringsAsFactors = FALSE) apr <- read.csv("noms.csv", stringsAsFactors = FALSE) may <- read.csv("noms-may16.csv", stringsAsFactors = FALSE) jun <- read.csv("noms-jun16.csv", stringsAsFactors = FALSE) jul <- read.csv("noms-jul-2016.csv", stringsAsFactors = FALSE) aug <- read.csv("noms-aug-2016.csv", stringsAsFactors = FALSE) sep <- read.csv("noms-sept-2016 (1).csv", stringsAsFactors = FALSE) oct <- read.csv("noms-spend-october- 2016 (2).csv", stringsAsFactors = FALSE) nov <- read.csv("noms-spend-nov-2016 (1).csv", stringsAsFactors = FALSE) dec <- read.csv("noms-spend-december- 2016.csv", stringsAsFactors = FALSE)
full <- rbind(jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec)
The rbind function only works if there are an equal number of columns AND the column names are the same. Unfortunately the MoJ spreadsheets have different column names from month to month.
I started with jan and kept using rbind until I ran into trouble. Then I’d go into that month and change the column name to its January equivalent.
For instance in April “System.Transaction.Number” should just be “Transaction.Number”.
We can use names to clear this up:
names(apr)[7] <- "Transaction.Number" #Names here is taking the seventh column in apr and changing it to our string
Once we’re ready we can use rbind.
Now we can look for compensation using grep:
comp <- grep("compensation", full$Description) comp <- full[comp, ]
From here, I printed off a table containing just the compensation rows:
write.table(comp, "compensation.csv")
I could then note down the Transaction IDs of the ones I thought interesting and send a FOI identifying them specifically to the MoJ.
They haven’t got back to me yet. Let’s hope they give me some useful data!