After looking at Manchester’s flat boom, we’re going over t’Pennines to Leeds, or more accurately the LS postcode for Leeds and the surrounding area.
Here is the postcode for Leeds Town Hall:
We are particularly interested in the first half of the postcode, LS1.
The ‘LS’ stands for the postcode area, which is named after the post town, in this case Leeds.
The LS postcode area is broken down into districts, from LS1 in the city centre up to LS29 in Ilkley, plus a few non-geographical ones.
The Land Registry 2016 data file we looked at last time contains the postcode of each property sold this year.
We will do two new things in R in this post:
Split the postcode data to get postcode districts
Plot averages of data rather than the data itself
Loading the data into R:
We do this the same way as before:
#removes scientific notation
options(scipen = 999)

#read Land Registry CSV
houses <- read.csv("pp-2016.csv", header = FALSE, stringsAsFactors = FALSE)

#add column names
colnames(houses) <- c("id", "price", "date", "postcode", "type", "y/n", "hold",
                      "housename2", "housename1", "street", "neighbourhood1",
                      "neighbourhood2", "la", "county", "a", "a2")
str(houses)

#remove other
no.other <- grep("[^O]", houses$type)
filtered_data <- houses[no.other, ]
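The "[^O]" pattern in that last step keeps every row whose type contains a character other than O. Here is a small sketch of how it behaves on a made-up vector (type, keep_grep and keep_logical are illustrative names, not part of the Land Registry file):

```r
# Made-up property type codes, including one blank entry
type <- c("D", "O", "S", "T", "F", "")

# grep returns the positions that contain at least one non-"O" character
keep_grep <- grep("[^O]", type)

# A plain comparison returns the positions that are not exactly "O"
keep_logical <- which(type != "O")

keep_grep     # 1 3 4 5
keep_logical  # 1 3 4 5 6
```

The two only differ on the blank entry: the regular expression needs at least one character to match, so it drops blanks as well, while the comparison keeps them.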
Splitting the data:
We are going to be splitting the postcode column in our data into the first and second halves.
There are several ways to do this but one of them is using the colsplit function, which comes from the reshape2 package.
#load reshape2 for colsplit
library(reshape2)

#create a new data frame with the split postcodes
split_postcodes <- colsplit(filtered_data$postcode, " ", c("First half", "Second half"))

#add the prices to this new data frame using cbind
split_postcodes_price <- cbind(split_postcodes, filtered_data$price)
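If you would rather not load a package, the same split can be sketched in base R with strsplit. This is only one of the other ways of doing it, shown on a made-up data frame (sales, parts and split_base are illustrative names, and the postcodes are invented):

```r
# Made-up sales data standing in for filtered_data
sales <- data.frame(
  postcode = c("LS1 1AA", "LS29 8ZZ", "LS11 5XX"),
  price = c(150000, 400000, 95000),
  stringsAsFactors = FALSE
)

# Split each postcode at the space; the result is a list of character vectors
parts <- strsplit(sales$postcode, " ", fixed = TRUE)

# Pull the first and second piece out of each list element
split_base <- data.frame(
  first = vapply(parts, `[`, character(1), 1),
  second = vapply(parts, `[`, character(1), 2),
  price = sales$price,
  stringsAsFactors = FALSE
)

split_base
```

A postcode with no space in it would end up with NA in the second column rather than causing an error.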
Now it’s time to isolate the Leeds data:
#keep only the rows whose first half starts with "LS"
ls_filter <- grep("^LS", split_postcodes_price$`First half`)
ls <- split_postcodes_price[ls_filter, ]

#rename the columns to make them easier to work with
colnames(ls) <- c("first", "second", "price")
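Another way to isolate one postcode area is to strip the digits from the first half and compare the letters that are left exactly. A minimal sketch on a made-up vector (first_half and area are illustrative names):

```r
# A few first halves from different postcode areas
first_half <- c("LS1", "LS29", "L1", "BD1", "S10")

# Remove everything from the first digit onwards, leaving the area letters
area <- sub("[0-9].*$", "", first_half)
area                      # "LS" "LS" "L" "BD" "S"

# Keep only the Leeds area
leeds_only <- first_half[area == "LS"]
leeds_only                # "LS1" "LS29"
```

Comparing the area exactly means Liverpool's L1 and Sheffield's S10 can never slip in alongside the LS districts.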
And now to get the average house prices:
Here we’re going to be using the aggregate function.
ag_ls <- aggregate(price ~ first, ls, mean)
This creates a new data frame with one row per first half of the postcode and its average price. The formula price ~ first tells aggregate to group the prices by the first half of the postcode, and mean is the function applied to each group.
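To see what aggregate is doing, here is a small sketch on a made-up data frame (toy and toy_means are illustrative names, and the prices are invented):

```r
# Five made-up sales across three districts
toy <- data.frame(
  first = c("LS1", "LS1", "LS11", "LS11", "LS22"),
  price = c(150000, 250000, 90000, 110000, 320000),
  stringsAsFactors = FALSE
)

# One row per district, with the mean of the prices in that district
toy_means <- aggregate(price ~ first, data = toy, FUN = mean)
toy_means
#   first  price
# 1   LS1 200000
# 2  LS11 100000
# 3  LS22 320000
```

Swapping mean for median or length in FUN would give the median price or the number of sales per district instead.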
Here is what happens when we plot that:
This is looking good, except it’s plotting the bars alphabetically. I’d rather they were ordered by price, so for that we’ll need to order our data.
ag_ls$first <- factor(ag_ls$first, levels = ag_ls$first[order(ag_ls$price)])
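Here is a sketch of what that line does, using a made-up three-row data frame in place of ag_ls (toy_means is an illustrative name and the prices are invented):

```r
# Made-up averages standing in for ag_ls
toy_means <- data.frame(
  first = c("LS1", "LS11", "LS22"),
  price = c(200000, 100000, 320000),
  stringsAsFactors = FALSE
)

# order() gives the row positions from cheapest to most expensive
order(toy_means$price)    # 2 1 3

# Use that order to set the factor levels, which is what the plot follows
toy_means$first <- factor(toy_means$first,
                          levels = toy_means$first[order(toy_means$price)])
levels(toy_means$first)   # "LS11" "LS1" "LS22"
```

The rows of the data frame stay where they are. Only the order of the factor levels changes, and that is what decides the order of the bars.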
We are reordering our postcode districts by average price before we plot them. We’ll plot it again, this time showing all the formatting:
library(ggplot2)
library(scales) #for dollar_format

ggplot(ag_ls, aes(x = first, y = price)) +
  geom_bar(stat = "identity", fill = "#CC6666") +
  theme_classic() +
  ggtitle("Average house prices in and around Leeds, 2016") +
  labs(x = "Postcode district", y = "", color = "") +
  theme(plot.title = element_text(size = 30),
        legend.title = element_text(size = 18),
        axis.title.x = element_text(size = 24),
        axis.text.y = element_text(size = 18),
        axis.ticks.x = element_blank(),
        axis.ticks.length = unit(0.5, "cm"),
        axis.text.x = element_blank(),
        legend.text = element_text(size = 18)) +
  scale_y_continuous(labels = dollar_format(prefix = "£")) +
  #add the labels above the bars
  geom_text(aes(label = first, fontface = "bold"), vjust = -0.5, size = 3.5)
I made the labels as big as I could, but the number of bars makes it difficult because they start to overlap if you make them much bigger.
LS11 is the postcode district that covers Beeston – the area of Leeds that became notorious for being the home of two of the four London 7/7 bombers. The average house price in the district was less than £100,000 between January and July 2016.
On the other hand, the LS22 and LS23 postcode districts cover the affluent market town of Wetherby. These are the most expensive parts of the LS postcode area – houses here change hands for more than £300,000 on average.
Getting counts of the number of sales:
We can use the table function to get a count of the number of sales in each postcode district. That way we can see whether the market has been quiet in some districts, which might have skewed their averages.
> table(ls$first)

 LS1 LS10 LS11 LS12 LS13 LS14 LS15 LS16 LS17 LS18 LS19  LS2 LS20
  81  274  251  363  286  243  268  358  391  205  209  150  127
LS21 LS22 LS23 LS24 LS25 LS26 LS27 LS28 LS29  LS3  LS4  LS5  LS6
 159  142   81   96  349  250  329  396  375   10   74   58  315
 LS7  LS8  LS9
 178  321  284
Only one postcode district, LS3, has had fewer than 50 house sales. Therefore I’d say we have enough sales spread over our districts for the averages in our graph to be reasonable.
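Rather than reading the counts by eye, we could ask R which districts fall below a threshold. A minimal sketch on a made-up vector (first, min_sales and thin are illustrative names, and the threshold here is 2 only because the toy data is tiny):

```r
# Made-up vector of first halves, one element per sale
first <- c("LS1", "LS1", "LS1", "LS3", "LS4", "LS4")

# Hypothetical threshold; with the real data this would be 50
min_sales <- 2

# Count the sales per district
sales <- table(first)

# Names of the districts with fewer sales than the threshold
thin <- names(sales)[as.vector(sales) < min_sales]
thin          # "LS3"

# Sort the counts so the quietest districts come first
sort(sales)
```

With the real data, ls$first would take the place of the made-up vector.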
There you have it.
How to split data in data frames, use aggregate and more house price insights in the North.