Skip to content

R for Journalists

Unlock the power of R

  • What Is R?
  • R for Rob
  • GitHub
  • Twitter
  • Etsy
  • Home
  • 2018
  • March
  • 30
  • Five Common SQL Commands and their R equivalents

Five Common SQL Commands and their R equivalents

Posted on March 30, 2018April 4, 2018 By Rob
See
the boltons
The Boltons, London. A terraced house here sold for almost £38m in February © Copyright PAUL FARMER and licensed for reuse under this Creative Commons Licence.

I’ve been taking a look at SQL lately and how it relates to R.

SQL is a programming language to manipulate data in databases. Its syntax is different from R. It resembles standard English with clauses more so than R does.

In this post I’m going to look at some simple SQL commands and their R equivalents.

In order to follow along to this post, you’ll need access to some kind of tool to manage an SQL database, such as PHPMyAdmin. I haven’t found one for free yet (if you know of one let me know).

The dataset we’ll use is from the Land Registry

I downloaded the latest month from the Land Registry’s Price Paid data. I took the first 300 rows as my sample data and saved it. I’m using PHPMyAdmin for the SQL side.

Step 1: upload the data

SQL command: Click “Import” and upload your CSV file. It automatically names the table as ‘Table 1’

R command:

setwd("Your directory")
lr <- read.csv("your_csv.csv", header = FALSE)

Step 2: Name the columns

SQL command: Structure -> click Change next to the column you want to change, then type in your new name in Name

Alternatively, using SQL:

ALTER TABLE TABLE 1 CHANGE COL 2 price VARCHAR(7) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL;

R command:

#headers from here https://www.gov.uk/guidance/about-the-price-paid-data#explanations-of-column-headers-in-the-ppd
colNames <- c("id", "price", "date", "postcode", "type", "new_build", "hold", "house_name2", "house_name1", "street", "town", "town2", "local_authority", "county", "PPD_category_type", "record_status")
names(lr) <- colNames

Task 1: Find the average price of all sales in the data

SQL command:

SELECT AVG(price) FROM TABLE 1

The structure of the command is as follows:

SELECT FUNCTION(relevant_column) FROM your_table

We also use back ticks (“) for any names that have a space between them. WordPress converts the back ticks into TABLE 1.

Slightly more advanced, using AS renames your temporary average column as ‘Average price’:

SELECT AVG(price) AS Average price FROM TABLE 1

R command:

> mean(lr$price)
 [1] 191912.3

Task 2: Find the most expensive sales

SQL command:

SELECT * FROM TABLE 1 ORDER BY price DESC LIMIT 5

Make sure your price column is set to INT (integer). You can do this by going to Structure -> Change -> Type.

The * symbol means all.

The LIMIT 5 command limits the query results to five records. This is useful to speed up queries and ensure you don’t crash.

The default in ORDER BY is ascending so we need to add DESC.

R command:

library(dplyr)
lr <- arrange(lr, desc(price))
lr[1:5,]

Task 3: Look for towns containing ‘Bat’ in their name

SQL command:

SELECT * FROM TABLE 1 WHERE town LIKE 'BAT%'

The LIKE condition serves as the filter. The % sign is a wildcard meaning ‘anything’, which would filter towns like Bath, Batheaston or anything else beginning Bat.

R command:

library(dplyr)
filter(lr, grepl('BAT', town))

Task 4: Find the average price by type of property

SQL command:

SELECT AVG(price) AS `Average price`, type FROM `TABLE 1` GROUP BY type

We are selecting the average price (renaming it using AS Average price) and type. We need type as well otherwise it will just give us the averages with no labels. The GROUP BY condition provides the pivot table type function.

R command:

library(dplyr)
by_type <- lr %>%
group_by(type) %>%
summarise(Average_price = mean(price))

Task 5: Change a field

SQL command:

UPDATE TABLE 1 SET type = "DETACHED" WHERE type = "D"

R command:

lr$type <- gsub("D","DETACHED", lr$type)

Conclusion

In the future I’ll post some more SQL/R crossover examples.

Perhaps it’s the hosting package I have but I found it difficult for the phpMyAdmin to hold even one full month of Land Registry data (about 90,000 records of 16 fields). Is that unusual?

Finally, almost everything I do now in R uses the dplyr package. I now inhabit the tidyverse.

Share this:

  • Click to share on X (Opens in new window) X
  • Click to share on Facebook (Opens in new window) Facebook

Related

Tags: land registry sql

Post navigation

❮ Previous Post: My First Shiny App Part II: The Shiny code itself
Next Post: Five More Common SQL Commands and Their R Equivalents ❯

Recent Posts

  • I’ve moved my blog over to Substack
  • How to plot a large rural area using Ordnance Survey data in R
  • Check the COVID-19 vaccination progress in your area
  • Let R tell you what to watch on Netflix
  • Sentiment analysis of Nineteen-Eighty-Four: how gloomy is George Orwell’s dystopian novel?

Archives

  • April 2022
  • April 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • February 2020
  • December 2019
  • November 2019
  • October 2019
  • April 2018
  • March 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016

Categories

  • Geospatial data
  • Landmark Atlas
  • Learn
  • See
  • Seen Elsewhere
  • Site
  • Uncategorized

Copyright © 2025 R for Journalists.

Theme: Oceanly by ScriptsTown

 

Loading Comments...