Five More Common SQL Commands and Their R Equivalents

The only property to sell for £1m+ was in Cobham, Surrey. Taken by Colin Smith, licensed under CC

In the last post we looked at five common SQL commands and their R equivalents using 300 rows of Land Registry data.

To recap, we covered:

Find the average price of sales
Find the most expensive sales
Look for towns containing ‘Bat’ in their name
Find the average price by type of property
Change a field

Here are five more simple tasks in SQL with the R equivalents:

Task 1: Filter by multiple values

SQL command:

SELECT * FROM TABLE 1 WHERE local_authority IN ("STOCKPORT", "MANCHESTER")

R command:

lr <- read.csv("lr.csv",header=FALSE)

colNames <- c("id", "price", "date", "postcode", "type", "new_build", "hold", "house_name2", "house_name1", "street", "town", "town2", "local_authority", "county", "PPD_category_type", "record_status")
 names(lr) <- colNames

LAs <- c("MANCHESTER","STOCKPORT")
 manc_stockport <- subset(lr, local_authority %in% LAs)

Task 2: Make a count of sales by street

SQL command:

SELECT COUNT(street) AS 'Street', street
 FROM TABLE 1 GROUP BY street

R command:

by_street <- lr %>%
 group_by(street) %>%
 summarise(Count = n())

Task 3: Select distinct (unique) values in a dataset

SQL command:

SELECT DISTINCT type FROM TABLE 1

R command:

> unique(lr$type)
 [1] F T D S
 Levels: D F S T

Similar: perform a count of unique values

SQL command:

SELECT COUNT(DISTINCT type) FROM TABLE 1

R command:

length(unique(lr$type))

Task 4: Change blanks to NULL/NA values

In R there is a difference between a blank cell (“”) and a NA.

In SQL it’s the same with NULL values and blank cells.

Having NA values in R means you can use functions like is.na() or arguments such as na.rm. It also looks neater and avoids any hidden whitespace.

SQL command:

First, let’s note the difference in the number of rows. Many fields have a blank house name (“”) rather than NULL values

UPDATE TABLE 1 SET house_name=NULL WHERE house_name=""

After performing this command:

R command:

lr$house_name1 <- as.character(lr$house_name1)
lr$house_name1[lr$house_name1 == ""] <- NA

Task 5: Remove two unneeded columns

We don’t really need the two columns furthest to the right (15 and 16) in our data, so let’s get rid of them. I didn’t name them in SQL so they are known as COL 15 and COL 16.

SQL command:

ALTER TABLE TABLE 1 DROP COL 15, DROP COL 16

R command:

lr <- lr[,1:14]

Conclusion:

Those were five more easy SQL commands and their R equivalents. In the future I intend to look at JOINs in SQL and how they relate to merge() in R.

Here are five more simple tasks in SQL with the R equivalents:

Task 1: Filter by multiple values

SQL command:

R command:

Task 2: Make a count of sales by street

SQL command:

R command:

Task 3: Select distinct (unique) values in a dataset

SQL command:

R command:

SQL command:

R command:

Task 4: Change blanks to NULL/NA values

SQL command:

R command:

Task 5: Remove two unneeded columns

SQL command:

R command:

Conclusion:

Share this:

Related