Skip to content

R for Journalists

Unlock the power of R

  • What Is R?
  • R for Rob
  • GitHub
  • Twitter
  • Etsy
  • Home
  • 2018
  • April
  • 10
  • SQL and R: Joins and Merges

SQL and R: Joins and Merges

Posted on April 10, 2018December 22, 2020 By Rob
Uncategorized

Next up in our mini-series on SQL and R are the JOIN family of functions.

They work in the same way as the merge() function in R. To recap the basic syntax of merge():

merged_df <- merge(df1, df2, by.x = "code1", by.y = "code2")

Merge works using a column that is common to both data frames. Usually this will be some kind of ID field where each value is unique. If the common field has the same name in each data frame you can use the by argument, otherwise you’ll need by.x and by.y.

Inner join

An inner join takes only rows with a match in both tables. If your common field is an ID, only records with the same ID in BOTH tables will be included.

In the following examples, I took two data sets from the 2011 Census. In one of them I deleted all but 17 rows of the data.

You will find all 17 of the IDs of the second data set in the first data set because every local authority level Census spreadsheet has all the local authorities in England and Wales in it.

This means that our inner join will be 17 rows long. All 17 records in data set 2 are in data set 1, but only those 17 from data set 1 are in data set 2.

SELECT *
FROM TABLE 2
INNER JOIN TABLE 3 ON TABLE 2.geography code = TABLE 3.geography code

Here is a screenshot of the output:

R equivalent

bulk <- read.csv("bulk.csv")
bulk1 <- read.csv("bulk (1).csv")

bulk_merged <- merge(bulk1, bulk, by = "geography.code")

Left join

A LEFT JOIN takes all the values in the first table plus the records that match those in the second table.

In our example, if there are values in TABLE 2 not in TABLE 3, then they will still be included but there will be NULL values in the right hand side of the join, as seen in the example.

SELECT * 
FROM TABLE 2 
LEFT JOIN TABLE 3 ON TABLE 2.geography code = TABLE 3.geography code

R equivalent

The R equivalent makes use of the all.x argument. Set that to true and it will include all records in the first data set, match or not.

bulk_merged <- merge(bulk, bulk1, by = "geography.code", all.x = TRUE)

Right join

The RIGHT JOIN is the same, except all the values in the second table (TABLE 3 in our example) are kept regardless of a match.

In our example, as all the values in TABLE 3 are in TABLE 2, this one gives the same effect as the INNER JOIN.

SELECT *
FROM TABLE 2 
RIGHT JOIN TABLE 3 ON TABLE 2.geography code = TABLE 3.geography code

R equivalent

The same result can be achieved by using all.y = TRUE.

bulk_merged <- merge(bulk, bulk1, by = "geography.code", all.y = TRUE)

Conclusion

For more information, W3 Schools pages on joins have useful Venn diagrams showing which records will be included in an inner, left and right join.

 

Share this:

  • Click to share on X (Opens in new window) X
  • Click to share on Facebook (Opens in new window) Facebook

Related

Tags: merge sql

Post navigation

❮ Previous Post: Five More Common SQL Commands and Their R Equivalents
Next Post: How to make a UK Local Authority choropleth map in R ❯

Recent Posts

  • I’ve moved my blog over to Substack
  • How to plot a large rural area using Ordnance Survey data in R
  • Check the COVID-19 vaccination progress in your area
  • Let R tell you what to watch on Netflix
  • Sentiment analysis of Nineteen-Eighty-Four: how gloomy is George Orwell’s dystopian novel?

Archives

  • April 2022
  • April 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • February 2020
  • December 2019
  • November 2019
  • October 2019
  • April 2018
  • March 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016

Categories

  • Geospatial data
  • Landmark Atlas
  • Learn
  • See
  • Seen Elsewhere
  • Site
  • Uncategorized

Copyright © 2025 R for Journalists.

Theme: Oceanly by ScriptsTown

 

Loading Comments...