Five useful spatial functions from the sf R package

Author’s note: if you like this post, you might like Landmark Atlas. This is my new blog where I tell stories about Britain’s history, culture and geography through maps.

During my mapping project I have made heavy use of the sf package by Edzer Pebesma.

The package contains intuitive, straightforward functions for working with geospatial data. In this post I’m going to outline five useful functions from the wide array available for anyone interested in working with spatial data in R.

You can find the code on GitHub to practise these functions.

1. `read_sf`()

The first function you need to know about is read_sf(). This allows you to read shapefiles into R. Shapefiles contain the spatial data in a geometry dimension that you need to plot maps.

Here is a simple example of how to read a shapefile, using British parliamentary constituencies.

Firstly, go to the ONS Geography portal and download the zipped shapefile folder for Westminster constituencies.

It contains the actual shapefile (.shp) as well as other associated files, which are also needed for R to read it correctly.

Extract all files from the folder and put it in a folder in your working directory called constituencies.

Then run:

library(sf) constituencies <- read_sf('~/constituencies')

This will give you a data frame of 632 observations.

The ~/ indicates to R that this shapefile is within your current working directory. If it’s not, put the full path to the constituencies folder inside read_sf().

On Windows you can find the full path by right-clicking on a folder and clicking Properties and then copying the Location. You will have to change the backslashes to forward slashes, however. This will look on Windows something like C:/Users/You/R/constituencies.

If you call str() on the data frame you get the following:

str(constituencies)
tibble [632 x 10] (S3: sf/tbl_df/tbl/data.frame)
$ objectid : int [1:632] 1 2 3 4 5 6 7 8 9 10 …
$ pcon16cd : chr [1:632] "E14000530" "E14000531" "E14000532" "E14000533" …
$ pcon16nm : chr [1:632] "Aldershot" "Aldridge-Brownhills" "Altrincham and Sale West" "Amber Valley" …
$ bng_e : chr [1:632] "484884" "404723" "374132" "440478" …
$ bng_n : int [1:632] 155126 302568 389051 349675 115542 355939 138824 399819 199541 233685 …
$ long : num [1:632] -0.784 -1.932 -2.39 -1.398 -0.426 …
$ lat : num [1:632] 51.3 52.6 53.4 53 50.9 …
$ st_areasha: num [1:632] 5.30e+07 4.40e+07 5.09e+07 1.25e+08 6.45e+08 …
$ st_lengths: num [1:632] 40827 38222 46098 62106 326799 …
$ geometry :sfc_MULTIPOLYGON of length 632; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:243, 1:2] 485407 485466 485540 485546 485603 …
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
attr(*, "sf_column")= chr "geometry"
attr(, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA ..- attr(, "names")= chr [1:9] "objectid" "pcon16cd" "pcon16nm" "bng_e" …

The geometry column contains the coordinates that plot the outlines of the constituencies.

2. `st_transform()`

This function is used to transform geospatial data between different coordinate reference systems (CRSs). A CRS is used to turn Earth’s spherical surface into a flat two-dimensional map. There are various different ways to do this represented in different CRSs.

If you’re working with shapefiles from different sources you might find one uses one CRS and the other uses another. Trying to work with the two with different CRSs is no good – so use st_transform() to bring them into a consistent format.

You can use st_crs() to find out the EPSG code of a spatial data frame. Each CRS will have an EPSG code:

st_crs(constituencies)
Coordinate Reference System:
User input: OSGB 1936 / British National Grid
wkt:
PROJCRS["OSGB 1936 / British National Grid",
BASEGEOGCRS["OSGB 1936",
DATUM["OSGB 1936",
ELLIPSOID["Airy 1830",6377563.396,299.3249646,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4277]],
CONVERSION["British National Grid",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",49,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-2,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996012717,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",400000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",-100000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["unknown"],
AREA["UK - Britain and UKCS 49Â°46'N to 61Â°01'N, 7Â°33'W to 3Â°33'E"],
BBOX[49.75,-9.2,61.14,2.88]],
ID["EPSG",27700]]

We can see that the EPSG code is 27700 for this data frame, as it uses the 1936 Ordnance Survey reference system. We can change it to another common CRS, the WGS 84, using the following:

constituencies_wgs <- st_transform(constituencies, crs = 4326)

I’ve often had frustrating error messages crop up when I come to plot maps only to find that the solution is to transform one of my data frames into the right CRS.

3. `st_point(), st_multilinestring(), st_cast()`

The constituencies data frame that we have contains polygons of each parliamentary constituency in Great Britain. We may wish to create some of our own polygons from time to time.

We start with st_point(). This is used to create one-dimensional coordinate points. They need to have an x and y coordinate, like so:

pt1 <- st_point(c(-0.76, 51.28))

That is a point with a latitude of 51.28 and a longitude of -0.76. That places it within Aldershot, the first polygon in our constituencies data frame.

Let’s create a few more:

pt2 <- st_point(c(-0.76, 51.3))
pt3 <- st_point(c(-0.74, 51.3))
pt4 <- st_point(c(-0.74, 51.28))

Now let’s plot these points, using the geom_sf() function from ggplot2.

ggplot() + geom_sf(data = pt1) + geom_sf(data = pt2) + geom_sf(data = pt3) + geom_sf(data = pt4)

We have four points independent of each other in a square. We can join them up as a multilinestring:

mls <- st_multilinestring(list(rbind(pt1,pt2,pt3,pt4,pt1)))

We have to put pt1 in twice to join it up back to where we started at point 1. This cuts down our call to plot it:

ggplot() + geom_sf(data = mls)

We now have a nice square. But it is still treated as a linestring, so we won’t be able to fill it in or calculate other functions like its area. To unlock these we need to turn it into a polygon:

polygon <- st_cast(mls, to = 'POLYGON') ggplot() + geom_sf(data = polygon, fill = 'yellow')

Success! Now we have our polygon.

You will usually want to import other shapefiles for your spatial work. But it can be very useful to know how to draw your own if you wish to use them to annotate other shapefiles.

Here’s how our yellow polygon looks in relation to Aldershot:

#add crs polygon <- sf::st_sfc(polygon, crs = 4326) ggplot() + geom_sf(data = constituencies_wgs[1,]) + geom_sf(data = polygon, fill = 'yellow')

4. `st_intersects()`

This is one of a number of functions that can be used to see whether two polygons occupy the same space.

constituencies_wgs$intersects <- as.character(st_intersects(constituencies_wgs, polygon))

This line of code adds a new column to our constituencies data frame, asking whether our yellow polygon intersects with each of the 632 constituencies of Great Britain. The first constituency, Aldershot, returned ‘1’. The others were all integer(0), indicating they do not touch the yellow polygon at all.

This function can be very handy for narrowing down a large shapefile to a particular area. You can draw a polygon to act as a bounding box as shown above in number three and then use st_intersects() to pinpoint just the part of the shapefile within that box. After that you can remove the superfluous parts of the shapefile from your data frame.

An important constraint in geospatial mapping is process time. Polygons can take a long time to load, work with and visualise so any time-saving devices are most welcome.

5. `st_union()`

The final function in this post takes more than one polygon and combines them together. This one removes the internal boundaries between them if they are next to each other. Another related function st_combine() keeps the internal boundaries.

union <- st_union(constituencies_wgs[1,],constituencies_wgs[315,]) ggplot() + geom_sf(data = union)

combination <- rbind(constituencies_wgs[1,], constituencies_wgs[315,]) combination <- st_combine(combination) ggplot() + geom_sf(data = combination)

I hope you find these functions useful in your R mapping.

Five useful spatial functions from the sf R package

1. `read_sf`()

2. `st_transform()`

3. `st_point(), st_multilinestring(), st_cast()`

4. `st_intersects()`

5. `st_union()`

Related

One thought on “Five useful spatial functions from the sf R package”

1. read_sf()

2. st_transform()

3. st_point(), st_multilinestring(), st_cast()

4. st_intersects()

5. st_union()

Share this:

Related

One thought on “Five useful spatial functions from the sf R package”

1. `read_sf`()

2. `st_transform()`

3. `st_point(), st_multilinestring(), st_cast()`

4. `st_intersects()`

5. `st_union()`