Before you begin, this module expects students to have some basic knowledge of R and GIS.
This module will follow a similar structure to previous QGIS session, however, in these lessons, we aim to equip students with skills to manage, analyse and present spatial data using R software.
sp
and sf
packagesterra
packageggplot2
tmap
The sp
package (spatial) provides classes and methods
for spatial (vector) data; the classes document where the spatial
location information resides, for 2D or 3D data. Utility functions are
provided, e.g. for plotting data as maps, spatial selection, as well as
methods for retrieving coordinates, for subsetting, print, summary,
etc.
The sf
package (simple features = points, lines,
polygons and their respective ‘multi’ versions) is the new kid on the
block with further functions to work with simple features, a
standardized way to encode spatial vector data. It binds to the packages
‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical
operations, and to ‘PROJ’ for projection conversions and datum
transformations.
For the time being, it is best to know and use both the sp and the sf
packages, as discussed in this
post. However, we focus on the sf
package. for the
following reasons:
To sp object <- as(object, Class = "Spatial")
To sf object_sf = st_as_sf(object_sp, "sf")
A word of advice: be flexible in the usage of sf and sp. Sometimes it
may be hard to explain why functions work for one data type and do not
for the other. But since transformation is quite easy, time is better
spend on analyzing your data than on wondering why operations do not
work. For the purpose of this material we will focus of teaching you the
package sf
as it is intended to succeed and replace R
packages sp
, rgeos
and the vector parts of
rgdal
packages. It also connects nicely to tidyverse learnt
in previous modules.
To import data we would first need to start with loading the library
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tidyverse)
Shapefiles can be called in to R using the function
st_read()
. Similarly to read_csv()
we include
a filepath to a shapefile. In this instance we would load the part of
the shapefile that ends with .shp
fakeland <- st_read("shapefiles/FAK_HDs.shp")
## Reading layer `FAK_HDs' from data source
## `/Users/pamratia/Documents/ASTMH_tutorial_webpage/shapefiles/FAK_HDs.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 46 features and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -1.137248 ymin: 7.081189 xmax: 4.912022 ymax: 10.37418
## Geodetic CRS: WGS 84
You’ll notice that when you load the shapefile in, there will be a set of information explaining the features of the shapefile. The first sentence shows that you have loaded a ESRI shapefile, it contains 46 features (which are polygons in this case) and 5 columns of information stored as a data tabel. It mentions also there is a spatial extent (called bounding box) and the coordinate reference system (CRS).
you can also get this information when you simply call the sf object
fakeland
In this case it is important to read in and check the metadata for the type of spatial information you have loaded.
Another key type of spatial data is data linked to coordinates,
frequently found in a table format. We can read these data in using the
read_csv()
function, then we may want to convert these into
spatial information using the st_as_sf()
function. Here we
have a csv file containing the coordinates for the locations of health
facilities in our routine data. Note that we set the projection for the
spatial object using the crs
command; crs 4326 is the
standard WGS 84 CRS.
fakeland_hf_gps <- read_csv("data/fakeland_hf_gps.csv")
## Rows: 100 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Country_ID, Country, Dist_Code, adm1, adm2
## dbl (3): X, Y, hf
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Make the gps data into a point simple feature
fakeland_hf_points <- st_as_sf(fakeland_hf_gps, coords = c("X", "Y"), crs = 4326)
We learnt about the importance of Coordinate Reference Systems (CRS) for projecting spatial data in module 3. Using the sf package it is easy to view your data on different projects and switch between different projections.
We have already seen that you can identify the CRS when reading in a
shapefile using st_read
and when viewing the features of a
shapefile. You can also see the CRS using the command
st_crs()
st_crs(fakeland)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
We can then change the projection using the command
st_transform()
. We can also combine the two commands to
change the projection of one data type to be that of the other.
# Change the projection to UTM zone 17N
fakeland_utm <- st_transform(fakeland, 26717)
ggplot(fakeland_utm)+
geom_sf()+
theme_bw()
# Change the projection of the fakeland_utm shapefile to match that of the original fakeland
fakeland2 <- st_transform(fakeland_utm, st_crs(fakeland))
ggplot(fakeland2)+
geom_sf()+
theme_bw()
We often want to join data to shapefiles to enable the creation of
maps and to analyse spatial data. You can use the join functions shown
in module 2 to join sf data to tables. Just note that the sf data must
go first to automatically be recognised as sf. Else you would need to
reset it using st_as_sf()
annual_data <- read_csv("data/annual_admin_data.csv")
## Rows: 46 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): adm1, adm2
## dbl (18): test, test_u5, test_ov5, test_rdt, test_rdt_u5, test_rdt_ov5, test...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
annual_data_sf <- fakeland %>%
left_join(annual_data, by = c("adm2", "adm1"))
st_write
can be used in the same way as
write_csv
to save shapefiles.
st_write(annual_data_sf, "outputs/annual_admin_routine_data.shp")
Task 1
- Load in the shapefile called “fakeland_task”
- How many polygons are in the shapefile?
- What projection is this shapefile in?
- Change the projection to be in WGS 84
Solution
fakeland <- st_read("shapefiles/fakeland_task.shp") fakeland st_crs(fakeland) fakeland <- st_transform(fakeland, 4326)
Raster data has historically been dealt with in R primarily using the
raster
package, however, this is being discontinued as
replaced by the terra
package. We can start by loading the
package in R.
library(terra)
## terra 1.5.21
##
## Attaching package: 'terra'
## The following object is masked from 'package:dplyr':
##
## src
## The following object is masked from 'package:tidyr':
##
## extract
## The following object is masked from 'package:ggplot2':
##
## arrow
In terra we deal with rasters as a class of data called “SpatRaster”. This is a multi-layered feature which stores the parameters to describe the raster, including then number or rows and columns, the spatial extent and projection.
We can import raster data using the function rast()
, and
look at the raster properties by calling the object.
population <- rast("rasters/ihme_corrected_worldpop_All_Ages_3_2018.tif")
population
## class : SpatRaster
## dimensions : 84, 156, 1 (nrow, ncol, nlyr)
## resolution : 0.04166667, 0.04166667 (x, y)
## extent : -1.5, 5, 7, 10.5 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : ihme_corrected_worldpop_All_Ages_3_2018.tif
## name : ihme_corrected_worldpop_All_Ages_3_2018
## min value : 0.6848159
## max value : 300891.2
We can view the raster simply by using the plot()
function.
plot(population)
We can also use this function to read in a stack of multiple rasters.
In the raster folder you will find a series of rasters with the prefix
“GPMM”, these are the monthly rainfall at a 5km resolution for Fakeland.
We can identify these rasters using the regular expression and read them
in as a raster stack using rast()
.
file.names <- list.files(path = 'rasters/', pattern = '^GPMM.*?\\.tif$', full.names = T)
GPMM <- rast(file.names)
plot(GPMM)
Lets rename the rasters in the stack to make the names more meaningful. We can also use the index number to extract a single layer. We can select one raster in the stack by using the numerical index. So this would give us the monthly rainfall raster for June.
names(GPMM) <- c('GPMM_Jan', 'GPMM_Feb', 'GPMM_Mar', 'GPMM_Apr', 'GPMM_May', 'GPMM_Jun',
'GPMM_Jul', 'GPMM_Aug', 'GPMM_Sep', 'GPMM_Oct', 'GPMM_Nov', 'GPMM_Dec')
plot(GPMM[[6]])
We can quickly summarise the data from rasters in plots such as histograms.
hist(log(population)) #plotting the log of the population due to the scale
Additionally we can also use ggplot
to visualise
rasters, giving us more control over the appearance of the plot. First
you need to convert the raster to a data frame, then it can be plotted
using geom_raster
. We can then use the same functions to
control the appearance of the plot discussed in the data visualisation
module.
pop_df <- as.data.frame(population, xy = TRUE)
ggplot()+
geom_raster(data = pop_df, aes(x = x, y = y, fill = ihme_corrected_worldpop_All_Ages_3_2018 ))+
scale_fill_viridis_c()+
labs(fill = "Population")+
theme_bw()
As with the shapefiles it is important to know and control the
projection of the raster. We can find out what the current projection is
and change it using the crs()
function. This will allow us
to work with rasters and other spatial data.
crs(population)
## [1] "GEOGCRS[\"WGS 84\",\n DATUM[\"World Geodetic System 1984\",\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]]"
crs(population) <- "+proj=utm +zone=48 +datum=WGS84"
We can find out the spatial extent of a raster by using the
ext()
function, and easily crop the raster to other extents
using crop()
. We can specify the coordinates we wish to
crop the raster to, or a take the extent from a spatial object and crop
the raster to that.
ext(population)
## SpatExtent : -1.5, 5, 7, 10.5 (xmin, xmax, ymin, ymax)
pop1 <- crop(population, c(-1,2,9,10)) # c(xmin, xmax, ymin, ymax)
plot(pop1)
# Get the extend of the "West" region of fakeland
e <- filter(fakeland, adm1 == 'West') %>%
ext()
# Crop the raster to the same extent
West_pop <- crop(population, e)
ext(West_pop)
## SpatExtent : -1.125, 1.125, 7.08333333333333, 10.0833333333333 (xmin, xmax, ymin, ymax)
We may then want to change all of the raster cells which lay outside
of the polygon for the West region to be NA. This can be done using
mask()
.
library(terra)
m <- filter(fakeland, adm1 == 'West')
West_pop <- mask(West_pop, vect(m))
plot(West_pop)
We frequently want to extract values from rasters. This could be
extracting all values within a polygon of a shapefile to perform
calculations on (such as summing the population in each district), or
extracting values for specific coordinates. This can be done using the
extract()
function. This can extract data from locations
stored as a spatial vector (SpatVector - points, polygons or lines), a
matrix with coordinates (x,y) or a vector of cell numbers.
Firstly, if we wanted to extract the population values for the first 5 cells in the raster we could use this command:
terra::extract(population, c(1,2,3,4,5))
## ihme_corrected_worldpop_All_Ages_3_2018
## 1 203.1444
## 2 505.9554
## 3 855.8597
## 4 1277.4124
## 5 1942.9839
Next, if we had a specific set of coordinates we wanted to know the
population at, we would define these coordinates in a matrix with the
longitude (x) as the first value, and the latitude (y) as the second
value (N.B They must be in this order) then use the
extract()
function.
coords <- cbind(1, 8)
extract(population, coords) # Extract the population for these coordinates
## ihme_corrected_worldpop_All_Ages_3_2018
## 1 1186.163
If we had a vector of spatial vector of coordinates, such as our
dataset of health facility locations, we can pass this to the
extract()
function to get the raster values for each point.
This function additionally works on raster stacks. If we run this on our
stack of monthly rainfall rasters we can see that the values are
extracted for each set of coordinates from each monthly raster. These
can then be processed as a normal data frame
GPMM_extract <- extract(GPMM, vect(fakeland_hf_points))
head(GPMM_extract)
## ID GPMM_Jan GPMM_Feb GPMM_Mar GPMM_Apr GPMM_May GPMM_Jun GPMM_Jul
## 1 1 0.140751183 51.77404 66.19682 81.09777 147.76413 134.0340 302.64267
## 2 2 0.007372544 50.04529 60.83176 88.40833 129.95383 140.5897 289.12778
## 3 3 0.004494285 42.21128 61.98917 90.62826 133.26007 142.3357 271.26285
## 4 4 1.027425170 54.43633 38.86056 67.79018 111.49038 186.1210 108.48865
## 5 5 0.951637805 48.27084 68.90779 93.04726 120.39143 161.5795 160.36346
## 6 6 0.524218440 36.64803 66.77352 65.90304 97.89902 185.6424 97.19148
## GPMM_Aug GPMM_Sep GPMM_Oct GPMM_Nov GPMM_Dec
## 1 177.9095 229.0716 90.79294 11.788742 0.86349326
## 2 176.3797 225.8389 86.13807 11.834579 0.17535123
## 3 174.8476 237.6628 102.00393 11.467198 0.22886179
## 4 264.4977 249.9303 127.60249 6.701365 0.15531778
## 5 256.3834 232.9854 128.41360 5.979004 0.07403241
## 6 213.7514 269.6235 184.14915 8.359339 0.12935738
Finally, we look at extracting the population values for each admin 2
district of fakeland. For this we provide extract()
with
the raster and the shapefile. The output is then the population value
for each cell of the raster, linked to each polygon ID of the shapefile.
We can the perform summary statistic on this as we would a standard data
frame.
fakeland_pop <- extract(population, vect(fakeland))
head(fakeland)
## Simple feature collection with 6 features and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -1.070175 ymin: 7.614849 xmax: 1.138602 ymax: 10.09391
## Geodetic CRS: WGS 84
## Country_ID Country Dist_Code adm2 adm1 geometry
## 1 FAK Fakeland FAK001 Zila West POLYGON ((0.6603422 8.42418...
## 2 FAK Fakeland FAK002 Youko West POLYGON ((0.1701822 9.12116...
## 3 FAK Fakeland FAK003 Kokam West POLYGON ((-0.3757387 9.7560...
## 4 FAK Fakeland FAK004 Tangue West POLYGON ((-0.7725049 9.0987...
## 5 FAK Fakeland FAK005 Cakure West POLYGON ((0.8481822 9.31230...
## 6 FAK Fakeland FAK006 Yenagbo West POLYGON ((0.3932125 10.0613...
fakeland_pop <- fakeland_pop %>%
group_by(ID) %>%
summarise(population = sum(ihme_corrected_worldpop_All_Ages_3_2018, na.rm = T)) %>%
mutate(adm2 = fakeland$adm2)
We can save a rasters in a variety of formats using
writeRaster
. Here was dave it as a “.tif”
writeRaster(West_pop, "rasters/West_adm1_population.tif")
Task 2
- Using the “fakelands_task” shapefile from task 1, ensure the projection is in “4326” (WGS84)
- Import the raster of rainfall for April “GPMM.IMerg.V06B.MM_Total.2018.04.Data.5km”
- Extract the vaules from the raster based on the polygons in your shapefile
- Calculate the mean rainfall for each polygon in the shapefile
Solution
fakeland <- st_read("shapefiles/fakeland_task.shp") fakeland <- st_transform(fakeland, 4326) GPMM_Apr <- rast("rasters/GPMM.IMerg.V06B.MM_Total.2018.04.Data.5km.tif") names(GPMM_Apr) <- 'GPMM_Apr' fakeland_rain <- extract(GPMM_Apr, fakeland) %>% group_by(ID) %>% summarise(mean_rainfall = mean(GPMM_Apr, na.rm = T)) %>% mutate(adm1 = fakeland$adm1)
In this demo we will join together what we have learnt, importing the shapefile and population raster for Fakeland, calculating the total population. We will join our annually aggregated routine data with the population and calculated the annual parasite incidence then join this on to the shapefile,
rm(list = ls()) # clear the console
# Import the data sources
annual_data <- read_csv('data/annual_admin_data.csv')
fakeland <- st_read('shapefiles/FAK_HDs.shp')
population <- rast('rasters/ihme_corrected_worldpop_All_Ages_3_2018.tif')
# Extract the population for each district
fakeland_pop <-
extract(population, vect(fakeland)) %>%
group_by(ID) %>%
summarise(population = sum(ihme_corrected_worldpop_All_Ages_3_2018, na.rm = T)) %>%
mutate(adm2 = fakeland$adm2)
# Join onto the routine data and calculate incidence
annual_data <-
left_join(annual_data, fakeland_pop) %>%
mutate(API = conf/population*100000) %>%
select(adm1, adm2, API)
#Join the routine data onto the shapefile
fakeland_annual <-
left_join(fakeland, annual_data)
In the demo we have created a dataset with annual incidence at admin 2 level. The next obvious thing to do is have a look at it. In this section we’ll explore some popular ways of making chloropleth maps in R. Although not designed as a full-scale GIS application, R (especially when combined with its vast ecosystem of packages) provides a powerful, endlessly customisable, tool for spatial visualisation.
It’s possible to produce maps using nothing more than
rgdal
. Simply calling plot
does the job, but
plots out all attributes of our shapefile in quite an unappealing way.
You can select out one attribute to plot using the normal R way of
subsetting: sf_object[name_of_attribute]
. This way we can
very quickly plot out the API in Fakeland, as joined to the shapefile
above, by admin2 unit:
library(rgdal)
plot(fakeland_annual["API"])
By default R will place the legend wherever it thinks ‘best’, but you
can specify location with the additional argument key.pos
(1 for below; 2 left; 3 above and 4 to the right).
key.width
and key.length
allow you to control
the dimensions of the legend.
plot(fakeland_annual["API"], key.pos = 1)
Notice that by default the colour ramp is discrete. You can tune the
colour breaks using either or both of breaks
and
nbreaks
. nbreaks
defines the number of breaks,
breaks
is either
pretty
, but other options include
equal
, jenks
and quantile
.Titles can be specified using main
, axes (with
lat-longs) toggled on/off using axes
. We can use a log-10
scale using logz
(if using this careful with your breaks!).
pal
is a palette function, detemining the colour ramp. The
default colour ramp is a bit 1980s for our purposes – let’s try Viridis
instead.
# define the palette
test_pal <- viridis::viridis(10)
plot(fakeland_annual["API"],
key.pos = 1,
pal = test_pal,
breaks = 'quantile',
main = "API, Fakeland 20XX",
axes = F,
logz = T)
The above is a perfectly servicable map. In practice, however, the
flexibility of ggplot2
is worth the install and the
learning curve. You met ggplot2
earlier today – recall that
the package implements ‘the grammar of graphics’ to build plots in
layers. This philosophy is powerful, allowing us a much greater degree
of freedom in our map-making than we could achieve in base R.
Recall from earlier the typical ggplot
call:
ggplot(data, aesthetic)
. When making chloropleth maps, our
data will typically be an sf
object, with a column
corresponding to the variable you wish to map, and a geometry column
describing the geography of the data.
Given an sf
object, you can use geom_sf
to
produce a simple plot of your shapefile without even explicitly defining
any aesthetics.
ggplot(fakeland) +
geom_sf() + # as we've called ggplot using an sf object as the data argument, geom_sf can figure it out itself
coord_sf() # deals with map projection
geom_sf
is a bit different to the ggplot2
geometries you met earlier: it requires a new aesthetic,
geometry
. The good news is you often don’t need to worry
about specifying this, as ggplot
tries to help behind the
scenes:
geometry
argument to your
geom_sf
call, R will try to find a column in your provided
data called ‘geometry’, and use that if it’s suitable;sf
object in the function call
(i.e. geom_sf(data = sf_object,..)
) then ggplot can
automatically detecte the geometry column (it doesn’t even need to be
called geometry!)aes(geometry = geometry_column)
. If you have multiple
geometry columns you’ll need to do this, as R can’t read your mind.Let’s see how we can use ggplot2
to map the API in each
admin2 unit of Fakeland. To show the full range of variation on the map,
we’ll log-transform the colour-scale. Let’s decide on some breaks:
my_breaks <- c(signif(min(fakeland_annual$API)-1, 2), 8000, 160000, signif(max(fakeland_annual$API)+1, 2))
ggplot() +
geom_sf(data = fakeland_annual, aes(fill = API), colour = "grey25") + # now we need to tell ggplot an antribute of the data to use as an aesthetic
scale_fill_viridis_c(trans = "log", breaks = my_breaks) +
labs(title = "API, Fakeland 20XX", fill = "") +
coord_sf() + # deals with map projection
guides() +
theme_void() +
theme(plot.title = element_text(face = "bold", hjust = 0.5),
legend.position = "bottom", legend.key.width = unit(2,"cm"), legend.text=element_text(size=12))
Compare this to the base R map we made – our colour scale is now continuous, but otherwise the plots are almost identical.
Breaking down the code we used to produce it:
ggplot() + geom_sf(data = fakeland_annual, aes(fill = API), colour = "grey25") +
After calling ggplot()
, our first layer is a call to
geom_sf
pointing to our sf
object. We’re
making a univariate map, so our aesthetics consists only of a single
call to fill the polygons with the entries in
fakeland_annual$API
– no need for a geometry
call, here. colour
determines the colour of the polygon
outlines – we’re not varying that based on data, so it sits outside
aes
.
scale_fill_viridis_c() +
This next layer tells R to using a continuous viridis palette when
filling our polygons. When filling with discrete (factor) data, you
instead use scale_fill_viridis_d()
here. The Viridis family
of colourramps are designed to . Arguments you can pass to this function
include alpha
, to change transparency; begin
and end
to truncate the palette; direction
to
reverse it; and option
to pick another palette in the
viridis family. Four options are available: “magma” (or “A”), “inferno”
(or “B”), “plasma” (or “C”), “viridis” (or “D”, the default option used
here) and “cividis” (or “E”). By default NA
s will be shown
in grey – change this by setting na.value
to the colour of
your choice.
labs(title = "API, Fakeland 20XX", fill = "") +
It’s rare that our short variable names are explanatory enough to
show on a plot. labs
allows us to define exactly what we
want displayed, and define the plot title, subtitle, and even a caption
(e.g. for attribution). Labels of aesthetics are set by referring back
to that aesthetic: fill
, colour
,
alpha
, etc..
coord_sf() + # deals with map projection
The projection and extent of our map are controlled by this function.
In this case, it’s not doing anything – we could happily have omitted it
from our ggplot
call. This is because by default
ggplot2
will use the CRS of the first layer which defines
one – here fakeland_annual
. This can be overridden using a
crs
argument, which accepts any valid PROJ4 string.
coord_sf
is also used to change the extent of our map –
by default the whole shapefile is shown, but providing xlim
and ylim
arguements can zoom in on areas of interest.
guides() +
This slightly counter-intuitively named function sets the legend type for each aesthetic – colourbar, legend, or none. When you have multiple legends, guides can be used to reorder them, or even omit some altogether.
theme_void() +
Themes control the overall appearance of the plot.
theme_void
is an empty theme, giving us this floating
map.
theme(plot.title = element_text(face = "bold", hjust = 0.5), legend.position = "bottom", legend.key.width = unit(2,"cm"), legend.text=element_text(size=12))
Our final line makes alterations to our chosen theme – we’ve edited the typeface of the plot title and positioned it in the middle of the figure; moved the legend to the bottom and made it wider, and finally increased the font size of the legend ticks. These small touches make a big difference in the readability and visual appeal of our map.
With ggplot2
we can go much further than simply
recreating base R’s plots. The layered nature of graphical grammar makes
it easy to add additional features to our maps (points, annotations,
etc). We can also exploit it to add base layers and logos, ultimately
creating maps which rival any GIS software’s output.
Suppose we want to add the locations of health facilities in Fakeland
to our map of API. fakeland_hf_points
is an sf
object, but this time consisting of points. That doesn’t matter to
geom_sf
.
fakeland_hf_gps <- read_csv("data/fakeland_hf_gps.csv")
# Make the gps data into a point simple feature
fakeland_hf_points <- st_as_sf(fakeland_hf_gps, coords = c("X", "Y"), crs = 4326)
# Assign some of these points to be hospitals
fakeland_hospitals <- sample(fakeland_hf_points$hf, 0.1*length(fakeland_hf_points$hf))
fakeland_hf_points <- fakeland_hf_points %>%
mutate(hf_type = if_else(hf %in% fakeland_hospitals, "Hospital", "Clinic"))
ggplot() +
geom_sf(data = fakeland_annual, aes(fill = API), colour = "grey50", size = 0.2) + # now we need to tell ggplot an antribute of the data to use as an aesthetic
scico::scale_fill_scico(palette = "nuuk", trans = "log", breaks = my_breaks) +
geom_sf(data = fakeland_hf_points, aes(shape = hf_type), size = 2, colour = "grey20") +
labs(title = "API, Fakeland 20XX", fill = "API ",
shape = "Health Facility") +
coord_sf() + # deals with map projection
guides() +
theme_void() +
theme(plot.title = element_text(face = "bold", hjust = 0.5),
legend.position = "right", legend.key.width = unit(0.5,"cm"),
legend.key.height = unit(0.75,"cm"), legend.text=element_text(size=12))
Try moving the geom_sf(data = fakeland_hf_points...)
call to above the geom_sf(data = fakeland_annual...)
call:
does the map still look how you wanted it to?
To make the health-facility locations easier to see when superimposed
on our map, we’ve switched away from viridis to a palette from the
scico
package. Like viridis, all 17 palettes from scico are
perceptually uniform and colourblind safe.
For publications we often want to show some more contextual informaiton on our maps, rather than having Fakeland float in white-space. The grammar of graphics makes this easy – we can add ‘zoomed-out’ geographic information as a “base” layer, then layer our designed information on top.
Currently we only have a shapefile for Fakeland’s health districts. Let’s load in a regional Admin0 shapefile:
admin0_regional <- st_read("shapefiles/regional/fakeland_admin0_region.shp")
## Reading layer `fakeland_admin0_region' from data source
## `/Users/pamratia/Documents/ASTMH_tutorial_webpage/shapefiles/regional/fakeland_admin0_region.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 6 features and 2 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -4 ymin: 7.05596 xmax: 8 ymax: 14
## Geodetic CRS: WGS 84
If we make a quick plot of this, we can see this shapefile shows the
countries surrounding Fakeland. To check out the names of these
countries, let’s try using a geom_label
layer. To add the
labels in the right positions, we’ll need to tell R the x and y
co-ordinates of the centroids – luckily this is easy to do using
st_centroid
. Rememeber that st_centroid
assumes planar projection, rather than lat-lons. We can safely ignore
the warning here, as we’re just plotting for explanatory purposes, but
be careful!
country_central_points<- st_centroid(admin0_regional)
## Warning in st_centroid.sf(admin0_regional): st_centroid assumes attributes are
## constant over geometries of x
country_central_points <- cbind(admin0_regional, st_coordinates(st_centroid(admin0_regional$geometry)))
ggplot(admin0_regional) +
geom_sf() +
geom_text(data = country_central_points, aes(x = X, y = Y, label = cntry_n)) +
theme_void()
So we can see that Fakeland is bordered by four countries – Florin,
Bartovia, Genovia and Maldonia – with a coast to the south. We probably
don’t want to show this entire region in our base-map. We can reduce
this to our desired buffer around Fakeland’s bounding box using
coord_sf
. Alternatively, we could crop the entire regional
shapefile to our desired spatial extent.
At the same time let’s finish maping our base-map. We’ll often want
scale panels and directional arrows on our plots – these are easy to
implement by using the ggspatial
package.
if(!require(ggspatial)){
install.packages("ggspatial")
library(ggspatial)
}
ggplot(admin0_regional) +
geom_sf(fill = "antiquewhite") +
coord_sf(xlim= c(st_bbox(fakeland_annual)$xmin - .2, st_bbox(fakeland_annual)$xmax + .2),
ylim = c(st_bbox(fakeland_annual)$ymin - 1, st_bbox(fakeland_annual)$ymax + 1)) +
ggspatial::annotation_north_arrow(location = "bl",
which_north = "true",
style = north_arrow_fancy_orienteering) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
panel.grid.major = element_line(color = "aliceblue"),
panel.background = element_rect(fill = "aliceblue"),
panel.border = element_rect(colour = "black", fill=NA, size=5))
Finally, let’s pull this all together. We may wish to plot out a
number of metrics for Fakeland on this plot, so let’s save this basemap
as an R object we can call, without having to copy and paste all that
code. Note that we need to call coord_sf
after our final
data layer.
basemap <- ggplot(admin0_regional) +
geom_sf(fill = "antiquewhite", colour = "grey80") +
ggspatial::annotation_north_arrow(location = "bl",
which_north = "true",
style = north_arrow_fancy_orienteering) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
panel.grid.major = element_line(color = "aliceblue"),
panel.background = element_rect(fill = "aliceblue"),
panel.border = element_rect(colour = "grey50", fill=NA, size=5))
basemap +
geom_sf(data = fakeland_annual, aes(fill = API), colour = "grey50", size = 0.2) +
scico::scale_fill_scico(palette = "nuuk", trans = "log", breaks = my_breaks) +
labs(title = "API, Fakeland 20XX", fill = "API ",
shape = "Health Facility") +
coord_sf(xlim= c(st_bbox(fakeland_annual)$xmin - .2, st_bbox(fakeland_annual)$xmax + .2),
ylim = c(st_bbox(fakeland_annual)$ymin - 1, st_bbox(fakeland_annual)$ymax + 1)) +
theme(plot.title = element_text(face = "bold", hjust = 0.5),
legend.position = "right", legend.key.width = unit(0.5,"cm"),
legend.key.height = unit(0.75,"cm"), legend.text=element_text(size=12))