In this session students will learn how to manage and present spatial data using Geographic Information System (GIS) software. Session 1 introduces students to QGIS, and session 2 replicates many of the exercises in R. Both QGIS and R are free and open-source.
QGIS is a popular alternative to paid GIS software and provides an interface to other open-source GIS such as SAGA and GRASS. It has a large user base and extensive support, with easily accessible tutorials, user guides and forums. As with R, numerous volunteers contribute to the development of QGIS, and “Plugins” are available that add tools and extend its functionality. GIS-specific software such as QGIS predominantly uses a graphical user interface (GUI), i.e. pointing and clicking, and so is often considered more user-friendly. Although it is a little advanced for this module, note that you can run R commands in QGIS and also access QGIS processing functions from R.
This tutorial gives step-by-step instructions that you can use to follow along with the instructor.
After opening QGIS on your computer, familiarize yourself with the QGIS interface. Hovering your mouse over an icon will give additional information for most items.
Panels can be added and removed from the menu bar: View > Panels.
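Shapefiles are a common file format for vector data and are made up of multiple files: three are mandatory (.shp for the geometry, .shx for the geometry index and .dbf for the attribute data) and many are optional (for example .prj, which stores the coordinate reference system).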
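If QGIS ever refuses to open a shapefile you have been sent or downloaded, one possible cause is a missing component. As an optional illustration, the short Python sketch below lists which of the three mandatory files are missing next to a given .shp file. The function name `missing_shapefile_parts` is hypothetical, and the check is only a sketch: it assumes the components share the .shp file's base name and use lower-case extensions, and it does not validate the file contents.

```python
from pathlib import Path

# The three mandatory components of a shapefile.
MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the mandatory extensions that have no matching file.

    The components must sit in the same folder and share the base name
    of the .shp file, e.g. FAK_HDs.shp, FAK_HDs.shx and FAK_HDs.dbf.
    """
    shp = Path(shp_path)
    # with_suffix swaps only the final extension, so "FAK_HDs.shp"
    # becomes "FAK_HDs.shx", "FAK_HDs.dbf", and so on.
    return [ext for ext in MANDATORY_EXTENSIONS
            if not shp.with_suffix(ext).exists()]


# Example (the path is illustrative):
# missing_shapefile_parts("shapefiles/FAK_HDs.shp")  -> [] if complete
```

An empty list means all three mandatory files were found; optional files such as .prj are not checked.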
The shapefiles that we will be using for this session are the Fakeland health districts (FAK_HDs), which can be found in the following folder: USB or … /MAP Spatial Analysis Course/ASTMH/shapefiles/. We will now learn how to add the shapefiles to QGIS, explore the data they contain and join them to non-spatial data.
A detailed description of shapefile components can be found here
There are multiple ways of adding layers to your QGIS project. The numbered bullet points below outline a few options; try them out after class and see which you prefer. For now we will follow option 1. Note that you should always add the .shp file from among the shapefile components.
You may have noticed in the left-hand panel that there are many different types of data that can be added to the project. We use “Add Vector Layer”, which is for vector data such as lines and polygons. Later in the tutorial we will also add point data, non-geographic tabular data (Delimited Text) and raster data.
Try these options after class:
You should now see the Fakeland health districts on your map canvas, with each district represented by a separate polygon. Note that QGIS assigns a random color when adding a layer, so your map may be a different color. We will show you how to change this shortly, but first you can explore the shapefile.
Information about the features can be found in the attribute table. The attribute table is a tabular dataset containing all the non-geographical information about each feature. In this example each health district is a “feature” (a row in the table), and each feature has a number of attributes (columns), or variables, linked to it. Currently the attribute table contains basic information on the names of the health districts and the regions they are in, but attribute tables can also store the numerical data needed for mapping populations and disease burden.
In the previous example we selected features based on a single condition (Adm1). However, it is also possible to select based on multiple conditions. There are many other tools available in the attribute table; for example, you can edit the data here and create new fields with the field calculator. We will not go through examples here, but for further reading please see: here
The selection of features is a key skill when working in QGIS, as it is essentially used to subset data. For most operations in QGIS there is a “Use only selected features” tickbox that applies the operation to your selected subset. Note that some processing tools apply only to the selected features (and to all features if none are selected), so you have to be aware of your current selection. The toolbar provides a variety of options for selection:
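Behind the GUI, selecting with one or more conditions amounts to filtering the rows of the attribute table. The Python sketch below illustrates that idea on a few made-up rows: only the `Adm1` field comes from this tutorial, while `HD_name`, `area_km2` and all the values are hypothetical. In QGIS the second selection would correspond to an expression such as `"Adm1" = 'Region 1' AND "area_km2" > 100`. The hypothetical `features_to_process` helper mimics the rule noted above: use the selection if there is one, otherwise use every feature.

```python
# Made-up rows standing in for an attribute table: one dict per feature
# (row), one key per attribute (column). Only "Adm1" is a real field name
# from the tutorial; the others are hypothetical.
features = [
    {"HD_name": "District A", "Adm1": "Region 1", "area_km2": 150},
    {"HD_name": "District B", "Adm1": "Region 1", "area_km2": 80},
    {"HD_name": "District C", "Adm1": "Region 2", "area_km2": 200},
]


def select(rows, *conditions):
    """Return the rows for which every condition is true (logical AND)."""
    return [row for row in rows if all(cond(row) for cond in conditions)]


def features_to_process(all_rows, selected):
    """Use the selection if there is one, otherwise every feature."""
    return selected if selected else all_rows


# Single condition, like selecting on Adm1 alone.
region_1 = select(features, lambda f: f["Adm1"] == "Region 1")

# Two conditions combined with AND.
large_region_1 = select(
    features,
    lambda f: f["Adm1"] == "Region 1",
    lambda f: f["area_km2"] > 100,
)

print([f["HD_name"] for f in region_1])        # ['District A', 'District B']
print([f["HD_name"] for f in large_region_1])  # ['District A']
print(len(features_to_process(features, [])))  # 3
```

The last line shows why an empty selection matters: a tool that falls back to all features silently processes the whole layer.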
Point data is a very common form of data in health surveillance and research. In the following example the points represent individual health facilities. In contrast to the polygon shapefiles we explored above, point data can be stored in a single spreadsheet file that contains latitude and longitude coordinates for each point. Note that the Delimited Text option we use here can only read delimited text files, the most common form being comma-separated values (.csv) files. If you are working with an Excel spreadsheet, you must first save it as a CSV before you can load it this way. Further note that Mac users may need to specifically save it as a Windows CSV file.
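To see what happens to your file when it is loaded as a Delimited Text layer, here is a minimal Python sketch (standard library only) that turns CSV rows into points. It assumes hypothetical column names `latitude` and `longitude`; your file may use different names. Note that longitude is the X coordinate and latitude the Y coordinate; swapping them is a common reason for points appearing in the wrong place. This sketch simply skips rows with missing or out-of-range coordinates, so it is worth checking your data for such rows before loading it.

```python
import csv
import io


def read_points(csv_file, x_field="longitude", y_field="latitude"):
    """Turn rows of a delimited text file into (x, y, row) tuples.

    x is longitude and y is latitude. Rows with missing, non-numeric or
    out-of-range coordinates are skipped.
    """
    points = []
    for row in csv.DictReader(csv_file):
        try:
            x = float(row[x_field])
            y = float(row[y_field])
        except (KeyError, TypeError, ValueError):
            # Column absent (KeyError), row too short (None -> TypeError)
            # or a blank / non-numeric cell (ValueError).
            continue
        if -180 <= x <= 180 and -90 <= y <= 90:
            points.append((x, y, row))
    return points


# Made-up facilities: the second has no latitude, the third an
# impossible latitude, so only the first becomes a point.
sample = io.StringIO(
    "name,latitude,longitude\n"
    "HF 1,-1.25,36.80\n"
    "HF 2,,36.90\n"
    "HF 3,95.0,10.0\n"
)
for x, y, row in read_points(sample):
    print(row["name"], x, y)  # HF 1 36.8 -1.25
```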
You will now see the points added to the map canvas; each point is a different health facility. As with the polygon shapefile, we can open the attribute table to see what data is available for each health facility.
It is also possible to add labels to the map.
If you are new to GIS you may be wondering why we keep referring to the data as “Layers”. Maps are often built up from several data sources and types, and these have to be layered (stacked) in a certain order so that each component can be seen.
In QGIS, the order of the layers in the Layers panel is the order in which they are stacked on the map canvas: layers at the top of the list are drawn on top of those below them. You can rearrange the layers by clicking and dragging.
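One way to picture the layer order is as painting: layers are painted one after another, and whatever is painted last covers what lies beneath it. The tiny Python sketch below is a conceptual illustration only; it takes layer names listed top to bottom as they appear in the Layers panel (the names are the ones used in this tutorial) and returns the order in which they would be painted, leaving out any layers that are unticked. The function name `drawing_order` is hypothetical.

```python
def drawing_order(layers_panel, hidden=()):
    """Return visible layers in the order they are painted on the canvas.

    `layers_panel` lists layer names top to bottom, as in the Layers panel.
    Painting starts at the bottom of the list, so the top layer is painted
    last and covers everything beneath it. Layers in `hidden` (unticked)
    are not painted at all.
    """
    return [name for name in reversed(layers_panel) if name not in hidden]


panel = ["hf_gps", "FAK_roads", "FAK_HDs"]  # top to bottom
print(drawing_order(panel))                     # ['FAK_HDs', 'FAK_roads', 'hf_gps']
print(drawing_order(panel, hidden={"hf_gps"}))  # ['FAK_HDs', 'FAK_roads']
```

Because the polygons are painted first, the points and roads stay visible on top of them; reverse the panel order and the filled polygons would cover them.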
You can also turn layers on and off using the tickbox next to each layer; try unticking the points layer.
You will notice that the point data must be above the polygon data for it to be visible. Currently we only have two visible layers, but maps may contain many more, for example if there are additional shapefiles for roads, waterways, railways, buildings etc.
Layer demonstration - adding admin1, admin0 and lines
In the animation below you will see that there are several layers in the Layers panel. Initially the FAK_roads and hf_gps layers are below FAK_HDs, so they have to be moved above the FAK_HDs layer to be visible. Note also that the layers are turned on with the tickboxes in the Layers panel.
knitr::include_graphics("images/vid1.gif")
It is important to note that a QGIS project does not store any data itself; when you reopen a project, QGIS follows the file paths to the data you provided earlier. If the location of the shapefiles or other data changes, QGIS will be unable to open them. The project will, however, keep any changes you made to symbology.
The project can be reopened from the folder where it was saved, or alternatively simply open QGIS and select it from the links to recent projects.
We regularly work with data that is yet to be linked to a geometry, for example a table containing the number of malaria cases per administrative unit. Although this data contains the name of a geographic location, it cannot be mapped on its own: it needs to be joined to an existing shapefile of those administrative units.
To join the two types of data they must share a key value (e.g. the admin unit name), which lets us conduct what is called an “Attribute Join”. We will walk through this below.
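Conceptually, an attribute join is a lookup: for each feature, the matching row in the table (same key value) is found and its columns are copied onto the feature. The Python sketch below shows this as a left join on made-up data, so every district is kept and districts without a match get `None` (QGIS displays such empty values as NULL). The field names `adm2_name`, `adm2` and `confirmed` are hypothetical, and this sketch's choice to keep the first matching row when a key is repeated is an assumption of the sketch.

```python
def attribute_join(features, table, feature_key, table_key):
    """Left join: copy table columns onto features with a matching key.

    Every feature is kept. Features without a match receive None for the
    joined columns. If a key occurs more than once in the table, the first
    row is used.
    """
    lookup = {}
    for row in table:
        lookup.setdefault(row[table_key], row)
    # Columns to bring across: everything in the table except the key.
    columns = [c for c in table[0] if c != table_key] if table else []

    joined = []
    for feature in features:
        match = lookup.get(feature[feature_key])
        merged = dict(feature)  # copy so the input is not modified
        for col in columns:
            merged[col] = match.get(col) if match is not None else None
        joined.append(merged)
    return joined


# Made-up data: the shapefile names its key field differently from the table.
districts = [{"adm2_name": "Alpha"}, {"adm2_name": "Beta"}]
cases = [{"adm2": "Alpha", "confirmed": 120}, {"adm2": "Gamma", "confirmed": 40}]

for row in attribute_join(districts, cases, "adm2_name", "adm2"):
    print(row)
# {'adm2_name': 'Alpha', 'confirmed': 120}
# {'adm2_name': 'Beta', 'confirmed': None}
```

Notice that "Gamma" in the table matches no district and simply disappears from the result, which is exactly how mismatched names can go unnoticed.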
An alternative is the “Spatial join”, which can be conducted if both datasets are geolocated (for example, the tabular data may have GPS coordinates). Records are then matched by location, so a shared key variable isn’t necessary.
You will now see that annual_admin_data has been added to the Layers panel. Take a look at the data by opening the attribute table. This is an example malaria surveillance dataset: for each Admin 2 unit of Fakeland, it gives the number of malaria tests and confirmed positive results. The results are broken down by type of test (RDT or microscopy) and by patient age group: under 5 years (u5) or over 5 years (ov5).
Currently the data is not associated with any geometry so it will not be displayed on the map canvas. We will now join it to the existing Fakeland admin units.
Because the tabular data has no geometry, we will use an attribute join to link it to the existing health district shapefile. This process takes the tabular data and adds it to the attribute table of the shapefile. Before we show you how to do this in practice, can you identify the key attribute that we will use to join the data? What attributes do the tabular data and the shapefile have in common?
Now open the attribute table of the FAK_HDs layer - what do you see? The annual_admin_data malaria data are now included in the FAK_HDs shapefile.
Datasets and shapefiles can be joined on attributes such as the name of the administrative unit. Note that joining by name is a common source of errors: names that differ in spelling, capitalization or punctuation will not be matched, because they must be exactly the same. Data must therefore be cleaned beforehand, and it is also common to join on standardized administrative codes, which are less prone to error.
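Before joining by name, it can help to check which names fail to match. The Python sketch below is one hypothetical way to do this: it normalizes names (lower case, accents removed, punctuation treated as spaces, extra spaces collapsed) and reports names found in one source but not the other. The normalization rules are assumptions of this sketch. Be careful, because aggressive normalization can also merge names that are genuinely different, which is another reason standardized codes are safer.

```python
import string
import unicodedata


def normalize_name(name):
    """Lower-case, strip accents, treat punctuation as spaces, tidy spaces."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = "".join(" " if ch in string.punctuation else ch for ch in text)
    return " ".join(text.split())


def report_unmatched(shapefile_names, table_names):
    """Return (names only in the shapefile, names only in the table)."""
    shp_keys = {normalize_name(n) for n in shapefile_names}
    tab_keys = {normalize_name(n) for n in table_names}
    only_in_shapefile = [n for n in shapefile_names
                         if normalize_name(n) not in tab_keys]
    only_in_table = [n for n in table_names
                     if normalize_name(n) not in shp_keys]
    return only_in_shapefile, only_in_table


# Made-up names: the first two pairs differ only in case and punctuation.
shp = ["Alpha", "Saint Pierre", "Gamma"]
tab = ["ALPHA", "saint-pierre", "Delta"]
print(report_unmatched(shp, tab))  # (['Gamma'], ['Delta'])
```

Any names reported here need fixing in the source data (or a code-based key) before the join will match them.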
We will now see how to load raster data. We will be using a population raster that gives an all-age population count for each 5 × 5 km pixel.
The population raster will appear on the map canvas. It may have covered up the shapefiles - for now this is fine, we can rearrange the layers once we have finished working with the raster.
By clicking on the small arrow next to the raster in the Layers panel, you can see its minimum and maximum values. What can we say about the distribution of the population from this raster? Where are the areas of high and low population density?
Unlike shapefiles, a raster has no attribute table. You can find the value of an individual pixel by clicking the identify icon and then clicking on the raster. Currently it is difficult to see variation in population, as most of the raster looks black with a few lighter colored pixels. We can improve the appearance by adjusting the symbology.
The steps to change the symbology of a raster layer are different from those explained for vector layers.
Why do you think we selected “Quantile” instead of “Continuous” from the Mode drop-down? How has this changed the appearance of the raster? Re-open the Symbology panel and adjust the number of classes: how does this change the appearance of the map? Note that you can manually adjust the values associated with each class if needed.
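The effect of the mode choice is easiest to see with numbers. Population values are usually highly skewed, so with equally spaced breaks (used here only as a rough stand-in for a linear Continuous stretch) nearly every pixel falls in the lowest class and looks dark. Quantile breaks instead put roughly the same number of pixels in each class. The Python sketch below compares the two on ten made-up pixel values; the function names are hypothetical, and QGIS's own quantile calculation may place the breaks slightly differently.

```python
import bisect
import math


def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes equally wide classes from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    # Use hi itself as the last bound to avoid floating-point drift.
    return [lo + width * k for k in range(1, n_classes)] + [hi]


def quantile_breaks(values, n_classes):
    """Upper bounds of n_classes classes holding roughly equal counts."""
    data = sorted(values)
    n = len(data)
    return [data[max(0, math.ceil(k * n / n_classes) - 1)]
            for k in range(1, n_classes + 1)]


def class_counts(values, breaks):
    """Count values per class; a value equal to a bound joins the lower class."""
    counts = [0] * len(breaks)
    for v in values:
        counts[bisect.bisect_left(breaks, v)] += 1
    return counts


# Made-up, strongly skewed pixel values (people per pixel).
pixels = [0, 0, 0, 1, 2, 3, 5, 10, 200, 5000]

print(class_counts(pixels, equal_interval_breaks(pixels, 5)))  # [9, 0, 0, 0, 1]
print(class_counts(pixels, quantile_breaks(pixels, 5)))        # [3, 1, 2, 2, 2]
```

The quantile counts are not perfectly equal because of the tied zeros, which is also why changing the number of classes on a real raster can shift the breaks noticeably.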
We now have a layer of administrative units and a population raster. Using these two pieces we can calculate the population in each administrative unit; this is known as calculating zonal statistics.
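Conceptually, zonal statistics summarize the raster pixels that fall within each zone. The Python sketch below assumes the districts have already been converted to a grid of zone labels aligned with the population raster (one label per pixel, `None` outside all districts) and that a hypothetical NoData value marks missing pixels. It returns the pixel count and sum per zone, the kind of statistics behind fields such as Pop_count and Pop_sum. It is a simplification: QGIS works from the polygons directly and decides itself which pixels belong to each polygon.

```python
def zonal_statistics(values, zones, nodata=None):
    """Pixel count and sum of raster values for each zone.

    `values` and `zones` are equally sized grids (lists of rows). Each zone
    cell holds the label of the district the pixel belongs to, or None if it
    is outside every district. Pixels equal to `nodata` are ignored.
    """
    stats = {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value is None or value == nodata:
                continue
            entry = stats.setdefault(zone, {"count": 0, "sum": 0})
            entry["count"] += 1
            entry["sum"] += value
    return stats


# Made-up 2 x 3 population raster and matching district labels.
population = [[10, 20, -9999],
              [5, 0, 1]]
districts = [["A", "A", "B"],
             ["B", None, "B"]]
print(zonal_statistics(population, districts, nodata=-9999))
# {'A': {'count': 2, 'sum': 30}, 'B': {'count': 2, 'sum': 6}}
```

The sum is the population estimate for each district; the count is only the number of pixels used, which is why it is easily confused with the sum.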
We now have population counts and confirmed malaria cases for each administrative unit. Using these two pieces of information we can calculate the Annual Parasite Index (API), which gives a more standardized comparison between administrative units: API (per 1000 people) = (number of confirmed cases / population) × 1000.
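The API formula can be checked by hand or with a few lines of code. The Python sketch below uses hypothetical numbers: a district with 250 confirmed cases and a population of 50,000 has an API of 250 / 50,000 × 1000 = 5 cases per 1000 people. In QGIS you would compute the same thing with the field calculator; the function and district names here are only illustrative.

```python
def annual_parasite_index(confirmed_cases, population, per=1000):
    """Confirmed cases per `per` people (API per 1000 by default)."""
    if population <= 0:
        raise ValueError("population must be greater than zero")
    # Multiply before dividing: mathematically identical to
    # cases / population * per.
    return confirmed_cases * per / population


# Hypothetical district values: (confirmed cases, population)
districts = {"District A": (250, 50_000), "District B": (30, 12_000)}
api = {name: annual_parasite_index(c, p) for name, (c, p) in districts.items()}
print(api)  # {'District A': 5.0, 'District B': 2.5}
```

Note that District A has more than eight times as many cases as District B, but its API is only twice as high once population is taken into account.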
We will now explore some ways to visualize the data that has been joined.
We can now see each Admin 2 unit has been colored based on the settings we just established.
This session takes you through how to generate and export maps for reports, publications and presentations. QGIS uses the print layout tool (known as the print composer in earlier QGIS versions).
You may now have many layers in your QGIS project. Bring the layers you want to show to the top of the Layers panel, or turn off the ones you do not need. We will focus on the API map.
From the menu bar click on “Project” and then click “New Print Layout”.
From the pop-up window you can name the map or click OK and an automatic name will be created. The print composer window will then open.
From the Add Item menu select “Add Map”, or use the Add Map icon in the toolbar on the left.
Then left-click on the top left corner, hold and drag to the bottom right corner. Your map will appear in the box you created.
If you want to move the map, click the move icon on the left. This lets you move the map up, down, left and right, and also zoom in and out. Do not confuse this with the zoom icon, which changes your view of the print layout but does not change the scale of the map itself. If you want a specific scale or extent, you can enter it in the Item Properties tab on the right.
Our map is looking a little bare - we can now add features that will give the map some context. Let’s first add a legend. Click on the legend icon on the left and then click and drag on the map where you want the legend.
You will see all the items from the Layers panel in the legend, even those that are not visible on the map! We need to clean up the legend and remove the unnecessary entries. In the “Items” list make sure the legend is selected. Then go to the item properties and tick the check box “Only show items inside linked map”. This should reduce the legend to the layers you can see.
The legend uses the layer names, but these are not always informative. You can change a label simply by double-clicking on it in the “Legend Items” panel. Try changing the FAK_HDs label to something like “Number of confirmed malaria cases”.
Note that if you change the layers in the main QGIS project, the changes will be reflected in the print composer. Let’s try this out: return to the main QGIS project window (do not close the print window).
Change the color scale of the FAK_HDs layer, then return to the print window. Has the map changed? It may take a moment, or you may need to move the map slightly so that it updates.
The map is starting to look better. Try adding a North Arrow and a Scale Bar. Each item on the map has customization options that can be accessed from the “Item properties” tab; try a few of them out.
Have you added a scale bar? Try zooming in or out of the map and see what happens to the scale bar.
We can also add a text box for titles or labeling. Text, font size and style are all adjusted in the “item properties” panel.
Once you are happy with an item, it is advisable to lock it in place. This will stop any accidental moving or zooming. In the items list, tick the box with the little lock symbol next to “Map 1”.
The previous step mainly prevents unwanted movement. We can also lock in the color symbology so that changes made on the main map canvas do not alter the print layout. In the “Item properties” of “Map 1”, tick the boxes for “Lock layers” and “Lock styles for layers”.
Return to the main QGIS project and change the color scheme again. What has happened in the print composer now?
There are many different items that can be added to your map such as arrows and tables - find out more here
The final step is exporting the map so that it can be used in a report or publication.
There are many very useful sources of help for QGIS. A good place to start is the official QGIS documentation, where you will find a detailed description of QGIS and can search for topics you need help with. There is also a training manual with easy-to-follow lessons:
https://docs.qgis.org/3.22/en/docs/ https://docs.qgis.org/3.22/en/docs/training_manual/foreword/index.html
Contributors have put together a series of tutorials and tips on the following website: https://www.qgistutorials.com/en/. The tutorials have been translated into many different languages.
If neither of these two options helps, you can ask specific questions on https://gis.stackexchange.com/
We are nearing the end of the tutorial - below are a few optional extras you can try if you have time or want to do further study. The options are:
Sometimes we identify errors in the attribute table that need correcting, or there are unwanted columns to tidy up. In the FAK_HDs_with_pop shapefile we have no use for the Pop_count field, and it can be confused with Pop_sum, so it may be best to remove it.
QGIS has many other data cleaning and manipulation functions that can be explored. That said, data cleaning is not a strength of QGIS, so it can be preferable to clean data in Excel or R before using it in QGIS.
Sometimes we want to know how many points fall within each polygon. In our example this tells us how many health facilities are in each district, but there are many other use cases.
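QGIS provides a processing tool for this (Count Points in Polygon). The idea behind it can be sketched in Python with the classic ray-casting test: a point is inside a polygon if a ray drawn from it crosses the polygon's boundary an odd number of times. The sketch assumes simple polygons given as a single ring of (x, y) vertices with no holes or multiple parts, and it checks every point against every polygon, whereas real GIS tools use spatial indexes to speed this up. Points lying exactly on a boundary may be counted either way. The function names and data are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how often a ray going right from (x, y) crosses
    the polygon edges; an odd number of crossings means inside."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_polygons(points, polygons):
    """Number of points inside each polygon (polygons = {name: ring})."""
    return {name: sum(point_in_polygon(x, y, ring) for x, y in points)
            for name, ring in polygons.items()}


# Made-up square districts side by side and four facility locations.
polygons = {
    "District A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "District B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
facilities = [(1, 1), (3, 1), (3.5, 0.5), (5, 5)]
print(count_points_in_polygons(facilities, polygons))
# {'District A': 1, 'District B': 2}
```

The facility at (5, 5) lies outside both districts and is not counted anywhere, a useful reminder to check for points that fall outside your boundaries.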
An extensive list of free spatial data can be found on the Free GIS Data website. Please explore and see if there is anything relevant to your work.
As an example, we can download shapefiles of the Philippines boundaries, roads, rail network and rivers.
From the Free GIS Data website we can also obtain population rasters.
QGIS plugins offer additional functionality to the core QGIS software, much like packages in R. Many popular plugins have been incorporated into QGIS, but there are still over 1000 additional plugins available. Here we will look at how to access plugins and highlight some of the more popular ones.