In my opinion, visualizations are always better than showing data tables. Whenever I work with spatial data, I like to plot the data on a map. There are several ways to visualize shapefile data on a map in R, I will walk you through a couple of them below.
In this tutorial I will use shapefiles to plot spatial data. A shapefile contains all the coordinates and additional data, such as region names. I’ve downloaded a shapefile of Soutch Africa, because I love this country 🙂 You can download the shapefile on http://www.gadm.org/download. This file contains all the coordinates which we are going to plot. In addition, I will also use population data from Wikipedia and coordinates of the five biggest cities of South Africa from Tageo.
Load shapefile data and plot map
So let’s start by loading the maptools library and store the location of the .shp file in a new object. Then I load the population data and coordinates of the cities into the right format. Next, I’ll read the shapefile into an object and plot the first map. In order to read and open the shapefile, I’ll use the maptools package.
library(maptools) file <- 'C:/.../ZAF_adm1.shp' # source: https://en.wikipedia.org/wiki/List_of_South_African_provinces_by_population population <- data.frame(province = c("Gauteng", "KwaZulu-Natal", "Western Cape", "Eastern Cape", "Limpopo", "Mpumalanga", "North West", "Free State", "Northern Cape"), population = c(14278669, 11074784, 6510312, 6498683, 5778442, 4444212, 3856174, 2866678, 1213996) ) # source: http://www.tageo.com/index-e-sf-cities-ZA.htm coordinates_cities <- data.frame(city = c("Cape Town", "Durban", "Johannesburg", "Pretoria", "Soweto"), lat = c(-33.930, -29.870, -26.190, -25.730, -26.280), lon = c(18.460, 30.990, 28.040, 28.220, 27.840) ) # Read data shapefile <- readShapeSpatial(file) plot(shapefile)
This simple code already produces the following map:
That wasn’t difficult at all! The shapefile includes all kinds of information. I always start to look what kind of information is included. You can use the @ and $ signs to extract objects from the data. The following examples show you which information is in the shapefile of South Africa.
# View shapefile information summary(shapefile) #Object of class SpatialPolygonsDataFrame #Coordinates: # min max #x 16.45189 32.89125 #y -34.83514 -22.12503 #Is projected: NA #proj4string : [NA] #Data attributes: # ID_0 ISO NAME_0 ID_1 NAME_1 HASC_1 CCN_1 CCA_1 TYPE_1 ENGTYPE_1 NL_NAME_1 # Min. :211 ZAF:9 South Africa:9 Min. :1 Eastern Cape :1 ZA.EC :1 Min. :0 EC :1 Provinsie:9 Province:9 NA's:9 # 1st Qu.:211 1st Qu.:3 Free State :1 ZA.FS :1 1st Qu.:0 FS :1 # Median :211 Median :5 Gauteng :1 ZA.GT :1 Median :0 GT :1 # Mean :211 Mean :5 KwaZulu-Natal:1 ZA.MP :1 Mean :0 KZN :1 # 3rd Qu.:211 3rd Qu.:7 Limpopo :1 ZA.NC :1 3rd Qu.:0 LIM :1 # Max. :211 Max. :9 Mpumalanga :1 ZA.NL :1 Max. :0 MP :1 # (Other) :3 (Other):3 (Other):3 # VARNAME_1 # Eastern Transvaal :1 # Natal and Zululand :1 # Noord-Kaap :1 # Noordelike Provinsie|Northern Transvaal|Northern Province:1 # North-West|Noordwes :1 # Oos-Kaap :1 # (Other) :3 shapefile@data # ID_0 ISO NAME_0 ID_1 NAME_1 HASC_1 CCN_1 CCA_1 TYPE_1 ENGTYPE_1 NL_NAME_1 VARNAME_1 #0 211 ZAF South Africa 1 Eastern Cape ZA.EC 0 EC Provinsie Province <NA> Oos-Kaap #1 211 ZAF South Africa 2 Free State ZA.FS 0 FS Provinsie Province <NA> Orange Free State|Vrystaat #2 211 ZAF South Africa 3 Gauteng ZA.GT 0 GT Provinsie Province <NA> Pretoria/Witwatersrand/Vaal #3 211 ZAF South Africa 4 KwaZulu-Natal ZA.NL 0 KZN Provinsie Province <NA> Natal and Zululand #4 211 ZAF South Africa 5 Limpopo ZA.NP 0 LIM Provinsie Province <NA> Noordelike Provinsie|Northern Transvaal|Northern Province #5 211 ZAF South Africa 6 Mpumalanga ZA.MP 0 MP Provinsie Province <NA> Eastern Transvaal #6 211 ZAF South Africa 7 North West ZA.NW 0 NW Provinsie Province <NA> North-West|Noordwes #7 211 ZAF South Africa 8 Northern Cape ZA.NC 0 NC Provinsie Province <NA> Noord-Kaap #8 211 ZAF South Africa 9 Western Cape ZA.WC 0 WC Provinsie Province <NA> Wes-Kaap shapefile@data$NAME_1 # Eastern Cape Free State Gauteng KwaZulu-Natal Limpopo Mpumalanga North West Northern Cape Western Cape #Levels: Eastern Cape Free State Gauteng KwaZulu-Natal Limpopo Mpumalanga North West Northern Cape Western Cape shapefile@plotOrder # 8 1 9 2 5 7 4 6 3 shapefile@bbox # min max #x 16.45189 32.89125 #y -34.83514 -22.12503
Color shapefile based on population number
Now we know what kind of information is available and how to plot the first map from the shapefile, it’s time to color the areas. I want to color the provinces according to their population intensity. The file does not include population data, sometimes shapefiles include information like this. But it’s not a problem, because we can add columns to the shapefile.
After adding the population column to the shapefile, I’m going to use the colorRampPalette() function to create HEX color coding in a new column called color . Next, I’ll tell the plot() function to apply this color and to not show the borders.
# Merge population data shapefile@data <- merge(shapefile@data, population, by.x = c("NAME_1"), by.y = c("province")) shapefile@data # Assign colors to provinces based on population population_data <- shapefile@data$population colors <- (population_data - min(population_data)) / (max(population_data) - min(population_data))*254+1 shapefile@data$color = colorRampPalette(c('#9eceff', '#004081'))(255)[colors] plot(shapefile, col = shapefile@data$color, border=NA)
That looks good but I want to add a legend with the population numbers corresponding to the colors. You can add a legend by using the legend() function. The legend needs to know which colors to plot and the corresponding values, so I’ll create bins of the population data and use the colorRampPalette() again to get the right color codes.
# Create legend library(dplyr) shapefile@data <- shapefile@data %>% arrange(population, color) shapefile@data$population_bins <- cut(shapefile@data$population/1000000, 5) shapefile@data$color_bins = colorRampPalette(c('#9eceff', '#004081'))(5)[shapefile@data$population_bins] legend("topleft", fill = unique(shapefile@data$color_bins), legend = unique(shapefile@data$population_bins), col = unique(shapefile@data$color_bins))
Now we have the colors and legend in place, I want to finish the map with the five biggest cities. We can plot the cities based on their longitude and latitude. We can use the function points() to add points to the map created with the shapefile. You can change some arguments such as the color and type of point you want to plot.
# Plot cities points(coordinates_cities$lon, coordinates_cities$lat, pch=19, col="red", cex=1)
Et voila, there we have our map of South Africa created with a shapefile. With some additional data we added the colors and points on the map.
Ggplot2 for shapefiles
But this was just a start, know I will show you how to use ggplot() to create a map with shapefiles. First, we need to activate the library ggplot2 to use the function ggplot() . We need to tell the function which shapefile we want to use, but also the longitude and latitude columns, and which column contains the regions. I’ll also add black borders and make sure that the map is plotted using the right scale.
library("ggplot2") ggplot(shapefile, aes(x = long, y = lat, group = group)) + geom_polygon(fill = "white") + geom_path(color = "black") + coord_equal()
This looks nice already, but I want to recreate the map with colors based on the population, a legend and points on the map for the five biggest cities.
In order to color the regions, we need to transform the shapefile into a data frame. But I start by reordering the shapefile data, such that the order starts at 1 again. This is something we need to do, because otherwise the coloring will go wrong. As soon as the order of the regions is back to the original order, I’m using the function fortify() to transform the shapefile to a data frame. This creates a data frame with all the coordinates on a new line. Each line/coordinate needs to know the population value, so I merge the data to add a new column containing the population. Then I create an object of the map including the colors for the regions and a legend.
# Plot map and legend with colors shapefile@data <- shapefile@data[order(shapefile@data$ID_1),] shapefile_data <- fortify(shapefile, region = "NAME_1") shapefile_data <- merge(shapefile_data, shapefile@data[, c('NAME_1', 'population')], by.x='id', by.y='NAME_1', all.x=TRUE) ggplot_southafrica <- ggplot_southafrica + geom_polygon(aes(fill = population/1000000)) + geom_path(color = "black") + scale_fill_gradient(low = '#9eceff', high = '#004081', name = "Population x 1,000,000") ggplot_southafrica
I want to end by adding some points for the five biggest cities on the map.
# Plot cities ggplot_southafrica <- ggplot_southafrica + geom_point(data = coordinates_cities, aes(x = lon, y = lat), col = "red", inherit.aes = FALSE) ggplot_southafrica
So that’s the final map! I’ve shown you two options how you could plot a map from a shapefile. There are more ways to do this, but I think these two are the best to know.