Creating Hexbin Maps in R

Objective

The objective of this tutorial is to create a hexbin choropleth map of U.S. education costs in RStudio. Users will learn how to create a hexbin map from a geospatial object and plot thematic data. In addition, this tutorial will demonstrate how to add and customize various cartographic elements on your map, including symbolization, labelling, and map elements (e.g., title, legend). Users will have the opportunity to become more familiar with the R programming language and to explore the spatial and cartographic capabilities of the software. This tutorial uses open-source software and data and will discuss the advantages and limitations of each. Finally, this tutorial contributes to the collection of Open Source GIS tutorials created by students at Carleton University.

Software Requirements: R, RStudio, spreadsheet software (e.g., Microsoft Excel)

Skills:

Note: This tutorial assumes basic knowledge of the R programming language. This version of the tutorial was created using a Windows platform with R version 3.6.2.

Why Hexagons?

Regularly shaped grids are often used to normalize geography for mapping in instances where polygons are irregularly shaped (e.g., political boundaries). A hexagon grid is an alternative to the square (fishnet) grid typically used in GIS analysis and thematic mapping. Aggregating data into hexagons is advantageous because hexagons reduce the sampling bias caused by the edge effects of the grid shape. In addition, hexagons can be used to obscure sensitive source data (e.g., personal addresses).

Getting Started

Downloading the Software

The first step of this tutorial is to download R and RStudio if they are not already installed on your device. At the time of writing, the latest version of R was 4.1.2, released in 2021. R is a widely used open-source software environment for data manipulation and analysis (statistics, graphics, etc.). It is easily customizable, and commands can be executed line by line in a console.

For the purposes of this tutorial, we will be using an integrated development environment (IDE) called RStudio. This software provides users with a console, a syntax-highlighting editor, and a set of integrated tools for plotting and debugging R code. RStudio is available in open source (free) and commercial ($995/year) editions, and has both a desktop and server version. This tutorial only requires the open source edition.

The desktop version of RStudio can be downloaded from the RStudio website.

Finding Data

Spatial data

When creating a hexagonal map, users have the option to create a hexbin map from (1) a geospatial object or (2) a list of coordinates. For the purposes of this tutorial, we will use an existing hexagon boundary file (.geojson) of the United States. Download the data in .geojson format and save it to a new project folder.

The hexgrid is available to download from CARTO (see the link under References).

Non-spatial data

The statistical data for this tutorial will be sourced from the United States (U.S.) National Science Board. The data of interest is the state-level “Average Undergraduate Charge at Public 4-Year Institutions” from 1994 to 2019. The charge includes the tuition, required fees, room, and board for a full-time undergraduate student who is a state resident. This data serves as a useful indicator of the accessibility of higher education. Spend some time exploring the data and taking note of any observable trends.

The data is available to view and download from the indicator page listed under References.

Cleaning the Data

Before importing the attribute data into R, we will need to clean the data and save it as a text file.

  1. Open the downloaded data file (ave-undergraduate-charge-at-public-4-year-institutions.xlsx) in spreadsheet software (e.g., Microsoft Excel)
  2. Simplify the data by deleting all unnecessary rows
  3. Remove the commas (thousands separators) from the numbers by selecting the cells and changing the number format to General
  4. Save the spreadsheet as a comma-separated values (.csv) file (the import code later in this tutorial assumes the name mydata.csv)
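
If the commas are still present after saving, they can also be removed in R after import. The following is a minimal sketch, assuming the charges were read in as text such as "21,035". The values in raw_charges are made up for illustration and are not taken from the dataset.

```r
# Hypothetical values, as they might look if the commas were left in the .csv
raw_charges <- c("21,035", "18,450", "9,987")

# Strip the thousands separators, then convert the text to numbers
clean_charges <- as.numeric(gsub(",", "", raw_charges))

print(clean_charges)
```

The same two calls could be applied to any column of the imported table that was read as text rather than as numbers.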

Navigating RStudio

The RStudio workspace consists of four panes:

  1. (Top left): This is the code editor
  2. (Bottom left): This is the R console
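  3. (Top right): This pane holds the Environment and History tabs, where imported data and other objects are listed
  4. (Bottom right): This pane holds the Files, Plots, Packages, and Help tabs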

At the start of your session:

  • Start a new R script by clicking on the icon in the top left corner of the script window or go to File > New File > R Script.
  • Save your script to your project folder by clicking the 'save' icon or go to File > Save As.
  • Set your working directory (the folder where R reads and saves files).

setwd("~/FALL 2021/GEOM 4008/Data")

Creating a Choropleth Map

Installing Packages

R comes with a number of pre-installed packages. For this tutorial, we will also need to install additional packages: tidyverse, broom, rgeos, geojsonio, and RColorBrewer. Install the required packages using the script below. Once installed, load each of the packages from your library. Alternatively, you can use the 'Install Packages' dialog box, which is accessible through the main menu (Tools > Install Packages).

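#Install required packages for this tutorial
#Note: rgeos was archived from CRAN in 2023, so on recent versions of R
#it may need to be installed from the CRAN archive
install.packages('tidyverse')
install.packages('broom')
install.packages('rgeos')
install.packages('geojsonio')
install.packages('RColorBrewer')

#Load the packages
library(tidyverse)
library(broom)
library(rgeos)
library(geojsonio)
library(RColorBrewer)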
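
Before moving on, it can help to confirm which of the packages are actually available. The following is a small sketch, not part of the original workflow, that uses the base R function requireNamespace() to list any packages that still need to be installed. It only reports what is missing and does not install anything.

```r
# Packages used in this tutorial
pkgs <- c("tidyverse", "broom", "rgeos", "geojsonio", "RColorBrewer")

# requireNamespace() returns FALSE (rather than stopping) when a package is missing
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```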

Tip: If you want to learn more about a specific package or function, you can type a question mark followed by its name in the R console to view the corresponding 'Help' page.

?mutate

Importing Data

Before reading your data into R, ensure that your data are located in the folder that you set as your working directory.
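
One way to check this is to ask R whether it can see the files. This is a minimal sketch using the two file names that appear in the import steps of this tutorial. If you saved your files under different names, adjust the vector accordingly.

```r
# File names used in the import steps of this tutorial
required_files <- c("us_states_hexgrid.geojson", "mydata.csv")

# file.exists() looks in the current working directory for relative paths
found <- file.exists(required_files)

print(getwd())
print(data.frame(file = required_files, found = found))
```

A value of FALSE in the found column suggests that the file is in a different folder or that the working directory has not been set as intended.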

To import the hexbin data, we will use the geojson_read() function. The data will now appear in the 'Global Environment' tab in the top-right pane. After the file has been imported, we will need to reformat our data using the mutate and gsub functions and fortify it using the tidy function in order to plot our map in the next steps.

#Import hexbins
hex <- geojson_read("us_states_hexgrid.geojson", what = "sp")

#Reformat the 'google_name' field
#This will remove the (United States) from each value
#E.g., Vermont (United States) will be changed to Vermont 
hex@data <- hex@data %>% mutate(google_name = gsub(" \\(United States\\)", "", google_name))

#Fortify the data to convert the spatial object into a data frame
#This format is needed to plot the map using the ggplot2 package
hex_fortify <- tidy(hex, region = "google_name")
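
To see what the gsub pattern does before applying it to the hexgrid, it can be tried on a couple of sample values. This is a small sketch in which the sample strings simply follow the "Vermont (United States)" form described above.

```r
# Sample values in the style of the 'google_name' field
google_name <- c("Vermont (United States)", "New York (United States)")

# The parentheses are escaped with \\ so they are matched literally
state_name <- gsub(" \\(United States\\)", "", google_name)

print(state_name)
```

Without the escaping, the parentheses would be read as a regular-expression group and the suffix would not be removed as intended.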

Next, we will import the education data using the read.csv function. Once imported, we can view the data table and delete any unnecessary rows.

#Import education data
undergrad <- read.csv("mydata.csv", header = TRUE)
#Remove unnecessary rows
#If the rows are not consecutive, separate the row numbers with commas
undergrad <- undergrad[-c(53:64), ]
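
The negative index is what removes the rows. As an illustration only, the following sketch applies the same idea to a small made-up table (the toy data frame is hypothetical and stands in for the imported data), first for a consecutive block of rows and then for rows that are not consecutive.

```r
# Toy table standing in for the imported data (values are made up)
toy <- data.frame(State = c("A", "B", "C", "D", "E", "F"),
                  X2019 = c(10, 20, 30, 40, 50, 60),
                  stringsAsFactors = FALSE)

# Drop a consecutive block of rows (rows 5 to 6)
toy_block <- toy[-c(5:6), ]

# Drop non-consecutive rows (rows 1 and 4), separated by a comma
toy_scattered <- toy[-c(1, 4), ]

print(toy_block$State)
print(toy_scattered$State)
```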

Merging Data Frames

To create our choropleth map, we will need to join our spatial and non-spatial data together. To perform the join, we can write a command specifying the two fields that will be used as keys. Using the fortified hexbin data, we will join the "id" field with the "State" field from the data table.

#Perform attribute join
hex_fortify <- hex_fortify %>%
       left_join(undergrad, by = c("id" = "State"))
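
A left join keeps every hexagon and fills the joined columns with NA wherever the names do not match exactly, so it can be worth comparing the two key fields. The following is a minimal sketch using made-up key values. In practice, hex_ids and state_names would be replaced by the "id" values of the fortified hexbin data and the "State" column of the data table.

```r
# Toy keys standing in for the "id" and "State" fields (made up)
hex_ids <- c("Vermont", "New York", "District of Columbia")
state_names <- c("Vermont", "New York", "Texas")

# Hexagons with no matching row in the data table
unmatched_hex <- setdiff(hex_ids, state_names)

# Data rows with no matching hexagon
unmatched_data <- setdiff(state_names, hex_ids)

print(unmatched_hex)
print(unmatched_data)
```

Any names reported here would need to be corrected in one of the two tables for those hexagons to receive a value.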

Symbolizing Data

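To symbolize the attribute data in our map, we will use graduated colour symbology: the 2019 values are grouped into classes (bins) using the cut function, and each class is assigned a colour from a ColorBrewer palette.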

#Create bins
#Fill in the class breaks and matching labels for your data
hex_fortify$bin <- cut(hex_fortify$X2019, breaks = c(), labels = c(), include.lowest = TRUE)
#Select a colour scale using ColorBrewer
#Fill in one colour per bin (e.g., from the brewer.pal function in RColorBrewer)
my_palette <- c()

#Plot
ggplot() +
  geom_polygon(data = hex_fortify, aes(fill = bin, x = long, y = lat, group = group), size = 0, alpha = 0.9) +
  theme_void() +
  scale_fill_manual(
    values = my_palette,
    name = "Undergraduate charge ($)",
    guide = guide_legend(keyheight = unit(3, units = "mm"), keywidth = unit(12, units = "mm"), label.position = "bottom", title.position = "top", nrow = 1)
  ) +
  ggtitle("Undergraduate charge at 4-year public universities in 2019") +
  theme(
    legend.position = c(0.5, 0.9),
    text = element_text(color = "#22211d"),
    plot.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.background = element_rect(fill = "#f5f5f2", color = NA),
    legend.background = element_rect(fill = "#f5f5f2", color = NA),
    plot.title = element_text(size = 22, hjust = 0.5, color = "#4e4d47")
  )
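
The breaks, labels, and palette in the code above are left for you to choose. As one possible approach, and not necessarily the classification intended for this map, the sketch below derives five quantile-based classes from made-up values and picks one colour per class. The charges vector, the number of classes, and the "YlGnBu" palette are all assumptions made for illustration.

```r
# Made-up charges standing in for the 2019 column of the joined data
charges <- seq(14000, 32000, by = 2000)

# Five classes need six break points; quantiles give roughly equal-sized classes
n_classes <- 5
breaks <- quantile(charges, probs = seq(0, 1, length.out = n_classes + 1))

# Readable labels built from the lower and upper bound of each class
labels <- paste0(round(breaks[-length(breaks)]), "-", round(breaks[-1]))

bins <- cut(charges, breaks = breaks, labels = labels, include.lowest = TRUE)

# One colour per class: ColorBrewer if installed, otherwise a similar base R palette
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  my_palette <- RColorBrewer::brewer.pal(n_classes, "YlGnBu")
} else {
  my_palette <- hcl.colors(n_classes, "YlGnBu")
}

print(table(bins))
print(my_palette)
```

Quantile breaks place a similar number of states in each class. Rounded, evenly spaced breaks may be easier to read in a legend, so the choice depends on what the map is meant to emphasize.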

Adding Map Elements

Labels can be added to our map to provide viewers with geographic reference information. This is especially important on a hexbin map, where state boundaries do not appear as they would on a political map. To add labels, we must first calculate the centroid of each hexagon using the gCentroid function from the rgeos package. We will label each hexagon with the two-letter state abbreviation in the "id" field. Once the labels are added to the plot, we can change their colour and size.
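
To illustrate the idea of a label point without requiring rgeos, the sketch below swaps in a simpler technique: it averages the corner coordinates of each hexagon, which coincides with the centroid for a regular hexagon. This is not the gCentroid approach itself. The make_hex helper, the toy geometry, and the "VT" and "NY" ids are hypothetical and exist only to make the example self-contained.

```r
# Build the outline of one regular hexagon as a fortified-style data frame
# (toy geometry; the first vertex is repeated to close the ring)
make_hex <- function(id, cx, cy) {
  angles <- seq(0, 300, by = 60) * pi / 180
  ring <- data.frame(id = id, long = cx + cos(angles), lat = cy + sin(angles),
                     stringsAsFactors = FALSE)
  rbind(ring, ring[1, ])
}

toy_hex <- rbind(make_hex("VT", 2, 3), make_hex("NY", 0, 3))

# Drop the repeated closing vertex, then average the six corners of each hexagon
corners <- unique(toy_hex)
centers <- aggregate(cbind(long, lat) ~ id, data = corners, FUN = mean)

print(centers)
```

A table like centers could then be drawn on top of the hexagons with geom_text(data = centers, aes(x = long, y = lat, label = id)), using the colour and size arguments of geom_text to adjust the appearance of the labels.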

Conclusion

References

Holtz, Yan. (n.d.). Hexbin map in R: an example with US states. https://www.r-graph-gallery.com/328-hexbin-map-of-the-usa.html

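CARTO. (n.d.). us_states_hexgrid. https://team.carto.com/u/andrew/tables/andrew.us_states_hexgrid/public/map

National Science Board. (n.d.). Average undergraduate charge at public 4-year institutions. https://ncses.nsf.gov/indicators/states/indicator/ave-undergraduate-charge-at-public-4-year-institutions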

Esri. (2015, April 8). Thematic mapping with hexagons. https://www.esri.com/about/newsroom/insider/thematic-mapping-with-hexagons/