Creating Hexbin Maps in R
Revision as of 17:35, 9 December 2021
Objective
The objective of this tutorial is to create a thematic hexbin map of U.S. higher education in RStudio. Users will learn how to create a hexbin map from a geospatial object and plot thematic data. In addition, this tutorial will demonstrate how to add and customize various cartographic elements on your map, including symbolization, labelling, and map elements (e.g., title, legend). Users will have the opportunity to become more familiar with the R programming language and to explore the spatial and cartographic capabilities of the software. This tutorial uses open-source software and data and will discuss the advantages and limitations of each. Finally, this tutorial contributes to the collection of Open Source GIS tutorials created by students at Carleton University.
Software Requirements: R, RStudio, spreadsheet software (e.g., Microsoft Excel)
Skills: Cartographic design, programming
Note: This tutorial assumes basic knowledge of the R programming language. This version of the tutorial was created using a Windows platform with R version 3.6.2.
Why Hexagons?
Regularly shaped grids are often used to normalize geography for mapping where the source polygons are irregularly shaped (e.g., political boundaries). A hexagon grid is an alternative to the square (fishnet) grid typically used in GIS analysis and thematic mapping. Aggregating data into hexagons is advantageous because hexagons reduce the sampling bias caused by the edge effects of the grid shape. In addition, hexagons can be used to obscure sensitive source data (e.g., personal addresses).
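One way to see why hexagons are less prone to edge effects than squares is to compare the perimeter of each shape at the same area: a shorter perimeter means fewer locations lie close to a cell edge. The base R sketch below is an illustration only (the unit area is an arbitrary assumption) and is not part of the mapping workflow.

```r
# Compare the perimeter of a square, a regular hexagon, and a circle of equal area
area <- 1  # arbitrary unit area, chosen for illustration

# Square: area = s^2
square_perimeter <- 4 * sqrt(area)

# Regular hexagon: area = (3 * sqrt(3) / 2) * s^2
hex_side <- sqrt(2 * area / (3 * sqrt(3)))
hex_perimeter <- 6 * hex_side

# Circle: area = pi * r^2
circle_perimeter <- 2 * sqrt(pi * area)

round(c(square = square_perimeter, hexagon = hex_perimeter, circle = circle_perimeter), 3)
```

For a unit area, the hexagon's perimeter (about 3.72) falls between that of the square (4) and the circle (about 3.54), so the hexagon is the closer approximation of a circle.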
Getting Started
Downloading the Software
The first step of this tutorial is to download R and RStudio if they are not already installed on your device. At the time of writing, the latest release is R version 4.1.2 (2021). R is a widely used open-source software environment for data manipulation and analysis (statistics, graphics, etc.). R is easily customizable, and code is executed line by line in a console.
For the purposes of this tutorial, we will be using an integrated development environment (IDE) called RStudio. This software provides users with a console, a syntax-highlighting editor, and a set of integrated tools for plotting and debugging R code. RStudio is available in open source (free) and commercial ($995/year) editions, and has both a desktop and server version. This tutorial only requires the open source edition.
The desktop version of RStudio can be downloaded HERE
Finding Data
Spatial data
When creating a hexagonal map, users have the option to create a hexbin map from (1) a geospatial object or (2) a list of coordinates. For the purposes of this tutorial, we will use an existing hexagon boundary file (.geojson) of the United States. The file includes a total of fifty-one hexagons (50 U.S. States and District of Columbia). Download the data in .geojson format and save to a new project folder.
The hexgrid is available to download HERE.
Non-spatial data
The statistical data for this tutorial will be sourced from the United States (U.S.) Department of Education, National Center for Education Statistics. The data of interest is the state-level “Bachelor's Degrees in Science and Engineering Conferred per 1,000 Individuals 18-24 Years Old” from 2000 to 2019. This data is an important indicator of higher education attainment and bachelor's-level training. According to the U.S. Department of Education, Science and Engineering (S&E) fields include physical, life, earth, ocean, atmospheric, computer, and social sciences; mathematics; engineering; and psychology (excludes medical and technology fields). Spend some time exploring the data and taking note of any observable trends.
The data is available to view and download HERE.
Cleaning the Data
Before importing the attribute data into R, we will need to clean the data and save it as a text file. The data provided includes three tables: (1) the number of S&E Bachelor's Degrees conferred, (2) the population of cohort 18-24 years old, and (3) the number of degrees conferred per 1,000 individuals. For our choropleth map, we are most interested in the third table because the values have been normalized. Following the instructions below, we will prepare the data for import.
- Open the downloaded data file (se-bachelors-degrees-per-1000-18-24-year-olds.xlsx) in spreadsheet software (e.g., Microsoft Excel)
- Select and delete the 'S&E bachelor's degrees' and 'Individuals 18-24 years old' tables
- Select the 'Degrees/1,000 individuals 18–24 years old' table
- Click on the column border, hold down the shift key and drag across
- Simplify the data by deleting all unnecessary rows
- Save the spreadsheet to your project folder as a comma-separated values (.csv) file
Figure 2:
The RStudio workspace consists of four panes:
- (Top left): Script window (Open multiple scripts; Write and run your code)
- (Bottom left): R console (View error messages)
- (Top right): Environment (List of your created variables); History
- (Bottom right): Plots (Graph outputs); Packages (List of available packages); Help
Figure 3: Screenshot of the RStudio Environment
At the start of your session:
- Start a new R script by clicking on the icon in the top left corner of the script window or go to File > New File > R Script.
- Save your script to your project folder by clicking the 'save' icon or go to File > Save As.
- Set your working directory (the folder where R reads and saves files).
- Comment your code using the (#) symbol
setwd("~/FALL 2021/GEOM 4008/Data")
#4008 Tutorial
#How to Create a Hexbin Map
#December 2021
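Before moving on, it can help to confirm that the working directory is set as intended and that R can see your files. The sketch below is a minimal example using base R only; tempdir() is a stand-in (an assumption made so that the example runs on any machine) for your own project folder.

```r
# Hypothetical project folder: tempdir() is used here only so the sketch runs anywhere;
# replace it with the path to your own project folder
project_dir <- tempdir()

old_dir <- setwd(project_dir)  # setwd() returns the previous working directory
getwd()                        # confirm the current working directory
list.files()                   # list the files R can see in that folder

setwd(old_dir)                 # restore the previous working directory
```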
Creating a Choropleth Map
Installing Packages
R comes with a number of pre-installed packages. For this tutorial, we will also need to install additional packages. Click the link beside each package listed below to learn more. Install the required packages using the script below. Once installed, load each of the packages from your library using the library() function. Alternatively, you can use the 'Install Packages' dialog box, which is accessible through the main menu (Tools > Install Packages).
- tidyverse Learn more
- geojsonio Learn more
- RColorBrewer
- sp
- broom Learn more
- rgeos Learn more
#Install required packages for this tutorial
install.packages('tidyverse')
install.packages('geojsonio')
install.packages('RColorBrewer')
install.packages('sp')
install.packages('broom')
install.packages('rgeos')
#Load the packages
library(tidyverse)
library(geojsonio)
library(RColorBrewer)
library(sp)
library(broom)
library(rgeos)
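If some of these packages are already installed, you may prefer to install only the ones that are missing. The following is a minimal sketch of that idea; missing_packages() is a hypothetical helper written for this example and is not part of any package.

```r
# Packages used in this tutorial
required <- c("tidyverse", "geojsonio", "RColorBrewer", "sp", "broom", "rgeos")

# Hypothetical helper: return the packages that cannot be loaded on this machine
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
to_install

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```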
Tip: If you want to learn more about a specific package or function, you can type a question mark followed by its name in the R console to view the corresponding 'Help' page.
?mutate
Importing Data
Before reading your data into R, ensure that your data are located in the folder that you set as your working directory.
To import the hexbin data, we will use the geojson_read() function. Once imported, the data will appear in the 'Environment' tab in the top-right pane. After the file has been imported, we will need to reformat our data using the mutate and gsub functions and convert it to a data frame (fortify it) using the tidy function from the broom package in order to plot our map in the next steps.
#Import hexbins
hex <- geojson_read("us_states_hexgrid.geojson", what = "sp")
#Reformat the 'google_name' field
#This will remove the (United States) from each value
#E.g., Vermont (United States) will be changed to Vermont
hex@data = hex@data %>% mutate(google_name = gsub(" \\(United States\\)", "", google_name))
#Fortify the data to create a data format output
#This format is needed to plot the map using the ggplot2 package
hex_fortify <- tidy(hex, region = "google_name")
#Plot the hexbins
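#Labels are added later, once the hexagon centroids have been calculated
ggplot() +
  geom_polygon(data = hex_fortify, aes(x = long, y = lat, group = group), fill="#a1dab4", color="#f7f7f7") +
  theme_void() +
  coord_map() #relies on the mapproj package, which may need to be installed separately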
Figure 4.
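The gsub step in the script above can be tried on its own to see what the pattern does. This is a small sketch with hypothetical sample values; only the Vermont example comes from the tutorial script.

```r
# Hypothetical sample of values in the 'google_name' field
google_name <- c("Vermont (United States)",
                 "New York (United States)",
                 "District of Columbia (United States)")

# Parentheses are special characters in regular expressions, so each is escaped with \\
clean_name <- gsub(" \\(United States\\)", "", google_name)
clean_name
```

Each value keeps only the state name, which is what allows the field to be matched against the state names in the education data later on.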
Next, we will import the education data using the read.csv function. Once imported, we can view the data table and delete any unnecessary rows that contain NA values.
#Import education data
bach <- read.csv("bachelor.csv", header=TRUE)
#Remove unnecessary rows and columns
bach <- bach[-c(53:54), ] #rows
bach <- bach[-c(22:63)] #columns
#View the data table
view(bach)
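Two details of this import step are easy to miss: read.csv renames columns whose names begin with a digit, and empty spreadsheet rows are read in as NA values. The sketch below illustrates both with a small in-memory table; the values are made up and stand in for the contents of bachelor.csv.

```r
# Hypothetical CSV content with made-up values, standing in for 'bachelor.csv'
csv_text <- "State,2018,2019
Alabama,10.5,11.2
Alaska,7.9,8.3
,,
"
bach_demo <- read.csv(text = csv_text, header = TRUE)

# Column names that start with a digit are prefixed with 'X' (e.g., 2019 becomes X2019)
names(bach_demo)

# Drop rows that have no 2019 value
bach_demo <- bach_demo[!is.na(bach_demo$X2019), ]
bach_demo
```

This renaming is why the 2019 column is referred to as X2019 when the data are symbolized.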
Merging Data Frames
To create our choropleth map, we need to join our spatial and non-spatial data together. Using the left_join function, we will perform the join by merging our data frames. The two fields that we will use to join the data are "id" (in the hexbin data frame) and "State" (in the education table).
#Join the education data to the hexbin data frame (attribute join on state name)
hex_fortify <- hex_fortify %>%
  left_join(bach, by = c("id" = "State"))
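A left join keeps every hexagon, so any state name that is spelled differently in the two tables will silently receive NA values. A quick check of the join keys can catch this; the sketch below uses made-up key values for illustration.

```r
# Hypothetical key values (made up for illustration)
hex_ids <- c("Alabama", "Alaska", "District of Columbia")
bach_states <- c("Alabama", "Alaska", "Dist. of Columbia")

# Hexagons without a matching row in the education table
# (these would receive NA values after the join)
unmatched <- setdiff(hex_ids, bach_states)
unmatched
```

In the tutorial workflow, the equivalent check would be setdiff(unique(hex_fortify$id), bach$State), run before the join.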
Symbolizing Data
Choropleth maps are a common type of thematic map that portray geographic patterns for areal units such as states and provinces. They generally consist of two to six colour symbols which represent a corresponding number of nonoverlapping classes for an intensity index (Monmonier, 2018).
To symbolize the attribute data in our choropleth map, we will use graduated colour symbology. By varying the colours of the hexagons, our map will show the quantitative difference between U.S. states in the number of Bachelor's Degrees awarded in Science and Engineering per 1,000 individuals in the 18-24 age cohort. We will classify the data into ranges and assign a specific colour to each of the classes. When selecting the size and total number of classes, it is important to explore the descriptive statistics of our data.
The ColorBrewer tool can be explored more HERE
#Create bins
#Fill in the class breaks and labels for your data (n classes need n+1 breaks and n labels)
hex_fortify$bin <- cut(hex_fortify$X2019, breaks=c(), labels=c(), include.lowest=TRUE)
#Select a color ramp
#Display all ColorBrewer palettes
#Use an argument to only display colorblind-friendly palettes
display.brewer.all(colorblindFriendly = TRUE)
#Choose a sequential ramp for our map
#Where n = number of data classes
my_palette <- brewer.pal(n=5, name="OrRd")
Figure 4. Colourblind-friendly Brewer palettes.
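The class breaks for the cut function are left for you to choose. As one possible approach (an assumption for illustration, not necessarily the classification intended for this map), the sketch below derives five quantile classes from a vector of made-up values standing in for hex_fortify$X2019.

```r
# Hypothetical values (made up), standing in for hex_fortify$X2019
values <- c(5.2, 8.1, 10.4, 12.7, 15.3, 18.9, 22.0, 25.6, 30.1, 41.5)

# Descriptive statistics help decide on the number and size of classes
summary(values)

# Five quantile classes need six break points
n_classes <- 5
breaks <- quantile(values, probs = seq(0, 1, length.out = n_classes + 1))

# Build one label per class from the rounded break points
labels <- paste(round(head(breaks, -1), 1), round(tail(breaks, -1), 1), sep = " to ")

bins <- cut(values, breaks = breaks, labels = labels, include.lowest = TRUE)
table(bins)
```

Quantile breaks place roughly the same number of observations in each class. Whichever method you use, the number of classes should match the n passed to brewer.pal so that each class receives one colour.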
Adding Map Elements
Labels can be added to our map to provide viewers with geographic reference information. This is especially important on our hexbin map, where the U.S. state boundaries are not shown as they would appear on a political map. To add labels to the hexbin map, we must first calculate the centroid of each hexagon using the gCentroid function. We will label each hexagon with the two-letter state abbreviation stored in the "iso3166_2" field. The labels will be added to the plot using the geom_text function. The colour and size of the labels can also be changed.
A legend can also be created to help the viewer understand what each colour symbol represents.
#Add labels
centers <- cbind.data.frame(data.frame(gCentroid(hex, byid=TRUE), id=hex@data$iso3166_2))
#plot
ggplot() +
geom_polygon(data=hex_fortify, aes(fill=bin, x=long, y=lat, group=group), size=0, alpha=0.9, color="#f7f7f7") +
geom_text(data=centers, aes(x=x, y=y, label=id), color="white", size=5) +
theme_void() +
scale_fill_manual(
values=my_palette,
name="S&E Bachelor's Degrees per 1,000 Individuals (2019)",
guide= guide_legend( keyheight=unit(4, units="mm"), keywidth=unit(10, units="mm"), direction="horizontal", label.position="bottom", title.position="top", nrow=1)
) +
ggtitle( "U.S. Bachelor's Degrees in Science and Engineering, 2019" ) +
theme(
legend.position = c(0.5, 0.9),
text = element_text(color = "#22211d"),
plot.background = element_rect(fill = "#f5f5f2", color = NA),
panel.background = element_rect(fill = "#f5f5f2", color = NA),
legend.background = element_rect(fill = "#f5f5f2", color = NA),
plot.title = element_text(size=22, hjust=0.5, color = "#4e4d47")
)
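The label positions used in the plot above come from gCentroid, which calculates the geometric centroid of each polygon. For a regular hexagon, this is simply the mean of its vertices. The sketch below illustrates that special case in base R with a hypothetical hexagon; it is not a replacement for gCentroid on real geometries.

```r
# Hypothetical regular hexagon centred on (2, 3) with a radius of 1
angles <- seq(0, 300, by = 60) * pi / 180
hexagon <- data.frame(x = 2 + cos(angles), y = 3 + sin(angles))

# For a regular hexagon, the centroid equals the mean of its vertices
centroid <- colMeans(hexagon)
centroid
```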
Conclusion
References
Esri. (2015, April 8). Thematic mapping with hexagons. https://www.esri.com/about/newsroom/insider/thematic-mapping-with-hexagons/
Holtz, Y. (n.d.). Hexbin map in R: an example with US states. https://www.r-graph-gallery.com/328-hexbin-map-of-the-usa.html
https://team.carto.com/u/andrew/tables/andrew.us_states_hexgrid/public/map
Monmonier, M. (2018). How to lie with maps. (3rd ed.). The University of Chicago Press.
National Science Board. (2021). https://ncses.nsf.gov/indicators/states/indicator/se-bachelors-degrees-per-1000-18-24-year-olds