R Studio's Spatial Capabilities
Purpose
This tutorial centers on the abilities of R Studio, an open source software program built around the programming language 'R', which is often used for data analysis, statistics and the graphing of very large data sets. Users may want to view spatial output of their data, and R has a package (sp) that allows for the manipulation of spatial data. Users who are not familiar with the package may instead want to import the data into more visual software such as QGIS. QGIS uses the scripting language Python, so in order for R and QGIS to communicate, scripts must be translated from one language to the other.
This tutorial shows how one could manually translate Python to R. This requires knowledge of how both programs operate, and of how to use spatial packages in R Studio.
Introduction
How to use R Studio:
Figure 1: The top left-hand box (1) is the script editor. This is where one inputs all their commands: how to manipulate the data, perform analysis, and create graphical outputs. The bottom left-hand box (2) is the console. Commands can also be written here, but they will not be saved when the GUI (Graphical User Interface) closes. The console is also where results are shown: when one runs a command in R Studio, the command and its output appear in this section, unless the output is a plot. The bottom right-hand box (3) is where graphical outputs are shown, and where help files, packages and data files can be found. The top right-hand box (4) shows datasets that have been imported and keeps track of data and values that have been assigned in your R scripts. One can click on a dataset to view it in table format in another window.
In the console one can simply type a command and press 'Enter', and the command runs right away. In the script editor one can type one command or as many as they wish, and then run them by selecting the line(s) and clicking 'Run', a button located in the upper right-hand corner of the script editor (as shown in Figure 2).
Often one has large datasets that they wish to analyze. Datasets can be imported from text and .csv files, either by selecting "Import Dataset" in the R Studio GUI or by entering a command such as: insertnamefordata <- read.csv("/User/Documents/YOURDATA.csv"). Importing by command requires one to know the exact path to the dataset.
As shown in Figure 4, one can decide what to name the dataset they are importing and how to arrange it in the most suitable way. Generally one renames the data, adds headers, sets the separator to comma, and leaves the rest as default.
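As a minimal sketch of the command route, the snippet below writes a small made-up table to a temporary .csv file and reads it back with read.csv. The file contents, the column names and the object name mydata are placeholders invented for illustration; in practice one would pass the path to their own file.

```r
# Create a small example file so the snippet runs on its own.
# The columns and values are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,x,y", "1,10.5,45.2", "2,11.0,45.9", "3,9.8,44.7"), csv_path)

# header = TRUE treats the first row as column names; sep = "," splits on commas.
# Both are already the defaults of read.csv and are spelled out here for clarity.
mydata <- read.csv(csv_path, header = TRUE, sep = ",")

str(mydata)   # structure: column names and types
head(mydata)  # first rows of the imported table
```

The header and sep arguments correspond to the heading and separator options offered by the "Import Dataset" dialog.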
One should also set a 'working directory'; this is a folder where all data can be located and where the '.R' file will be saved. This can be done by clicking 'Session' in the top tool bar of the R Studio GUI, selecting 'Set Working Directory' and then 'Choose Working Directory' and navigating to the correct folder, as shown in Figure 5.
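The working directory can also be set from the script editor or console. The sketch below uses the base R functions getwd and setwd; tempdir() stands in for a real project folder, which is an assumption made only so that the example runs anywhere.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# tempdir() is a placeholder for a real project folder (assumption for this example).
project_dir <- tempdir()
setwd(project_dir)

getwd()        # confirm where R now reads and writes files by default
list.files()   # list the files in the working directory

# Restore the original working directory.
setwd(old_wd)
```

Once the working directory is set, files in that folder can be read by file name alone rather than by full path.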
For data analysis, one can open a script that someone else has already created, or one that they were previously working on. This is done with the open folder icon in the top left-hand corner (see Figure 6). One then has to go through the pre-made script and replace the dataset name with the name given to their own data on import. Another option is to write the script oneself, depending on how much one knows about the language and how far one wants to customize the process for a specific purpose.
Methods
A Comparison
Commands in Python
Commands in R Studio in order to achieve the same result:
A useful resource for translating between languages is this website: http://mathesaurus.sourceforge.net/r-numpy.html
Spatial R Packages
Packages can be installed through the GUI (See Figures 10-11) or command line (See Figure 12).
Then, in the script editor or console, one must type 'library(InsertPackageName)' in order to load the package (Figure 13).
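The install-then-load sequence might look like the sketch below for the two packages used in this tutorial. The install line is left commented out so that nothing is downloaded by accident. Note, as a caveat rather than a certainty for every setup, that 'maptools' has been retired by its maintainers, so newer R installations may not find it in the default repository.

```r
# Packages this tutorial relies on.
pkgs <- c("sp", "maptools")

# requireNamespace() returns TRUE if a package is installed, without attaching it.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install from the default repository:
  # install.packages(missing_pkgs)
} else {
  # Attach the packages so their functions can be called directly.
  library(sp)
  library(maptools)
}
```

Installation is needed only once per computer, whereas library() must be called in every new R session.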
R has the packages 'maptools' and 'sp', which allow for the manipulation of shapefiles. These packages allow shapefiles (lines, points, polygons and grids) to be read into R and displayed visually in the bottom right-hand corner of the GUI under 'Plots'. The data used for this tutorial is Ottawa census data from 2006 and 2016, which can be found on the Carleton Library website (Ottawa Gatineau Census Data).
The following tutorial on 'maptools' uses a modified version of Murray Richardson's code and explanation, from his GEOG 3003 course, of how to use the package for spatial manipulation.
Assessing Temporal Change from 2006 to 2016
After following the above directions to achieve the same result as in Figure 15, we will now use another open source program, QGIS (Quantum GIS), in conjunction with R to evaluate the temporal change in Ottawa wards from 2006 to 2016. The first step is to head to the QGIS website and follow the provided instructions to download and set up the proper version. Next, using the link provided earlier, download both the 2006 and 2016 census data from the Carleton Library site. You should now be able to add the 2006 and 2016 census shapefiles to QGIS, as shown in Figure 16.
Next, you need to use the 'Merge Vector Layers' tool to merge the 2006 and 2016 layers. This tool can be found under 'Vector -> Data Management Tools -> Merge Vector Layers', as shown in Figure 17.
Then you need to select your 'inputs' (the files you want to merge), which in this case are the 2006 and 2016 census data (see Figure 18).
After the tool has been run, you will end up with a layer called 'Merged' (unless another name is specified). This is a temporary layer, meaning that it will be deleted if you close QGIS before saving it or making it permanent. To make it permanent, simply right-click on the layer and select 'Make Permanent', as demonstrated in Figure 19.
Results
A Comparison
Exact translation between the two languages is almost possible. The biggest issue found is that R did not arrange the X and Y points as coordinate pairs, as they were meant to be. However, the results calculated were the same in the end, despite R not pairing them. When the standard deviation and variance were calculated for the X and Y columns, the values differed between the two languages, which was attributed to a difference in the degrees of freedom used (1 degree of freedom was accounted for in Python). R's sd() and var() functions always divide by n - 1 and have no argument for changing this, so matching another convention means rescaling the result by hand, which is not difficult.
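Both issues can be illustrated with a short base R sketch. The X and Y values below are made up for illustration and are not the tutorial's data. The sketch pairs the columns with cbind and then derives the population (denominator n) variance and standard deviation from R's sample (denominator n - 1) versions. For comparison, NumPy's np.std and np.var divide by n by default and take a ddof argument to change this; which convention the original Python script used is not shown here.

```r
# Hypothetical X and Y columns, made up for illustration.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
y <- c(1, 3, 2, 5, 4, 6, 8, 7)

# cbind() pairs the two columns row by row into a coordinate matrix.
xy <- cbind(x = x, y = y)

n <- length(x)

# Sample variance and standard deviation (denominator n - 1), as computed by R.
var_sample <- var(x)
sd_sample  <- sd(x)

# Population versions (denominator n), obtained by rescaling the sample variance.
var_pop <- var_sample * (n - 1) / n
sd_pop  <- sqrt(var_pop)

# For these values the population standard deviation is 2.
c(sd_sample = sd_sample, sd_pop = sd_pop)
```

The same rescaling applies to the Y column, and the gap between the two conventions shrinks as the number of observations grows.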
Spatial R Packages
This tutorial showed a sample of how one can manipulate shapefiles in R without having to export to other software to see the visual representation. The main steps to remember are to create the new or adjusted data, attach it to the data of the original shapefile, and then re-write the shapefile.
Example:
• classes<-cut(OttawaArea$Shape_area, breaks=breakvals, labels=labs) <-- This is the new data slot
• slot(OttawaArea, "data") <- newdata3 <-- This adds the new data into the main dataframe
• writePolyShape(OttawaArea, "OttawaCTs2006_updated.shp") <-- This command re-writes the shapefile to include the latest updates
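The classification step can be tried without a shapefile. The sketch below uses base R only, with a hypothetical data frame standing in for the attribute table of OttawaArea; the identifiers, area values, break values and labels are all made up for illustration. Writing the result back to a shapefile is not shown, since that requires 'maptools' and the census files.

```r
# Hypothetical stand-in for the attribute table of the shapefile.
# Identifiers, Shape_area values, breaks and labels are made up for illustration.
attrs <- data.frame(
  CTUID      = c("A", "B", "C", "D", "E"),
  Shape_area = c(0.8, 2.5, 7.1, 15.0, 42.3)
)

breakvals <- c(0, 1, 10, 50)
labs      <- c("small", "medium", "large")

# cut() assigns each area to the interval it falls in: (0,1], (1,10], (10,50].
classes <- cut(attrs$Shape_area, breaks = breakvals, labels = labs)

# Bind the new column to the original attributes; this plays the role of newdata3.
newdata3 <- data.frame(attrs, classes)

table(newdata3$classes)  # number of areas in each class
```

Note that cut() needs one more break value than labels, because the breaks mark the edges of the classes.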
Another key feature of R Studio is that, in the bottom right-hand quadrant where the plots are shown, one can export the images as .JPEG or .PDF files. This is very useful for reports.
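Plots can also be exported from a script rather than through the GUI. The sketch below uses the base R pdf() graphics device; the plotted values are made up, and a temporary file is used as the output path only so that the example runs anywhere. The jpeg() device works along the same lines, though with sizes given in pixels by default.

```r
# Output path; tempfile() is used so the example does not depend on a real folder.
pdf_path <- tempfile(fileext = ".pdf")

# Open a PDF graphics device; width and height are in inches.
pdf(pdf_path, width = 7, height = 5)

# Anything plotted now is drawn into the file instead of the Plots pane.
plot(c(1, 2, 3, 4), c(2, 4, 3, 5),
     main = "Example plot", xlab = "X", ylab = "Y")

# Close the device to finish writing the file.
dev.off()

file.exists(pdf_path)
```

Exporting by script makes it easy to regenerate every figure for a report after the data or the analysis changes.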
Assessing Temporal Change from 2006 to 2016
This tutorial showed how QGIS can be used in conjunction with R to display temporal changes.
References
Acknowledgements
Dan Patterson served as main script editor for anything Python-related.
Murray Richardson's script using 'maptools' from the GEOG 3003 course was instrumental in creating the tutorial.
References
Harris, R. (2011). Statistics for Geography and Environmental Science: An Introduction to R.