R Studio's Spatial Capabilities

From CUOSGwiki
Latest revision as of 19:40, 28 October 2019

Purpose

This tutorial centers on the capabilities of R Studio, an open source program built around the programming language 'R', which is often used for data analysis, statistics and the graphing of very large data sets. Users may want to view spatial output of their data, and R has a package (sp) that allows for the manipulation of spatial data. Users who are not familiar with the package may prefer to import the data into more visual software such as QGIS. QGIS uses Python as its scripting language, so for R and QGIS to communicate, scripts must be translated from one language to the other.

This tutorial shows how one can manually translate Python into R, which requires knowledge of how both languages operate, and how to use spatial packages in R Studio.

Introduction

How to use R Studio:

RStudioLayoutLabled.png Figure 1

The top left-hand box (1) is the script editor. This is where one enters commands to manipulate data, perform analysis, and create graphical outputs. The bottom left-hand box (2) is the console. Commands can also be typed here, and results are shown here, but commands entered in the console are not saved when the GUI (Graphical User Interface) closes. When one runs a command in R Studio, the command and its output appear in the console, unless the output is a plot. The bottom right-hand box (3) is where graphical outputs are shown, and where help files, packages and data files can be found. The top right-hand box (4) shows datasets that have been imported and keeps track of data and values that have been assigned in one's R scripts. One can click on a dataset to view it in table format in another window.

In the console one can simply type a command and press 'Enter', and the command runs right away. In the script editor one can type one or more commands, then run them by selecting the desired line(s) and clicking 'Run', a button located in the upper right-hand corner of the script editor (see Figure 2).

RunButton.PNG Figure 2


Often one has large datasets to analyze. Datasets can be imported from text and .csv files, either by selecting "Import Dataset" in the R Studio GUI or by entering a command such as: insertnamefordata <- read.csv("/User/Documents/YOURDATA.csv"). Importing by command requires one to know the exact path to the dataset.
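The command-line import described above can be sketched as follows. This is a minimal illustration rather than part of the original tutorial: the file is written to a temporary folder first so the example runs anywhere, and the column names and values (tract_id, population) are made up.

```r
# Write a small example file so the sketch is self-contained.
# The column names and values are invented for illustration.
csv_path <- file.path(tempdir(), "YOURDATA.csv")
writeLines(c("tract_id,population",
             "1,4200",
             "2,3100",
             "3,5800"), csv_path)

# header = TRUE treats the first row as column names;
# sep = "," splits each row on commas (both are the read.csv defaults).
mydata <- read.csv(csv_path, header = TRUE, sep = ",")

str(mydata)   # column types and a preview of the values
head(mydata)  # first rows, similar to the table view in R Studio
```

With a real dataset, csv_path would simply be replaced by the full path to the file.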

RImport Dataset.png Figure 3

RImport Dataset2.png Figure 4

As shown in Figure 4, one can decide what to name the dataset being imported and how to arrange it in the most suitable way. Generally one renames the data, adds headers, separates by comma, and leaves the rest as default.

One should also set a 'working directory'; this is the folder where all data can be located and where the '.R' file will be saved. This can be done by clicking 'Session' in the top toolbar of the R Studio GUI, selecting 'Set Working Directory' and then 'Choose Working Directory', and navigating to the correct folder, as shown in Figure 5.
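The working directory can also be set from a script or the console with setwd(). The sketch below is only an illustration: the folder name spatial_tutorial is hypothetical and is created inside a temporary folder so the example is safe to run.

```r
# Hypothetical project folder, created under tempdir() for this sketch.
project_dir <- file.path(tempdir(), "spatial_tutorial")
dir.create(project_dir, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the old one.
old_dir <- setwd(project_dir)

getwd()       # confirm where R will now look for data files
list.files()  # files in the working directory can be read by name alone

# Restore the previous working directory when finished.
setwd(old_dir)
```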

SetDirectory.png Figure 5

For data analysis, one can open a script that someone else has already created, or one that was previously being worked on. This is done with the open folder icon in the top left-hand corner (see Figure 6). One must then go through the pre-made script and replace the dataset names with the names given to one's own imported data. Another option is to write the script oneself, depending on how well one knows the language and how much the process needs to be customized for a specific purpose.

OpenFile.PNG Figure 6

Methods

A Comparison

Commands in Python

2Commands.png Figure 7

3Commands.png Figure 8

Commands in R Studio in order to achieve the same result:

RStudio Python Commands.png Figure 9

A useful resource for translating between languages is this website: http://mathesaurus.sourceforge.net/r-numpy.html

Spatial R Packages

Packages can be installed through the GUI (see Figures 10-11) or from the command line (see Figure 12).

InstallPackage2.png Figure 10

InstallPackage3.png Figure 11

CommandLine.PNG Figure 12

Then, in the script editor or console, one must type 'library(InsertPackageName)' in order to load the package (Figure 13).
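As a rough sketch of the command-line route, the snippet below installs only the packages that are not already present and then loads one of them. The helper name missing_packages is hypothetical, the package list is just an example, and the CRAN mirror URL is one common choice rather than a requirement.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("sp", "RColorBrewer"))

# install.packages() needs an internet connection; a mirror is named
# explicitly so the call also works outside an interactive session.
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Installing is done once; library() must be called in every new session.
library(sp)
```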

Packages.png Figure 13

R has two packages, 'maptools' and 'sp', that allow for the manipulation of shapefiles. These packages allow shapefiles (lines, points, polygons and grids) to be read into R and displayed in the bottom right-hand corner of the GUI under 'Plots'. The data used for this tutorial is Ottawa Census Data from 2006 and 2016, which can be found on the Carleton Library website under Ottawa Gatineau Census Data (http://www.library.carleton.ca/find/gis/geospatial-data/ottawa-gatineau-census-geography-files).

The following tutorial of 'Maptools' uses a modified version of Murray Richardson's code and explanation on how to use the tool for spatial manipulation, from his GEOG 3003 course.

RStu 1.png Figure 14

RStu 2.png Figure 15

Assessing Temporal Change from 2006 to 2016

After following the above directions to achieve the same result as in Figure 15, we will now use another open source program, QGIS (Quantum GIS), in conjunction with R to evaluate the temporal change in Ottawa Wards from 2006 to 2016. The first step is to head to the QGIS website (https://qgis.org/en/site/) and follow the provided instructions to download and set up the proper version. Next, using the link provided earlier, download both the 2006 and 2016 census data from the Carleton Library site. You should now be able to add the 2006 and 2016 census shapefiles to QGIS, as shown in Figure 16.

2006 2016Qgis.PNG Figure 16

Next, use the 'Merge Vector Layers' tool to merge the 2006 and 2016 layers. This tool can be found under 'Vector > Data Management Tools > Merge Vector Layers', as shown in Figure 17.

Merge.PNG Figure 17

Then you need to select your 'inputs' (the files you want to merge), which in this case are the 2006 and 2016 census data (see Figure 18).

SelectM.PNG Figure 18

After the tool has been run, you will end up with a layer called 'Merged' (unless another name is specified). This is a temporary layer, meaning it will be deleted if you close QGIS before saving it or making it permanent. To make it permanent, right-click on the layer and select 'Make Permanent' (as demonstrated in Figure 19), then follow the on-screen prompts to choose a name and location for the new file.

Perm.png Figure 19

From here you simply need to read the newly created shapefile into your R session (using 'sp' as previously shown). For this map we will use an R package called 'RColorBrewer', which provides premade color palettes for creating visually pleasing plots. Install this package using either of the methods demonstrated earlier. The script with documentation can be seen below in Figure 20.
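To give a sense of how 'RColorBrewer' palettes are typically combined with classed data, here is a small sketch. It is not the script from Figure 20: the growth values and break points are made up, and "YlOrRd" is just one of the sequential palettes the package ships with.

```r
library(RColorBrewer)

# Made-up percentage changes for five areas (illustration only).
growth <- c(-3.2, 0.5, 4.1, 8.7, 15.3)

# Four classes: (-5,0], (0,5], (5,10], (10,20]
breakvals <- c(-5, 0, 5, 10, 20)
classes <- cut(growth, breaks = breakvals)

# One colour per class from a sequential yellow-orange-red palette.
pal <- brewer.pal(4, "YlOrRd")

# Look up the colour for each area from its class number.
fill <- pal[as.integer(classes)]

data.frame(growth, classes, fill)
```

A vector like fill can then be passed to the col argument when plotting polygons, so that each polygon is shaded according to its class.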

2016Script.PNG Figure 20

The final product can be seen below in Figure 21.

FinalWithTitle.png Figure 21

Results

A Comparison

Exact translation between the two languages is almost possible. The biggest issue found was that R did not arrange the XY points as coordinate pairs, as they were meant to be. However, the calculated results were the same in the end, despite R being unable to pair them. When the standard deviation and variance were calculated for the X and Y columns, the values differed because the Python calculation accounted for 1 degree of freedom. There is presumably a way in R to designate the desired degrees of freedom, so finding a solution should not be too difficult.
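One possible way to reconcile such differences is sketched below. R's built-in var() and sd() always divide by n - 1 and have no argument for changing this, so the divide-by-n versions are obtained by rescaling. The numbers are made up, and which convention the original Python script used is not known here, so this should be read as an illustration of the two conventions rather than a fix for that specific script.

```r
# Made-up sample values (illustration only).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Built-in functions use the n - 1 denominator.
sample_var <- var(x)
sample_sd  <- sd(x)

# Rescale to get the n denominator.
pop_var <- sample_var * (n - 1) / n
pop_sd  <- sqrt(pop_var)

c(sample_var = sample_var, pop_var = pop_var,
  sample_sd = sample_sd, pop_sd = pop_sd)
```

Comparing both pairs of values against the Python output shows which convention was used on that side.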

Spatial R Packages

This tutorial showed a sample of how one can manipulate shapefiles in R without having to export to other software to see the visual representation. The main step to remember is to attach the newly developed or adjusted data to the original shapefile's data so that the shapefile can be re-written.

Example:

• classes<-cut(OttawaArea$Shape_area, breaks=breakvals, labels=labs) <-- This is the new data
• slot(OttawaArea,"data")=newdata3 <-- This adds the new data into the main dataframe
• writePolyShape(OttawaArea, "OttawaCTs2006_updated.shp") <-- This command re-writes the shapefile to include the latest updates
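The first two commands can be tried without the census shapefile by building a tiny polygon layer by hand with 'sp'. Everything in this sketch is made up for illustration (the make_square helper, the three squares, the break values and the labels); only the cut() and slot() steps mirror the commands listed above, and the write step is left out.

```r
library(sp)

# Hypothetical helper: a square polygon with its lower-left corner at (x0, 0).
make_square <- function(id, x0, size) {
  coords <- cbind(c(x0, x0 + size, x0 + size, x0, x0),
                  c(0, 0, size, size, 0))
  Polygons(list(Polygon(coords)), ID = id)
}

squares <- SpatialPolygons(list(make_square("a", 0, 1),
                                make_square("b", 2, 2),
                                make_square("c", 5, 3)))

# Attribute table; row names must match the polygon IDs.
attrs <- data.frame(Shape_area = c(1, 4, 9), row.names = c("a", "b", "c"))
toy <- SpatialPolygonsDataFrame(squares, data = attrs)

# Classify the areas into labelled bins: (0,2], (2,5], (5,10]
breakvals <- c(0, 2, 5, 10)
labs <- c("small", "medium", "large")
classes <- cut(toy$Shape_area, breaks = breakvals, labels = labs)

# Bind the new column to the existing attributes and put the result back.
newdata <- cbind(slot(toy, "data"), classes)
slot(toy, "data") <- newdata

slot(toy, "data")
```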

Another key feature of R Studio is that plots shown in the bottom right-hand quadrant can be exported as .JPEG or .PDF files. This is very useful for reports.
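Plots can also be saved from a script instead of through the export button. The sketch below writes a simple made-up plot to a PDF in a temporary folder; the file name is hypothetical, and jpeg() can be used in the same way on systems where that graphics device is available.

```r
# Hypothetical output file in a temporary folder.
out_file <- file.path(tempdir(), "example_plot.pdf")

# Open a PDF device, draw the plot, then close the device to write the file.
pdf(out_file, width = 7, height = 5)
plot(c(1, 2, 3), c(2, 4, 8),
     main = "Example plot", xlab = "x", ylab = "y")
dev.off()

file.exists(out_file)
```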

Assessing Temporal Change from 2006 to 2016

This tutorial showed that, using QGIS in conjunction with R and the 'RColorBrewer' package, one can produce a map of the same quality as one created in 'ArcPro' or another paid software package. Many people underestimate the spatial and cartographic abilities of both QGIS and R, assuming that open source software cannot be as good as paid products. That assumption couldn't be more wrong: more and more tools are becoming open source, easier to access, and capable of producing professional-quality work. So, the next time you think you need a subscription or paid software to make a cool map, think again!

References

Acknowledgements

Dan Patterson served as main script editor for anything Python-related.

Murray Richardson's script with 'MapTools' from the GEOG 3003 course was instrumental in creating the tutorial.

References

Harris, R. (2011). Statistics for Geography and Environmental Science: An Introduction to R.

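Social Statistics. https://www.socscistatistics.com

Ottawa Gatineau Census Data. http://www.library.carleton.ca/find/gis/geospatial-data/ottawa-gatineau-census-geography-files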

Translating R and Python