Difference between revisions of "Social Spatial Network Tools in R"

Revision as of 18:34, 19 December 2023

Introduction to Social Spatial Networks (SSNs):

Researchers in the social sciences have used social networks (sociograms) to visualize the connections and relationships of people in a community since the 1930s (Andris & Sarkar, 2022). However, these networks are aspatial: they do not integrate geospatial information about individuals to analyze and explain these relationships (or the lack of them). Based on Tobler's First Law, "everything is related to everything else, but near things are more related than distant things" (Tobler, 1970), it could be theorized that people who live or work near one another are more likely to have similar characteristics and to interact more frequently, but this cannot be confirmed using sociograms alone, as they lack a spatial component. Network and graph theory from the field of computer science is also tangentially related to social-spatial networking, but it is generally focused on the abstract or theoretical connections between nodes rather than on simulations of real-world phenomena (Bondy, 1982). Finally, network analysis also exists within traditional GIS fields and discussions, though this is generally based on the distribution of goods and services along pre-defined road or stream networks, rather than on relationships and connections outside of pre-defined networks (Andris & Sarkar, 2022). Social-Spatial Networks integrate the ideas found in these different fields to analyze and document social relationships between individuals situated within their geospatial locations, in order to better understand how connections are formed and maintained.

Term Definitions

In Social-Spatial Network analysis, nodes are specific geolocated points representing people, businesses, or other points of interest. All nodes need at minimum two data points: a unique name/ID to reference the node, and some form of location information that can be converted into a (lat, long) pair. Nodes may also contain auxiliary information about the point being represented, such as demographic information on a participant or the classification of a business type.

Edges represent a social connection between two nodes and are at minimum composed of a pair of names/IDs that are found in the node list. All names within the edge list must match names in the node list exactly, or the program may crash. Like nodes, edges can also contain additional information about a connection, such as a strength value or a type classification, that may be used to weight the algorithms.
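As a minimal sketch of these two definitions, the node and edge lists can be held in two R data frames. The column names (name, lat, lon, from, to, type) and all values are placeholders assumed for illustration, not the tutorial dataset. The check at the end lists any edge endpoints that do not appear in the node list, which is the kind of mismatch that can cause a crash later on.

```r
# Hypothetical node list: a unique name plus a (lat, lon) pair.
# Names and coordinates are illustrative placeholders only.
nodes <- data.frame(
  name = c("A", "B", "C"),
  lat  = c(51.50, 51.52, 53.48),
  lon  = c(-0.12, -0.10, -2.24)
)

# Hypothetical edge list: each row is a pair of node names plus a type code.
# "D" is deliberately absent from the node list to show the check working.
edges <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "D"),
  type = c("w", "p", "f")
)

# Node identifiers must be unique.
stopifnot(anyDuplicated(nodes$name) == 0)

# Every name used in the edge list must match a node name exactly.
edge_names    <- unique(c(edges$from, edges$to))
missing_names <- setdiff(edge_names, nodes$name)
print(missing_names)  # "D"
```

Running a check like this before any analysis makes it easier to spot spelling or capitalization differences between the two files.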

Building a Dataset for a Social-Spatial Network:

For the purposes of this tutorial we will use a mock dataset created from this tutorial writer's personal knowledge of the characters of the audio-fiction podcast The Magnus Archives, produced by Rusty Quill under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Non-specific location information for the nodes was taken from the Magnus Archives Wikia, as well as from a fan-made map (found here) for some locations that were unclear, alongside my own knowledge of the plot and characters when a location was ambiguous or a character had multiple locations. The dataset is available to download from the writer’s GitHub account (https://github.com/otter-lights/SSN-Dataset_TheMagnusArchives) and is composed of four files: the initial nodes.csv file and the accompanying location.csv file, which are used to create a final geolocated nodes.csv file, as well as an edges.csv file.

Define Your Nodes

Figure 1. Screenshot of Initial Node File

The first step in creating a social spatial networking dataset is to define the sample that will make up your nodes. Depending on the project this may be a previously defined set of participants, but often the scope of the node definition needs to be determined. Since the Magnus Archives dataset was made for this tutorial rather than for actual research, characters were selected without considerable thought or planning, but in real-life scenarios a more systematic approach should generally be used.

Initial data creation was done in spreadsheet software: the chosen characters were listed alongside their in-universe affiliation and a written description of their main location. Character names are used as the node identifier, so they must be unique within the dataset; affiliations and locations can be duplicated.

Geolocation of Written Descriptions

The next step in creating the node list is to geolocate the written location descriptions identified previously, converting each one to a latitude and longitude pair. This first requires a list of all unique location descriptions in the original file. For smaller datasets this can be done manually; however, for larger datasets or datasets with a lot of duplication, the unique descriptions can be extracted in R. Code for this is shown in Figure 2 below.

Figure 2. Screenshot of R code used to generate unique location file
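Since Figure 2 is a screenshot, the following is only a sketch of how such a step might look, not a transcription of the code in the figure. It assumes the initial node file has columns called name, affiliation, and location (hypothetical names), uses placeholder rows instead of the real file, and writes to a temporary folder so it can be run as-is.

```r
# Stand-in for the initial node file (e.g. read.csv("nodes.csv")).
# Column names and values are assumptions for illustration.
nodes <- data.frame(
  name        = c("A", "B", "C", "D"),
  affiliation = c("Group 1", "Group 1", "Group 2", "Group 2"),
  location    = c("Archive building", "Archive building",
                  "Seaside town", "Archive building")
)

# One row per distinct written description, with an empty "real_loc"
# column to be filled in by hand afterwards.
unique_locs <- data.frame(
  location = unique(nodes$location),
  real_loc = ""
)

# Save for manual editing (a temporary path is used here).
out_file <- file.path(tempdir(), "unique_locs.csv")
write.csv(unique_locs, out_file, row.names = FALSE)

print(unique_locs)
```

With four nodes sharing two descriptions, only two rows need to be looked up, which is the saving this step is meant to provide.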

After the unique descriptions have been saved into a 'unique_locs.csv' file, the "real_loc" column can be filled out with more specific real-world locations determined after the fact, such as street addresses or coordinate points. This column will be used as the input for geolocation using OpenStreetMap. Coordinate pairs used as real locations should be in decimal degrees, in a single column, with a comma separator and no quotes or other formatting. This .csv file can then be reimported into your R script.
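A small sketch of the re-import step follows. It assumes the file has the columns location and real_loc (location is a hypothetical column name) and uses placeholder entries rather than the real dataset. It flags rows that are still empty and rows that look like decimal-degree coordinate pairs, both of which can be worth reviewing before geocoding.

```r
# Simulate a filled-in unique_locs.csv (placeholder values only).
filled <- data.frame(
  location = c("Archive building", "Seaside town", "Unknown place"),
  real_loc = c("10 Example Street, London", "51.5,-0.12", "")
)
csv_path <- file.path(tempdir(), "unique_locs.csv")
# write.csv() quotes text fields in the file itself, so a value that
# contains a comma still comes back as a single cell.
write.csv(filled, csv_path, row.names = FALSE)

# Re-import the edited file.
unique_locs <- read.csv(csv_path, stringsAsFactors = FALSE)

# Flag entries that look like "lat,long" in decimal degrees.
coord_pattern <- "^\\s*-?[0-9]+(\\.[0-9]+)?\\s*,\\s*-?[0-9]+(\\.[0-9]+)?\\s*$"
unique_locs$is_coord <- grepl(coord_pattern, unique_locs$real_loc, perl = TRUE)

# Flag entries that were never filled in.
unique_locs$is_empty <- !nzchar(trimws(unique_locs$real_loc))

print(unique_locs)
```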

The final step in creating the node list for SSN analysis is to geolocate each of these real-world locations and match them to their corresponding nodes. Geolocation is done using the "tmaptools" library, specifically the "geocode_OSM(query, return.first.only = TRUE, details = FALSE, as.data.frame = TRUE)" function. This line of code takes the query provided, in this case each row of real_locs in turn within a for loop, and sends it to the OpenStreetMap Nominatim server, returning only the first result as a dataframe. Details about the type of OSM feature can be included in the query results by setting details = TRUE in the function parameters; this can be used for verification and for troubleshooting oddly placed points, but is not included in this workflow. The lat, lon pairs from these query results are saved alongside the unique locations and then matched to the original nodes using another for loop. The final node list, which should now include (at minimum) columns for a unique identifier, lat, and long, can then be saved to a .csv file.
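The geocoding and matching steps described above might be sketched as follows. This is an illustration under assumptions, not the tutorial's own script: the column names location and real_loc, the helper geocode_locations(), and the stand-in fake_geocoder() with its placeholder coordinates are all hypothetical. The stand-in lets the matching logic run offline, while osm_geocoder() wraps the geocode_OSM call from the text and needs the tmaptools package and an internet connection. The match() call here takes the place of the second for loop.

```r
# Fill lat/lon for each unique location using any geocoder function that
# returns a data frame with `lat` and `lon` columns, or NULL if nothing is found.
geocode_locations <- function(locs, geocoder) {
  locs$lat <- NA_real_
  locs$lon <- NA_real_
  for (i in seq_len(nrow(locs))) {
    res <- geocoder(locs$real_loc[i])
    if (!is.null(res) && nrow(res) > 0) {
      locs$lat[i] <- res$lat[1]
      locs$lon[i] <- res$lon[1]
    }
  }
  locs
}

# Real geocoder: the call quoted in the text (requires tmaptools + internet).
osm_geocoder <- function(query) {
  Sys.sleep(1)  # pause between requests to go easy on the public server
  tmaptools::geocode_OSM(query, return.first.only = TRUE,
                         details = FALSE, as.data.frame = TRUE)
}

# Offline stand-in with placeholder coordinates (NOT real geocoding output).
fake_geocoder <- function(query) {
  known <- data.frame(query = c("Place one", "Place two"),
                      lat = c(51.5, 53.5), lon = c(-0.1, -2.2))
  hit <- known[known$query == query, ]
  if (nrow(hit) == 0) NULL else hit
}

# Placeholder inputs standing in for unique_locs.csv and the initial node file.
unique_locs <- data.frame(
  location = c("Archive building", "Seaside town", "Nowhere"),
  real_loc = c("Place one", "Place two", "Place three")
)
nodes <- data.frame(
  name     = c("A", "B", "C", "D"),
  location = c("Archive building", "Seaside town", "Archive building", "Nowhere")
)

# Swap fake_geocoder for osm_geocoder to query OpenStreetMap for real.
geo <- geocode_locations(unique_locs, fake_geocoder)

# Copy coordinates back onto each node by its written description.
idx        <- match(nodes$location, geo$location)
nodes$lat  <- geo$lat[idx]
nodes$long <- geo$lon[idx]

# Nodes without coordinates need a manual check of their real_loc entry.
print(nodes[is.na(nodes$lat), ])

write.csv(nodes, file.path(tempdir(), "nodes_geolocated.csv"), row.names = FALSE)
```

Keeping the geocoder as a function argument is a design choice for this sketch only; it means failed lookups leave NA values instead of stopping the loop.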

Edges ~ Relationship Definition

The last element of dataset creation for social-spatial networking is to identify and define the relationships between the nodes/participants. This can be done in a number of ways depending on the density of the network and the number of edges that need to be recorded. For this use case, all possible edges between the given characters were created using "all_combinations <- data.frame(t(combn(nodes$name,2)))". This very long dataframe was then trimmed based on personal knowledge of the show to remove any illogical or insignificant character connections, reducing the dataframe from 406 rows (29 choose 2) to 98 valid relationships. Each of these relationships was then coded as one of five types: romantic (r), platonic (p), familial (f), work partners (w), or enemies (e).
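To make the edge-building step concrete, here is a short sketch that applies the combn() call quoted above to a four-name placeholder node list. The from/to/type column names, the rows kept, and the type codes assigned to them are assumptions for illustration only; in the tutorial dataset the trimming and coding were done by hand from knowledge of the show.

```r
# Placeholder node list (the tutorial dataset has 29 characters).
nodes <- data.frame(name = c("A", "B", "C", "D"))

# Every unordered pair of node names, one pair per row.
all_combinations <- data.frame(t(combn(nodes$name, 2)))
names(all_combinations) <- c("from", "to")

print(nrow(all_combinations))  # 6, i.e. choose(4, 2)
print(choose(29, 2))           # 406, the count quoted in the text

# Keep only the pairs judged meaningful (an arbitrary choice here).
edges <- all_combinations[c(1, 2, 6), ]

# Code each remaining relationship with one of the five type letters.
edges$type <- c("w", "p", "e")

# Guard against typos in the type codes.
valid_types <- c("r", "p", "f", "w", "e")
stopifnot(all(edges$type %in% valid_types))

# Count of edges per relationship type.
print(table(factor(edges$type, levels = valid_types)))
```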

SNoMaN Web App

Import Data

Spatial & Aspatial Exploration

Labeling + Appearance + Filtering

Network Statistics + Algorithms

SSNtools - R Library

The initial intention of this tutorial was to provide an overview of Social Spatial Networking tools in the R programming language using the R packages "SSNtools", "tmap" and "igraph". However, the tutorial provided by the creators of the SSNtools package is very thorough, particularly in the area of advanced statistical tools; if this is the type of analysis you would like to complete, you can find that tutorial here. Rather than repeating this work, I will instead focus on clarifying for a beginner audience the visualization capabilities and instructions from tutorial Chapters 1-4 that are not available in the SNoMaN web app, as that tutorial is written for a more advanced audience.

References

Andris, C., & Sarkar, D. (2022). Social networks in space. Chapters, 400-415.

Bondy, J. A. (1982). Graph theory with applications.

Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46 (Supplement: Proceedings, International Geographical Union. Commission on Quantitative Methods), 234–240. DOI:10.2307/143141.
