Difference between revisions of "Exploring Clustering In QGIS"

From CUOSGwiki

Revision as of 11:05, 8 December 2021

Purpose

The purpose of this project is to explore the capabilities of open source software such as QGIS, one of the many open source packages we have used in GEOM 4008. In this tutorial, which I am writing in 2021, I will cover the vector analysis tools in QGIS that are used to create clusters. I will go step by step through the density-based (DBSCAN) and K-means methods of clustering and present the results of my findings.

Introduction

Clustering is a practical tool in GIS: it groups vector data points into separate clusters, and the user can configure how many clusters are produced (directly for K-means, and indirectly through the distance and minimum-size settings for DBSCAN). This is helpful if you have a vast area and want to split a task across several smaller areas in a divide-and-conquer approach. ArcGIS Pro offers clustering as well, and while it may have a larger collection of clustering options, QGIS still provides two: DBSCAN (density-based) and K-means. We will be using the newest version of QGIS at the time of writing, version 3.22. The most important steps at the start of the project are downloading the latest version, getting the proper data and setting the proper projection, each of which I detail step by step below. For the scope of my project, I take the role of a City of Ottawa official who wants to investigate traffic incidents (vehicle, bicycle, etc.) but finds the city too big to handle as a whole. The goal of this tutorial is to show how we can designate multiple smaller areas for that official to work with.
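To make the idea of a user-chosen number of clusters concrete, here is a minimal sketch of K-means in plain Python. It is not the QGIS implementation: the starting centroids (simply the first k points), the fixed iteration count and the sample coordinates are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny K-means for 2D points; returns (centroids, labels)."""
    # Assumption: start from the first k points so the result is repeatable.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

# Hypothetical incident coordinates in metres (not taken from the Ottawa data).
points = [(0, 0), (10, 0), (0, 10), (1000, 1000), (1010, 1000), (1000, 1010)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Changing k changes how many groups the points are split into, which is the setting the K-means tool exposes to the user.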

Data

The data that we will be using comes in two forms: reference data, so we can visualize where we are, and the data we will use for the clustering.

Referencing

  • The Roads dataset will be used to visualize the roads. This is important to add because the incidents (our vector data points) occur on roads, so it helps illustrate the incidents when zoomed in to a local area.
  • The Wards dataset will be used to show where the incidents are against a background of Ottawa's city boundary. All data points are located within that dataset's extent.

Clustering

  • The Traffic Collisions by Location in 2013 dataset contains all the traffic collisions and their frequencies. Its location information will help us separate the collisions into clusters. Once you have added this dataset, I recommend renaming it, as it is saved by default under a complicated alphanumeric name.

Acquiring QGIS (version 3.16 or 3.22)

To run this analysis, it is best to have the latest version of QGIS. The steps for getting the newest version are detailed below.

  • If you have an old version installed, I would start by uninstalling it, as keeping two versions of QGIS is not necessary and takes up space on the hard drive.
  • Click on this link and you will see download options for the operating systems QGIS supports.
  • Select either the latest release (3.22) or the more stable long-term release (3.16), depending on your preference, and the download should start.
  • Once the download is done, install it on your hard drive (if this is not done automatically) and you should be ready to use the open source software.

Set up the Environment

Add vector data

We will now add the data discussed under the Data section of this article to QGIS.

  • Download the datasets listed earlier by clicking on the cloud icon with a downward-facing arrow and selecting the Shapefile option.
  • In QGIS, open the "Layer" menu, hover over "Add Layer" and then select "Add Vector Layer".
  • Under Source, click on the three dots next to the text box and browse to the location of the downloaded datasets. Hold Ctrl and click each of the three datasets mentioned to add them all at once.
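The same step can also be scripted from the QGIS Python console. The sketch below is one possible way to do it, not part of the original workflow: the helper names and the folder path are hypothetical, and `load_into_qgis` only works inside QGIS, where the `qgis.core` module is available.

```python
from pathlib import Path

def find_shapefiles(folder):
    """Return the .shp files found directly in `folder`, sorted by name."""
    return sorted(Path(folder).glob("*.shp"))

def load_into_qgis(paths):
    """Add each shapefile to the current QGIS project (QGIS Python console only)."""
    # Imported here so that find_shapefiles still works outside QGIS.
    from qgis.core import QgsProject, QgsVectorLayer

    for path in paths:
        # Use the file name without its extension as the layer name.
        layer = QgsVectorLayer(str(path), path.stem, "ogr")
        if layer.isValid():
            QgsProject.instance().addMapLayer(layer)
        else:
            print(f"Could not load {path}")

# Example inside QGIS, with a hypothetical download folder:
# load_into_qgis(find_shapefiles("C:/data/ottawa"))
```

Using the file name as the layer name means that renaming the downloaded files first gives the layers readable names in the Layers panel.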

Projection

Now it's time to set the projection. To do so:

  • Open the "Project" menu and select "Properties".
  • Select "CRS" and search for "WGS 84 / UTM zone 18N".

For the individual vector layers:

  • Right-click the layer and select "Properties".
  • Select "Source" and choose the same CRS as the project.

A projected coordinate system such as this one is important because DBSCAN clustering works with distances between points, and UTM coordinates are measured in metres rather than degrees.
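A short sketch shows why distances in degrees are a poor fit for a distance-based method. It uses the haversine formula with a spherical Earth approximation; the Ottawa coordinates and the 0.01 degree step are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius, a common approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Approximate coordinates for downtown Ottawa (an assumption for illustration).
lat, lon = 45.42, -75.70

north_m = haversine_m(lat, lon, lat + 0.01, lon)  # step 0.01 degrees north
east_m = haversine_m(lat, lon, lat, lon + 0.01)   # step 0.01 degrees east

print(f"0.01 degrees north: {north_m:.0f} m")
print(f"0.01 degrees east:  {east_m:.0f} m")
```

At Ottawa's latitude the same step in degrees covers roughly 1.1 km going north but only about 0.8 km going east, so a single distance threshold in degrees would stretch clusters unevenly. In a metre-based CRS such as UTM zone 18N, the threshold means the same thing in every direction.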

Symbology