Exploring Clustering In QGIS

From CUOSGwiki
Revision as of 12:00, 8 December 2021 by Joshgoutte

Purpose

The purpose of this project is to explore the capabilities of open source software such as QGIS, one of the many open source packages we have used in GEOM 4008. In this tutorial, written in 2021, I cover the vector analysis tools in QGIS that are used to create clusters. I go step by step through the density-based (DBSCAN) and K-means methods of clustering and present the results of my findings.

Introduction

Clustering is a practical tool in GIS: it groups vector data points into separate clusters, with the grouping controlled by parameters the user sets. This is helpful when you have a vast area and want to split a task into several smaller areas to divide and conquer. ArcGIS Pro offers a larger collection of clustering options, but QGIS still provides two: DBSCAN (density-based) and K-means. We will be using the newest version of QGIS at the time of writing, version 3.22. The most important parts of getting started are downloading the latest version, getting the proper data, and setting the proper projection, each of which I detail step by step below. The scenario for this project is a City of Ottawa official who wants to investigate traffic incidents (vehicle, bicycle, etc.) but finds the city too big to handle as a whole. The goal of this tutorial is to show how to designate multiple smaller areas for the official to work with.

Data

The data we will be using comes in two forms: reference data, so we can visualize where we are, and the data we will use for the clustering.

Referencing

  • The Roads dataset is used to visualize the roads. This matters because the incident points lie on roads, so the road network helps locate the incidents when zoomed in to a smaller area.
  • The Wards dataset provides the Ottawa city boundary as a background for the incidents. All data points are located within it.

Clustering

  • The Traffic Collisions by Location in 2013 dataset contains all the traffic collisions and their frequencies. Their location information will help us separate them into clusters. Once you have added this dataset, I recommend renaming the layer, as it is automatically named with a complicated alphanumeric sequence.

Acquiring QGIS (version 3.16 or 3.22)

To run this analysis, it is best to have a recent version of QGIS. The steps for getting the newest version are:

  • If you have an old version, consider uninstalling it before proceeding, as keeping two versions of QGIS is not necessary and takes up space on the hard drive.
  • Click on this link and you will see download options for the various operating systems.
  • Select either the latest release (3.22) or the more stable long-term release (3.16), depending on your preference, and the download should start.
  • Once the download finishes, install it on your hard drive (if this does not happen automatically) and you should be ready to use the open source software.

Set up the Environment

Add vector data

This section covers how to add the data listed under the Data section of this article to QGIS.

  • Download each dataset listed earlier by clicking the cloud icon with a downward-facing arrow and selecting the Shapefile option.
  • In QGIS, open the "Layer" menu, hover over "Add Layer" and then select "Add Vector Layer".
  • Under Source, click the three dots next to the text box and browse to the location of the downloaded datasets. Ctrl-click the three datasets mentioned to add them all at once.
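
Before adding the layers, it can help to confirm that each download unpacked completely. A shapefile is really several files sharing one base name (.shp, .shx and .dbf are the mandatory parts of the format, and .prj carries the projection), and they need to sit side by side. The sketch below is a plain-Python check that runs outside QGIS; the function name `check_shapefiles` and the folder you pass to it are hypothetical, not part of the tutorial's data.

```python
from pathlib import Path

# Sidecar files expected next to every .shp file.
REQUIRED = (".shx", ".dbf")   # mandatory parts of the shapefile format
RECOMMENDED = (".prj",)       # stores the CRS; without it the projection is unknown


def check_shapefiles(folder):
    """Return {shapefile name: list of missing sidecar extensions}."""
    report = {}
    for shp in sorted(Path(folder).glob("*.shp")):
        missing = [ext for ext in REQUIRED + RECOMMENDED
                   if not shp.with_suffix(ext).exists()]
        report[shp.name] = missing
    return report
```

Calling `check_shapefiles()` on your download folder returns an empty list for every shapefile that is complete; any listed extension points to a file that did not make it out of the archive.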

Projection

Now it's time to set the projection. To do so:

  • Open the "Project" menu and select "Properties"
  • Select "CRS" and search for "WGS 84 / UTM zone 18N" (EPSG:32618)

For the individual vector layers:

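  • Right click the layer and select "Properties"
  • Select "Source" and choose the same CRS as the project. Note that this setting only assigns a CRS and does not transform the coordinates, so if a layer was delivered in a different CRS, reproject it instead (for example with "Reproject Layer" under Vector > Data Management Tools).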

Using a projected CRS with metre units, such as this one, is important because DBSCAN clustering works with distances between points.
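
To see why a metre-based CRS matters for a distance threshold, the sketch below measures how far 0.01 degrees reaches on the ground near Ottawa. It uses the haversine formula on a spherical Earth with a mean radius of 6371 km, which is an approximation, and the position 45.4 N, 75.7 W is a rough city-centre location assumed for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


lat, lon = 45.4, -75.7  # approximate Ottawa position (assumed for illustration)
north_south = haversine_m(lat, lon, lat + 0.01, lon)
east_west = haversine_m(lat, lon, lat, lon + 0.01)
print(f"0.01 deg north-south: {north_south:.0f} m")
print(f"0.01 deg east-west:   {east_west:.0f} m")
```

At this latitude the same 0.01 degrees covers roughly 1.1 km north-south but only about 0.78 km east-west, so a search distance expressed in degrees would not describe a circle on the ground. In UTM zone 18N the coordinates are in metres and the distance means the same thing in every direction.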

Symbology

We use symbology to make the reference layers easier to read.

Rule based classification

We use a rule-based classification to separate the highways from the main roads. To do that:

  • Right click the roads layer and click on "Properties"
  • Click on Symbology and select "Rule-based" in the drop-down menu at the top
  • Double click the only rule available, check the Filter box and click the purple dots next to the box
  • Enter the expression "SUBCLASS" = 'Highway' in the Expression String Builder (double quotes around the field name, single quotes around the value) and then click OK
  • Then in the Label text box, label it "Highway"
  • On that same page, go down and select a symbol different from the one the roads currently have
  • Click OK
  • Click the green + at the bottom to add a new rule, double click it, label it "Roads" and check Else so that it catches every road the first rule does not
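
The two rules behave like an if/else applied to each road feature. Here is a plain-Python sketch of that logic, using made-up attribute records rather than the real Roads layer. Only the SUBCLASS field and the 'Highway' value come from the steps above; the function name and the other field values are invented for illustration.

```python
def classify_road(feature):
    """Mimic the two rules: a filter on SUBCLASS, then an Else catch-all."""
    if feature.get("SUBCLASS") == "Highway":
        return "Highway"
    return "Roads"  # the Else rule catches everything the filter did not match


# Hypothetical records standing in for rows of the attribute table.
sample = [
    {"NAME": "A", "SUBCLASS": "Highway"},
    {"NAME": "B", "SUBCLASS": "Local"},
    {"NAME": "C", "SUBCLASS": None},
]
labels = [classify_road(f) for f in sample]
print(labels)  # ['Highway', 'Roads', 'Roads']
```

The record with no SUBCLASS value falls through to "Roads", which is what checking Else achieves: no road is left without a symbol.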

Applications

This section applies the tools to the task of dividing the city into regions for investigating collisions.

Clustering

We will go through how to use the two forms of clustering in QGIS.
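
Before opening the tools, it may help to see what each method does with a handful of points. The following is a minimal pure-Python sketch of the two ideas, not the implementation QGIS uses. The sample coordinates (imagined as metres), the `eps` and `min_pts` values, and the naive choice of starting centres are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign each point to its nearest centre, then recentre."""
    centres = list(points[:k])  # naive initialisation: first k points (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def dbscan(points, eps, min_pts):
    """Density-based clustering: grow clusters from core points; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Every point within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may still become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(reach)
    return labels


# Two tight groups and one isolated point.
pts = [(0, 0), (10, 0), (0, 10), (500, 500), (510, 500), (500, 510), (2000, 2000)]
print(kmeans(pts, k=2))
print(dbscan(pts, eps=20, min_pts=3))
```

On this sample, K-means assigns every point to one of the two groups, including the isolated point, while DBSCAN finds the same two groups but labels the isolated point -1 (noise). With K-means you choose the number of clusters directly. With DBSCAN you choose a search distance and a minimum group size, which is why the distance units of the layer matter.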

DBSCAN clustering