Exploring Clustering In QGIS

From CUOSGwiki
Revision as of 10:47, 8 December 2021 by Joshgoutte

Purpose

The purpose of this project is to explore the capabilities of open source software such as QGIS, one of the many open source programs we have used in GEOM 4008. In this tutorial, written in 2021, I will cover the vector analysis tools in QGIS that are used to create clusters. I will walk step by step through the density-based (DBSCAN) and K-means clustering methods and present the results of my findings.

Introduction

Clustering is a practical tool in GIS: it groups vector data points into separate clusters. With K-means the user sets the number of clusters directly, while DBSCAN derives the clusters from how densely the points are packed. This is helpful when you have a vast area and want to split a task into several smaller areas, dividing and conquering to accomplish a goal. ArcGIS Pro offers this option and may have a bigger collection of clustering methods, but QGIS still provides two: DBSCAN (density-based) and K-means. We will be using the newest version of QGIS at the time of writing, version 3.22. The most important parts of getting started are downloading the latest version and getting the proper data and projections, which I detail step by step below. The scenario for this project is a City of Ottawa official who wants to investigate incidents (vehicle, bicycle, etc.) but finds the city too large to handle as a single area; the goal of this tutorial is to show how we can designate multiple smaller areas for him to work with.
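To make the K-means idea concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the classic way K-means is computed. Each point is assigned to its nearest centre, each centre then moves to the mean of its points, and the two steps repeat until the assignments stop changing. This is an illustration only, not the implementation behind QGIS's K-means clustering tool. The function name `kmeans`, its parameters, and the sample coordinates are hypothetical. It also assumes the coordinates are in a projected CRS, so that straight-line distance is meaningful.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm for K-means on (x, y) points.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its member points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


# Hypothetical sample: two well-separated groups of incident locations
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids, labels = kmeans(pts, k=2)
print(labels)
print(centroids)
```

The starting centres are chosen at random, so different seeds can settle on different local solutions. Fixing `seed` makes runs repeatable.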

Data

The data we will use comes in two forms: reference data, so we can visualize where we are, and the data we will use for the clustering.

Referencing

  • The Roads dataset will be used to display the road network. It is important to add because the incidents (the vector data points) are located on roads, so it helps illustrate the incidents when zoomed in to a smaller area.
  • The Wards dataset contains the City of Ottawa wards and provides the city boundary as a background for locating the incidents. All data points fall within these ward boundaries.
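Since every collision point should fall inside the ward boundaries, a point-in-polygon test makes a quick sanity check. In QGIS itself a tool such as Select by Location would handle this. The sketch below shows the even-odd ray-casting rule in plain Python instead. The square `boundary` is a hypothetical stand-in for a ward polygon, and the sketch does not handle multi-part polygons or holes. Points lying exactly on an edge may be classified either way.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex
    connects back to the first. Points exactly on an edge may go either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # the ray to the right crosses this edge
    return inside


# Hypothetical square standing in for a ward boundary (projected coordinates)
boundary = [(0, 0), (10, 0), (10, 10), (0, 10)]
incidents = [(5, 5), (15, 5), (5, -1)]
outside = [p for p in incidents if not point_in_polygon(*p, boundary)]
print(outside)  # → [(15, 5), (5, -1)]
```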

Clustering

  • The Traffic Collisions by Location in 2013 dataset contains all the traffic collisions and their frequencies. Their location information is what we use to separate them into clusters. Once you have added this dataset, I recommend renaming the layer, since by default it is added under a long alphanumeric name.
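DBSCAN needs nothing but these locations. A point with at least `min_pts` neighbours within distance `eps` (counting itself) is a core point. Clusters grow outward from core points, and any point not reachable from a core point is labelled noise. The minimal sketch below illustrates the algorithm. It is not QGIS's DBSCAN clustering tool, whose parameters may be defined differently. The names `dbscan`, `eps`, and `min_pts` and the sample points are hypothetical, and `eps` is assumed to be in the same units as the projected coordinates. The neighbour search compares every pair of points, which is fine for a sketch but slow for large layers.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return a cluster id for each (x, y) point, or NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i, including i itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        # i is a core point: start a new cluster and expand it
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point
        cluster += 1
    return labels


# Hypothetical sample: two tight groups and one isolated point
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-means, the number of clusters is not chosen up front. It falls out of `eps` and `min_pts`, and isolated incidents are left unassigned rather than forced into a cluster.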

Acquiring QGIS (version 3.20.3)

To run this analysis, it is best to have the latest version of QGIS. The steps for getting the newest version are detailed below.

  • If you have an older version installed, I would start by uninstalling it before proceeding, as keeping two versions of QGIS is not necessary and takes up space on the hard drive.
  • Click on this link and you will see download options for several operating systems
  • Select either the latest release or the more stable long-term release, depending on your preference, and the download should start
  • Once the download finishes, run the installer (if it did not start automatically) to install QGIS on your hard drive, and you should be ready to use the software

Set up the Environment

Add vector data

This section explains how to add the data discussed in the Data section of this article into QGIS.

  • Go to the links provided and click the download icon (a cloud with an arrow) to download each dataset

Projection
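Projection matters for clustering because both K-means and DBSCAN compare straight-line distances in the layer's coordinate units. If the layers are left in a geographic CRS, those units are degrees. Near Ottawa (roughly 45.4° N), a degree of longitude covers only about 70% of the ground that a degree of latitude does, so clusters computed in degrees would be distorted. The sketch below illustrates this with a simple spherical-Earth approximation. That is an assumption: real CRS transformations use an ellipsoid. The function name `degree_lengths_m` is hypothetical. In QGIS, the fix is to work in a projected CRS measured in metres.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a spherical approximation

def degree_lengths_m(lat_deg):
    """Approximate ground length in metres of 1 degree of latitude and of longitude."""
    one_deg_lat = math.pi * EARTH_RADIUS_M / 180
    # Meridians converge toward the poles, so a degree of longitude shrinks with cos(latitude)
    one_deg_lon = one_deg_lat * math.cos(math.radians(lat_deg))
    return one_deg_lat, one_deg_lon


lat_m, lon_m = degree_lengths_m(45.4)  # approximate latitude of Ottawa
print(f"1 degree of latitude  is about {lat_m:,.0f} m")
print(f"1 degree of longitude is about {lon_m:,.0f} m")
```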

Symbology