Difference between revisions of "Calculating historical agricultural land differences using QGIS"

From CUOSGwiki
m (fixing a few typos and grammar errors)
 
(25 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  +
<table border=0 width=1000><tr><td>
 
==Introduction==
 
==Introduction==
   
One issue when working with large sized data sets is the massive resources required to render the files. This is not so bad when you have a computer powerful enough to handle it, or you absolutely need to have data that detailed, but it is when you do not. I found myself in that situation about a year ago. I was attempting to graphically represent agricultural land attrition in Southern Ontario, from 1983 – 2017.
+
One issue when working with large data sets is the massive resources required to render the files. This is not so bad when you have a computer powerful enough to handle them, or you absolutely need to have data that detailed, but there are problems in other situations. I found myself in such a situation about a year ago. I was attempting to graphically represent agricultural land attrition in Southern Ontario, from 1983 – 2017.
The data I had downloaded was incredibly detailed, perfect for if you wanted to grab a subset and analyze it on a micro level. Not so great when you are trying to analyze on a macro level. While I was able to complete my analysis, it was frustratingly time consuming. Every map movement caused a render lag, and any use of analysis tools resulted in a long wait.
+
The data I had downloaded were incredibly detailed, which is perfect if you wanted to grab a subset and analyze it on a micro level, but not so great if you want to analyze at a macro level. While I was able to complete my analysis, it was frustratingly time consuming. Every map movement caused a render lag, and any use of analysis tools resulted in a long wait.
This leads me to now. While I was not able to think up a way to do it back then, I figured now might be the time to try. This lead me to a feature in QGIS called ‘dissolve’. While it is also available in ArcPro, that is not the purpose of this assignment. I decided to see if I recreate my project, using QGIS instead of Arc, and perhaps improve upon it a little.
+
This leads me to now. While I was not able to think up a way to do it back then, I figured now might be the time to try. This lead me to a feature in QGIS called ‘dissolve’. While this is a general tool in many GIS software packages, I decided to see if I could improve my past project, using QGIS.
   
  +
==Cloud Computing==
  +
  +
Cloud computing is a fairly recent solution to having a large dataset that can take a long time to compute and also solve the issue of not having the correct resources in order to render the files. cloud computing can be useful if for some reason you can't use the dissolve tool mentioned above or if you do not have a particularly powerful computer, it can allow you to still do the work that you want to get done in a reasonable amount of time. there are several different ways to do this cloud computing and it depends on the type of software that you want to use however several large online companies such as amazon and google have services that if you're data is too large and you can not compute the layers you want to on the computer that you have then you can rent space on their servers to do this. this is a good way of getting your layers computed while still being able to do other work in the meantime.
   
 
==Data==
 
==Data==
   
Two files are needed for thhis project, the 1983 Agricultural Resource Inventory, and the most current Agricultural Resource Inventory. The source is constantly being updated, so I have provided a link to the homepage of each file, rather than a direct download link. Once on the page, select 'open', and the data will begin downloading.
+
Two files are needed for this project: the 1983 Agricultural Resource Inventory, and the most current Agricultural Resource Inventory. The source data are constantly being updated, so I have provided a link to the homepage of each file, rather than a direct download link. Once on the page, select 'open', and the data will begin downloading.
   
 
For the 1983 data, please visit: https://geohub.lio.gov.on.ca/datasets/agricultural-resource-inventory-1983
 
For the 1983 data, please visit: https://geohub.lio.gov.on.ca/datasets/agricultural-resource-inventory-1983
Line 22: Line 26:
   
   
In this image we can see the problem. The data is so dense that changes between the two cannot be seen. This data is really meant for querying, not so much for display.
+
In this image we can see the problem. The data are so dense that changes between the two cannot be seen. This data is really meant for querying, not so much for display.
   
 
[[File:problem.jpg|500px|thumb|center|Figure 1. Problem with data display]]
 
[[File:problem.jpg|500px|thumb|center|Figure 1. Problem with data display]]
Line 44: Line 48:
 
===Step 3: Check Data Validity===
 
===Step 3: Check Data Validity===
   
This is a very important step. Unlike ArcGIS, if there is any errors or invalid data, QGIS will quit whatever process is running.
+
This is a very important step. Unlike ArcGIS, if there are any errors or invalid data, QGIS will quit whatever process is running.
   
 
Now that we have deleted all unnecessary fields, we need to be sure that there are no errors in our shapefile geometry. To do this, we need to use the Check Validity tool.
 
Now that we have deleted all unnecessary fields, we need to be sure that there are no errors in our shapefile geometry. To do this, we need to use the Check Validity tool.
Line 56: Line 60:
 
[[File:chk_val.jpg|500px|thumb|center|Figure 6. Select the file, and leave all options as default]]
 
[[File:chk_val.jpg|500px|thumb|center|Figure 6. Select the file, and leave all options as default]]
   
[[File:save_val.jpg|500px|thumb|center|Figure 7. Validity Checker outputs]]
 
 
Once the validity checker has completed its operation, there should be three files. Valid output, Invalid output, and error. Remove the invalid and error, then right click on Valid_output and select Make Permanent…
 
Once the validity checker has completed its operation, there should be three files. Valid output, Invalid output, and error. Remove the invalid and error, then right click on Valid_output and select Make Permanent…
   
 
[[File:save_val.jpg|500px|thumb|center|Figure 7. Saving the Validity Checker outputs]]
   
[[File:New_files.JPG|500px|thumb|center|Figure 8. New layer outputs]]
 
 
When the new file has been saved, remove Valid_output, as well as the old ARI_2017 file, and open the newly created file from where you have saved it. You should wind up with something like this. You may not see the new file in the QGIS browser. If this happens, simply hit the refresh button. Be sure to duplicate this process for ARI_1983.
 
When the new file has been saved, remove Valid_output, as well as the old ARI_2017 file, and open the newly created file from where you have saved it. You should wind up with something like this. You may not see the new file in the QGIS browser. If this happens, simply hit the refresh button. Be sure to duplicate this process for ARI_1983.
   
 
[[File:New_files.JPG|500px|thumb|center|Figure 8. New layer outputs]]
===Step 4: Dissolve Polygons===
 
   
 
===Step 4: Dissolve Polygons===
The dissolve tool will allow us to merge all the individual polygons into one. This should make it much less resource intensive to render the data. It will also merge the area for each polygon into one attribute. This will make it easy to compare with the field calculator later on. Fair warning, this process for both files was somehow the longest, and most varied in terms of time. Each file will need anywhere from <5 minutes to >60 to process. I still haven't figured out why.
 
   
  +
The dissolve tool will allow us to merge all the individual polygons into one. This should make it much less resource intensive to render the data. It will also merge the area for each polygon into one attribute. This will make it easy to compare with the field calculator later on. Fair warning, this process for both files was somehow the longest, and most varied in terms of time. Each file will need anywhere from 15 minutes to more than five hours to process. This appears to be based on the number of attributes, as well as relative density of the data. With closer together 'dense' data being faster to process than more spread out 'less dense' data. think of this tool being like an addition tool for your polygons it will add all of the surface area into one so that when we move it around and manipulate it the computer doesn't have to do this with over 5000 polygons it just has to do this with one polygon which is much simpler. we can use this tool for this because we are using area as our variable some variables would need to be kept in their separate polygons, so make sure to check that your data is compatible before using this tool.
  +
 
[[File:dissolve_nav.jpg|500px|thumb|center|Figure 9. Navigate to the check validity tool]]
 
[[File:dissolve_nav.jpg|500px|thumb|center|Figure 9. Navigate to the check validity tool]]
 
Once completed, we can then finish the processing by dissolving the individual breaks, and creating a seamless surface.
 
Once completed, we can then finish the processing by dissolving the individual breaks, and creating a seamless surface.
   
[[File:dissolve_grif.jpg|500px|thumb|center|Figure 10. Settings for the dissolve tool]]
+
[[File:Dissolve_grif.jpg|500px|thumb|center|Figure 10. Settings for the dissolve tool]]
Open the dissolve tool, ensure ARI_2017_chk is selected, and run the tool. Save the new file as a shapefile, and ensure you select your area field in the 'Dissolve Field(s)' option. This process can take a while.
+
Open the dissolve tool, ensure ARI_2017_chk is selected, and run the tool. Save the new file as a shapefile, and leave all other options the same. This process can take a while.
   
 
===Step 5: Field Calculator===
 
===Step 5: Field Calculator===
   
Now that processing is complete, we can calculate the difference between the files. This will give us a number, that can then be converted into % change.
+
Now that processing is complete, we need to calculate the area of each file. This will give us a number, that can then be converted into percent change.
   
  +
When QGIS dissolves the ARI_2017_chk file, it keeps only one area entry. This means that area must be calculated from ARI_2017_chk, and manually added to the new dissolved layer (ARI_2017_dis).
 
To access the Field Calculator, open the attribute table, toggle on editing and navigate over to the abacus icon. It is one to the right of the delete field icon highlighted in figure 3.
 
To access the Field Calculator, open the attribute table, toggle on editing and navigate over to the abacus icon. It is one to the right of the delete field icon highlighted in figure 3.
   
 
[[File:field_calc_sum.jpg|500px|thumb|center|Figure 11. Field Calculator for individual area calculations]]
 
[[File:field_calc_sum.jpg|500px|thumb|center|Figure 11. Field Calculator for individual area calculations]]
   
First, we have to sum the 'area' field manually using <code>SUM(Shape_area)/10000</code>. This is because Shape_area is in meters squared, and we need to convert to hectares, which is what the 1983 data is in.
+
First, we have to sum the 'area' field manually using <code>SUM(Shape_area)/10000</code>. This is because Shape_area is in meters squared, and we need to convert to hectares, which is what the 1983 data is in. For calculating the total area in ARI_1983, the code will be <code>SUM(HECTARES)</code>.
  +
Once we have a single value for both fields, we can calculate the percent change between the two.
 
  +
Once we have a single value for both fields, navigate to the dissolve file, open the attribute table, enable the editor, and manually update the value for both dissolved files. Independently, of course.
  +
  +
  +
[[File:dis_fld_calc.jpg|500px|thumb|center|Figure 12. Update table value to include total are in hectares]]
  +
  +
Above is an example of the updated value in ARI_1983_dis. By double clicking on the attribute, we can update it manually with the correct value.
   
 
===Step 6: Difference===
 
===Step 6: Difference===
   
 
This allows us to generate a shapefile of the differences between two files, making it extremely easy to visually represent what has changed.
 
This allows us to generate a shapefile of the differences between two files, making it extremely easy to visually represent what has changed.
  +
  +
[[File:difference_nav.jpg|500px|thumb|center|Figure 13. Difference tool navigation]]
  +
  +
To access the difference tool, navigate over to Vector > Geoprocessing tools.
  +
  +
[[File:difference_grif.jpg|500px|thumb|center|Figure 14. Difference tool Settings]]
  +
  +
Set the larger layer as the input, and the smaller layer as the overlay. In this case, ARI_1983_dis is the input, and ARI_2017_dis is the overlay. Ensure the file is saved in a known location.
   
 
==Final Output==
 
==Final Output==
   
  +
Once all processing is completed, we can see the output. Dissolve worked much better for ARI_1983 (Orange), than it did for ARI_2017 (Black but supposed to be Green). When zooming in, borders can be clearly seen around the image which is probably because each piece of agricultural land in ARI_2017 was delineated much better, providing proper breaks between each field. This would also explain why there are many more attributes present in ARI_2017.
Once all processing is completed, we can see the output. Notice how it shows as a single colour, rather than many tiny polygons which show up as black.
 
  +
[[File: final.jpg|1000px]]
 
  +
[[File:final_output.jpg|500px|thumb|center|Figure 15. Final map output after all processing]]
   
  +
Using the field calculator (sort of) we are able to calculate the percent difference between 1983 and 2017. The difference is highlighted in red. 66.8% drop from 1983 to 2017.
== References ==
 
   
  +
[[File:percent.jpg|500px|thumb|center|Figure 16. Percent difference between 1983 & 2017]]
<ref> <sup>1</sup> The European Space Agency,[https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2 "Sentinel-2"], Retrieved: 2019-11-05 </ref>
 
  +
  +
</td></tr></table>
  +
  +
==Conclusion==
  +
  +
When you are working with large datasets that have lots of polygons you can use the dissolve tool to make the data more manageable, however we need to be careful that we still keep the data accurate. If your computer is not powerful enough to do the calculations involved in this tool we also can use cloud computing to get this done in a more reasonable time. And in terms of the data we used in the example it turned out that using this method we were more easily able to tell that the amount of agricultural land declined by 66.8% from 1983 to 2017.
  +
 
== References ==

Latest revision as of 08:37, 20 October 2020

Introduction

One issue when working with large data sets is the massive resources required to render the files. This is not so bad when you have a computer powerful enough to handle them, or when you absolutely need data that detailed, but it becomes a problem otherwise. I found myself in such a situation about a year ago, when I was attempting to graphically represent agricultural land attrition in Southern Ontario from 1983 to 2017. The data I had downloaded were incredibly detailed, which is perfect if you want to grab a subset and analyze it at a micro level, but not so great if you want to analyze at a macro level. While I was able to complete my analysis, it was frustratingly time consuming. Every map movement caused a render lag, and any use of analysis tools resulted in a long wait. This leads me to now. While I was not able to think of a way around the problem back then, I figured now might be the time to try. This led me to a feature in QGIS called ‘dissolve’. While this is a general tool in many GIS software packages, I decided to see if I could improve my past project using QGIS.

Cloud Computing

Cloud computing is a fairly recent solution for large datasets that take a long time to process, and for situations where you do not have the resources to render the files. It can be useful if for some reason you cannot use the dissolve tool mentioned above, or if you do not have a particularly powerful computer, because it still lets you get the work done in a reasonable amount of time. There are several ways to do this, depending on the software you want to use. Several large online companies, such as Amazon and Google, offer services that let you rent space on their servers when your data are too large to process on your own computer. This is a good way of getting your layers computed while still being able to do other work in the meantime.

Data

Two files are needed for this project: the 1983 Agricultural Resource Inventory, and the most current Agricultural Resource Inventory. The source data are constantly being updated, so I have provided a link to the homepage of each file, rather than a direct download link. Once on the page, select 'open', and the data will begin downloading.

For the 1983 data, please visit: https://geohub.lio.gov.on.ca/datasets/agricultural-resource-inventory-1983

For the current data, please visit: https://geohub.lio.gov.on.ca/datasets/agricultural-resource-inventory-final

  • Due to issues surrounding QGIS' geodatabase handling ability, I have provided a link to the gdb data converted to a shp file format, below. For those with ArcGIS access, the original file in the link above can be opened in ArcMap and exported as a shapefile that will work with QGIS.

Tutorial Instructions

Step 1: Identifying the Problem

In this image we can see the problem. The data are so dense that changes between the two files cannot be seen. These data are really meant for querying, not so much for display.

Figure 1. Problem with data display

Looking at this, it isn’t hard to see why we’re having the difficulties we are. There are 555,000 individual features in six fields in the 2017 data alone. We need to pare that down. All we want to keep is ‘Shape_area’, which is in square meters. This will allow us to calculate a final area.

Figure 2. Massive amounts of attribute data

Step 2: Deleting Fields

By deleting unnecessary fields, we can hopefully reduce the size and complexity of the data.

Figure 3. How to access editor for field deletion

Ensure that the editing tool is toggled on (highlighted in red), then open the new file's attribute table and delete the unnecessary fields.

Figure 4. Fields to be deleted

Everything except ‘Shape_Area’ should be removed. This can take a while. Once completed, save layer edits by selecting the icon on the right of the editing tool. Editing can be toggled off at this point as well.

Step 3: Check Data Validity

This is a very important step. Unlike ArcGIS, if there are any errors or invalid data, QGIS will quit whatever process is running.

Now that we have deleted all unnecessary fields, we need to be sure that there are no errors in our shapefile geometry. To do this, we need to use the Check Validity tool.


Figure 5. Navigate to the check validity tool


This is the first step. If there are errors in the data, or some of it is invalid, QGIS will not be able to process it. The ‘Check Validity’ tool will allow us to remove any bad data, and ensure that processing proceeds smoothly.

Figure 6. Select the file, and leave all options as default

Once the validity checker has completed its operation, there should be three layers: Valid output, Invalid output, and error output. Remove the invalid and error outputs, then right click on Valid_output and select Make Permanent…

Figure 7. Saving the Validity Checker outputs

When the new file has been saved, remove Valid_output, as well as the old ARI_2017 file, and open the newly created file from where you have saved it. You should wind up with something like this. You may not see the new file in the QGIS browser. If this happens, simply hit the refresh button. Be sure to duplicate this process for ARI_1983.

Figure 8. New layer outputs

Step 4: Dissolve Polygons

The dissolve tool will allow us to merge all the individual polygons into one. This should make it much less resource intensive to render the data. It will also merge the area for each polygon into one attribute, which will make it easy to compare with the field calculator later on. Fair warning: for both files this step was the longest, and the most varied in terms of time. Each file will need anywhere from 15 minutes to more than five hours to process. This appears to depend on the number of attributes, as well as the relative density of the data, with closely spaced 'dense' data being faster to process than more spread out 'less dense' data. Think of this tool as an addition tool for your polygons. It adds all of the surface area into one polygon, so that when we move the layer around and manipulate it, the computer has to handle one polygon rather than more than 5000, which is much simpler. We can use dissolve here because area is our variable. Some variables would need to be kept in their separate polygons, so make sure to check that your data are compatible before using this tool.
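
The "addition tool" idea can be sketched in a few lines of plain Python. This is only an illustration and not what QGIS runs internally: the two square fields and the helper name polygon_area are hypothetical, and the area is computed with the standard shoelace formula. It shows that the dissolved outline has the same area as the two separate fields added together.

```python
def polygon_area(vertices):
    """Return the unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Two hypothetical adjacent fields, each 100 m x 100 m, sharing one edge.
field_a = [(0, 0), (100, 0), (100, 100), (0, 100)]
field_b = [(100, 0), (200, 0), (200, 100), (100, 100)]

# The single outline left once the shared edge has been dissolved.
dissolved = [(0, 0), (200, 0), (200, 100), (0, 100)]

separate_total = polygon_area(field_a) + polygon_area(field_b)
dissolved_total = polygon_area(dissolved)

print(separate_total, dissolved_total)
```

Both totals come out the same, which is why area is a safe variable to carry through a dissolve, while per-polygon attributes such as a crop type would be lost.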

Figure 9. Navigate to the dissolve tool

Once completed, we can then finish the processing by dissolving the individual breaks, and creating a seamless surface.

Figure 10. Settings for the dissolve tool

Open the dissolve tool, ensure ARI_2017_chk is selected, and run the tool. Save the new file as a shapefile, and leave all other options the same. This process can take a while.

Step 5: Field Calculator

Now that processing is complete, we need to calculate the area of each file. This will give us a number that can then be converted into percent change.

When QGIS dissolves the ARI_2017_chk file, it keeps only one area entry. This means that area must be calculated from ARI_2017_chk, and manually added to the new dissolved layer (ARI_2017_dis). To access the Field Calculator, open the attribute table, toggle on editing, and navigate over to the abacus icon. It is one to the right of the delete field icon highlighted in Figure 3.

Figure 11. Field Calculator for individual area calculations

First, we have to sum the 'area' field manually using SUM(Shape_area)/10000. This is because Shape_area is in square meters, and we need to convert to hectares (1 hectare = 10,000 square meters), which is the unit used in the 1983 data. For calculating the total area in ARI_1983, the code will be SUM(HECTARES).
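
As a rough check of what the field calculator expression does, the same arithmetic can be written in plain Python. The sample values below are made up for illustration and are not taken from the ARI data, and total_hectares is a hypothetical helper name.

```python
SQUARE_METRES_PER_HECTARE = 10_000  # 1 ha = 100 m x 100 m


def total_hectares(areas_m2):
    """Sum polygon areas given in square metres and return hectares."""
    return sum(areas_m2) / SQUARE_METRES_PER_HECTARE


# Hypothetical Shape_area values for three polygons, in square metres.
sample_shape_areas = [25_000.0, 40_000.0, 15_000.0]

total = total_hectares(sample_shape_areas)
print(total)  # total area in hectares
```

The division is applied once to the summed value, which mirrors SUM(Shape_area)/10000. The 1983 layer needs no conversion because its HECTARES field is already in the target unit.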

Once we have a single value for each file, navigate to the dissolved file, open the attribute table, enable the editor, and manually update the value. Do this for each dissolved file independently.


Figure 12. Update table value to include total area in hectares

Above is an example of the updated value in ARI_1983_dis. By double clicking on the attribute, we can update it manually with the correct value.

Step 6: Difference

This allows us to generate a shapefile of the differences between two files, making it extremely easy to visually represent what has changed.

Figure 13. Difference tool navigation

To access the difference tool, navigate over to Vector > Geoprocessing tools.

Figure 14. Difference tool Settings

Set the larger layer as the input, and the smaller layer as the overlay. In this case, ARI_1983_dis is the input, and ARI_2017_dis is the overlay. Ensure the file is saved in a known location.
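
The reason the order of input and overlay matters can be shown with a small analogy in plain Python. This is a simplification under stated assumptions: the real Difference tool works on polygon geometry, whereas here each layer is a hypothetical set of grid cells, and the names land_1983 and land_2017 are illustrative only.

```python
# Each tuple is a hypothetical grid cell covered by agricultural land.
land_1983 = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)}
land_2017 = {(0, 0), (0, 1), (1, 0)}

# Input minus overlay: cells that were agricultural in 1983 but not in 2017.
lost = land_1983 - land_2017

# Swapping the order answers a different question: land present only in 2017.
gained = land_2017 - land_1983

print(sorted(lost))
print(sorted(gained))
```

With the larger, older layer as the input, the result holds the land that has disappeared. With the order reversed, the result here is empty, which is why ARI_1983_dis goes in as the input.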

Final Output

Once all processing is completed, we can see the output. Dissolve worked much better for ARI_1983 (orange) than it did for ARI_2017 (which appears black but is supposed to be green). When zooming in, borders can still be clearly seen, probably because each piece of agricultural land in ARI_2017 was delineated much better, providing proper breaks between each field. This would also explain why there are many more attributes present in ARI_2017.

Figure 15. Final map output after all processing

Using the field calculator (sort of), we are able to calculate the percent difference between 1983 and 2017. The difference, highlighted in red, is a 66.8% drop from 1983 to 2017.
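
The percent difference follows the usual percent change formula, (new - old) / old * 100. A minimal sketch in Python is given below. The function name percent_change and the two input values are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the ARI totals.

```python
def percent_change(old_value, new_value):
    """Return the change from old_value to new_value as a percentage."""
    if old_value == 0:
        raise ValueError("old_value must be non-zero")
    return (new_value - old_value) / old_value * 100.0


# Hypothetical totals in hectares, not the real 1983 and 2017 figures.
example_old = 200.0
example_new = 150.0

change = percent_change(example_old, example_new)
print(change)  # a negative result means a drop
```

Substituting the two totals stored in the dissolved layers for the example values gives the percent difference reported for the map.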

Figure 16. Percent difference between 1983 & 2017

Conclusion

When you are working with large datasets that have lots of polygons, you can use the dissolve tool to make the data more manageable. However, you need to be careful to keep the data accurate. If your computer is not powerful enough to do the calculations involved in this tool, you can also use cloud computing to get this done in a more reasonable time. For the data used in this example, this method made it easier to see that the amount of agricultural land declined by 66.8% from 1983 to 2017.
