Thursday, April 13, 2017

Network Analysis

Goal and Objectives:

The primary goal of this assignment is to perform network analysis to find the most efficient route from the sand mines to rail stations for transporting the sand. Python is used to query out mines that are active and do not have a rail station on site. The mines also should not be within 1.5km of a rail line, because mines that close likely have rail spurs connecting them. Another goal is to gain experience with ModelBuilder by building a model that uses the closest facility solver to calculate the closest facility routes. The final goal is to create an equation that calculates a hypothetical cost of sand truck travel on roads by county.

A White Paper from the National Center for Freight and Infrastructure Research and Education (2013) provides some important context for this study. It notes that a common concern among communities where sand mines are located is road damage, which is caused by large, heavy vehicles frequently traveling along local roads as part of mine operations. A table in the paper lists road damage as a significant impact for all types of sand mining operations. The paper describes a case study in Chippewa County, WI, where the impact of sand mining on roads is a major local concern, and it explains how different types of roads (state, local, etc.) carry different regulations. In many cases, mine operators and the counties involved negotiate appropriate reimbursement terms.

The mine data set used in this study is the official, most recent set of mines from the DNR.  The street map used as the source for the Network Dataset came from ESRI.  The rail terminal locations were provided in the geodatabase for this lab, but they likely came from the Department of Transportation. In this study, hypothetical numbers are used when examining the number of trips and cost for each county.


Methods:

The first portion of this lab involved using Python scripting to extract the mines for which trucking routes need to be examined. These are mines that are active, do not have a rail loading station on-site, and are not within 1.5km of a rail station. The purpose was to determine the starting points for route planning in the cases where trucks would be needed to transport sand. The script used to execute this process can be found in the Python Scripts blog post below.
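The three selection criteria above can be sketched in plain Python. This is not the script from the Python Scripts post. It is a minimal illustration that assumes each mine is a dictionary with hypothetical keys (`active`, `rail_on_site`, `x`, `y`), that rail features are represented as points, and that all coordinates are in a projected system measured in meters. The sample records are made up.

```python
import math

RAIL_BUFFER_M = 1500.0  # the 1.5 km threshold described in the text


def select_truck_mines(mines, rail_points, buffer_m=RAIL_BUFFER_M):
    """Keep mines that are active, have no on-site rail loading,
    and lie farther than buffer_m from every rail point."""
    selected = []
    for mine in mines:
        if not mine["active"] or mine["rail_on_site"]:
            continue
        # Straight-line distance to the nearest rail point (projected meters).
        nearest = min(
            (math.hypot(mine["x"] - rx, mine["y"] - ry) for rx, ry in rail_points),
            default=math.inf,
        )
        if nearest > buffer_m:
            selected.append(mine)
    return selected


# Made-up sample records, for illustration only.
mines = [
    {"name": "A", "active": True, "rail_on_site": False, "x": 0.0, "y": 0.0},
    {"name": "B", "active": True, "rail_on_site": True, "x": 9000.0, "y": 0.0},
    {"name": "C", "active": False, "rail_on_site": False, "x": 9000.0, "y": 9000.0},
    {"name": "D", "active": True, "rail_on_site": False, "x": 5000.0, "y": 1000.0},
]
rail_points = [(5000.0, 0.0)]

truck_mines = [m["name"] for m in select_truck_mines(mines, rail_points)]
print(truck_mines)
```

Only mine A passes all three tests here: B has rail on site, C is inactive, and D sits 1 km from the rail point.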

To execute the rest of this project, ModelBuilder was used (see Figure 1). The workflow is explained below.


Figure 1: Model used to execute this project.

The first step in this network analysis is to create a closest facility layer, using the streets network dataset as the input. The Add Locations tool is then used to load the final mines that were extracted by the Python code as the incidents, and it is run again to add the rail terminals as the facilities. The Solve tool is added to actually run the network analysis and determine the best routes. To export the resulting routes as a feature class, the Select Data tool is used with Routes as the child data element. The Copy Features tool is then used to save the selected features in the geodatabase.
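To make the incident and facility idea concrete, the sketch below finds the nearest facility for each incident on a toy road network using Dijkstra's algorithm. It is not the ArcGIS closest facility solver and makes no claim about how that solver works internally. The node names and edge lengths are invented for illustration.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra's algorithm: least-cost distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def closest_facility(graph, incident, facilities):
    """Return (facility, cost) for the cheapest reachable facility, or None."""
    dist = shortest_costs(graph, incident)
    reachable = [(dist[f], f) for f in facilities if f in dist]
    if not reachable:
        return None
    cost, facility = min(reachable)
    return facility, cost


# Toy undirected road network: (node, node, length). All values are made up.
edges = [
    ("mine1", "j1", 4.0),
    ("j1", "termA", 6.0),
    ("j1", "termB", 3.0),
    ("mine2", "termA", 2.0),
    ("mine2", "j1", 5.0),
]
graph = {}
for a, b, length in edges:
    graph.setdefault(a, []).append((b, length))
    graph.setdefault(b, []).append((a, length))

result = {m: closest_facility(graph, m, ["termA", "termB"]) for m in ["mine1", "mine2"]}
print(result)
```

In this toy network the two mines end up at different terminals, which mirrors how each incident is matched independently to its own closest facility.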

The next portion of this model calculates the total road length of the routes by county. First, to clean up the data frame, the counties, rail terminals, and roads layers are all clipped to Wisconsin and Minnesota. Minnesota is included because, for some routes, the most efficient option is to take the sand to a terminal in Minnesota. The Identity tool is then used to connect the county names with the route information. The layers are also projected so that the analysis can be conducted properly. This minimizes distortion in the final map (see Figure 2) and allows the route lengths to be measured correctly. To find the total distance of roads in each county, the Summary Statistics tool is used. This provides a summary of the distances in feet. Because it is more relevant to see the distances in miles, the Add Field and Calculate Field tools are used to display the distance in miles, with an equation that divides the distance value by 5280. To establish the estimated cost for each county, another field is added to the table and the following equation is used to determine the hypothetical cost:
Cost ($) = ((miles * 2.2) / 100) * trips. In the case of this study, it is assumed that 50 trips are taken each year to the mines as well as 50 trips back. The resulting table can be seen below (see Figure 3).
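The per-county arithmetic described above (sum the route lengths in feet, divide by 5280, then apply the cost equation) can be sketched as follows. The county labels and lengths are invented, and the function and variable names are hypothetical. The example passes trips=100 on the assumption that the 50 outbound and 50 return trips are counted together, which the text does not state explicitly.

```python
FEET_PER_MILE = 5280.0


def cost_by_county(segments, trips, rate=2.2):
    """Sum route lengths (feet) per county, convert to miles, and apply
    cost = ((miles * rate) / 100) * trips from the text."""
    totals_ft = {}
    for county, length_ft in segments:
        totals_ft[county] = totals_ft.get(county, 0.0) + length_ft

    table = {}
    for county, feet in totals_ft.items():
        miles = feet / FEET_PER_MILE
        table[county] = (miles, miles * rate / 100.0 * trips)
    return table


# Made-up route pieces after the Identity step: (county, length in feet).
segments = [
    ("County A", 52800.0),
    ("County A", 26400.0),
    ("County B", 10560.0),
]

# trips=100 is an assumption (50 trips out plus 50 trips back).
table = cost_by_county(segments, trips=100)
for county, (miles, cost) in sorted(table.items()):
    print(f"{county}: {miles:.2f} mi, ${cost:.2f}")
```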

Results and Discussion:

The following map shows the most efficient truck routes from the relevant mines to the nearest rail facility.

Figure 2: Map of routes

This shows that in many cases multiple mines send sand to the same rail terminal. The roads along those routes take a heavier toll, both because the routes overlap and because of the number of trips taken from each mine. It also shows that Minnesota is impacted as well as Wisconsin because, in some cases, the closest rail terminal is actually in Minnesota. This complicates matters because the bulk of the business occurs in Wisconsin, while Minnesota also bears road repair costs.

Figure 3: Cost per county

This table lists the hypothetical costs (in USD) that the mining industry imposes on each county the routes travel through. There are many instances where a route travels through a county that has no actual mining activity. These counties are impacted and may not receive proper compensation for their roads.

Conclusions:

It is evident that mining takes a toll on the communities surrounding the operations. One major concern is the impact on road conditions where sand must be transported to rail stations by heavy trucks. This study examined the most efficient routes to rail stations from mines that are not near a rail station. In many cases, one mine alone has a large impact on multiple counties and even multiple states. Taken together, the sand mining industry ends up having a much larger impact than is initially noticed at the local level.

Sources:

White Paper on Frac Sand Mining

ESRI

DNR

DOT

Friday, April 7, 2017

Geocoding

Goals and Objectives:

The goals of this assignment are to learn how to geocode addresses and PLSS locations and to compare the results with the work done by others. In the context of this sand mining project, 19 mine locations were provided, and the goal was to geocode them and compare them to the same mines geocoded by classmates. This provides an opportunity to analyze potential errors. Before the geocoding could be done, the data had to be normalized.

Methods:

The first step in the process of geocoding was to ensure that the data was normalized.  This meant manipulating the given datasets in Microsoft Excel so that each record would be in the same format.  Essentially, this split the different parts of the addresses up in a way that the geocoding tools could use to locate the desired locations.
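The normalization for this lab was done by hand in Microsoft Excel. Purely as an illustration of the same idea, the sketch below splits a single-field location string into separate columns and sets aside anything that does not look like a street address as a PLSS description. The input format, the regular expression, and the sample strings are all assumptions and are not taken from the lab data.

```python
import re

# Assumed address layout: "<number> <street>, <city>, <ST> <zip>".
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>\d+\s+[^,]+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})\s*$"
)


def normalize(raw):
    """Split one raw location string into consistent fields.
    Strings that do not match the address pattern are kept as PLSS text."""
    match = ADDRESS_RE.match(raw)
    if match is None:
        return {"street": "", "city": "", "state": "", "zip": "", "plss": raw.strip()}
    row = {key: value.strip() for key, value in match.groupdict().items()}
    row["plss"] = ""
    return row


# Made-up examples: one street address and one PLSS-style description.
raw_records = [
    "123 Example Rd, Sampletown, WI 54000",
    "NE 1/4 Sec 12 T25N R10W",
]
rows = [normalize(r) for r in raw_records]
for row in rows:
    print(row)
```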

The geocoding portion itself required two different processes. One was used when actual addresses were provided. The other was used when only PLSS information was available and the locations had to be found with a much more manual process. When addresses were provided, the "Geocode Addresses" function could be used. This takes the information from the normalized table and generates a list of candidate locations for each address. I would then zoom to these locations and select the one that appeared most accurate, using the imagery base map as well as Google Maps satellite view as a reference.

When the physical address information was not available, the PLSS coordinates were used to locate the sand mines. This was a much more manual process and required the imagery base map, as well as layers displaying PLSS sections and townships, to get a general idea of where these mines are. Once the mine was located, the "Geocode Addresses" function was used to mark the location for that point. In many cases, the mines that had addresses attached also included PLSS information. When this was the case, the PLSS information was used to verify the accuracy of the address information. When all 19 mines were geocoded, a shapefile was created to share with the rest of the class.

The next portion of the assignment was to compare my geocoded mines with the same mines that other people in the class also geocoded. To do this, all of the shapefiles were brought into ArcMap and the Merge tool was used to combine them into one. Since this step required measuring distances, I first made sure that all of my data used the same projection. Using the merged layer, I queried out the 19 mines that I also geocoded and then selected a sample of 5 mines that at least 2 other people had geocoded. I then used the "Point Distance" tool to measure the distance between the point I thought was the location of the mine and the locations that 2 of my classmates thought were correct. A table was then generated with these results. The same process was then used to compare the same 5 mines with the true locations provided by data from the DNR.
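The comparison described above boils down to measuring straight-line distances between pairs of projected points and then summarizing them. The sketch below illustrates that calculation in plain Python. It does not use the Point Distance tool, the coordinates are invented, and the "a"/"b" labels simply follow the labeling used in the results tables.

```python
import math
import statistics


def point_distances(my_points, other_points):
    """Distance from each of my points to every classmate point for the same mine.
    Coordinates are assumed to share one projected coordinate system."""
    rows = []
    for mine_id, (x, y) in my_points.items():
        for label, (ox, oy) in other_points.get(mine_id, {}).items():
            rows.append((mine_id, label, math.hypot(ox - x, oy - y)))
    return rows


# Made-up projected coordinates, for illustration only.
my_points = {1: (0.0, 0.0), 2: (100.0, 100.0)}
other_points = {
    1: {"a": (3.0, 4.0), "b": (0.0, 10.0)},
    2: {"a": (100.0, 130.0)},
}

rows = point_distances(my_points, other_points)
distances = [d for _, _, d in rows]
mean_distance = statistics.mean(distances)
std_distance = statistics.stdev(distances)  # sample standard deviation

for mine_id, label, d in rows:
    print(f"mine {mine_id}{label}: {d:.1f}")
print(f"mean={mean_distance:.1f}, std={std_distance:.1f}")
```

A single point placed at the wrong mine would dominate both the mean and the standard deviation, which is why a high standard deviation is a useful flag for that kind of mismatch.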

Results:

Table 1:  The original location data that has not been normalized.  It is a mixture of actual addresses and PLSS locations all in one field.
Table 2: Locational information after it has been normalized.  This involved splitting the given information up in ways that the geocoding function would be able to decipher.
Figure 1:  This is a map of the 19 sand mine locations that I geocoded.
Table 3:  This table shows the distance in feet between the location where I placed each mine and the locations where 2 other people placed the same mine.  The 2 other people's locations received the "a/b" label in the table in order to differentiate the different people's locations for the same intended mine.  The very large distances in some cases and the high standard deviation show that there were a few instances where the location I thought was correct and that of my classmates were at different mines altogether.  In other instances, the points were at the same mine but at different entrances, thus causing a discrepancy.
Table 4: This table shows the distance between my point and the truth point provided by the DNR.  In many cases I noticed that my points were at the entrance to the mine, whereas the DNR points were inside the mines themselves.  Overall, the mean and standard deviation are relatively small, showing that the discrepancies were not large.
Figure 2: This map shows an example of a difference between my data and that of the DNR.  The points are clearly marking the same mine, but the actual placement of the points is different.  This was a common error that I noticed throughout my dataset.

Discussion:

While the geocoding process is generally reliable, it is impossible to be completely free of errors. This can be seen first of all through the differences in each person's points as noted in the tables and figures above.

Throughout this process there are both inherent and operational errors present.  Inherent errors occur due to the nature of how geographic data is represented.  One example is the distortion introduced when projecting the round earth onto a flat surface, which could have an impact when measuring the distances between points.  Another example arises when trying to match a point with the imagery base map, because the base map could be outdated.  I noticed significant differences in the imagery when viewing it at different extents.  There could also be an inherent error from when the data was originally collected, depending on the equipment used.

Operational errors occur due to human nature.  The differences in where points were placed on the imagery could be due to people interpreting the base map image differently.  It could also be due to people working at different scales when placing points.   There could also have been an operational error made when collecting the data in the first place.  It is nearly impossible to avoid these errors completely.

It is difficult to ultimately know which points are correct and which ones are not.  The best way to ensure accuracy is to actually go to the locations and verify the point.  However, this is not always possible.  In this case, the most feasible way to ensure data accuracy would be to compare as many different people's geocoded points as possible.  Even while doing that, however, it is impossible to ensure the data is completely correct without physically going to the locations and collecting the raw data.

Conclusions:

Overall, geocoding is a good way to create data points from given addresses.  While it may not produce perfect data points every time, it certainly saves time compared with going to each location to collect a point.  This is not without limitations, however.  There will always be some risk of error in this process.  When examining the data for errors, it would be nice to check each point, but that is not always possible.  In most cases a sample of the data will be examined, as was the case in this lab.  Future studies may want to check more locations than just the sample and compare the locations to a larger sample of other people's work.