Utah Department of Natural Resources - Division of Wildlife Resources
Hallie Johnson - University of Utah
The American White Pelican, a migratory bird native to the US, has historically used the areas around Utah Lake and the Great Salt Lake as nesting grounds (DNR Publication). Over the past several years, pelican populations have declined alongside falling water levels in the Great Salt Lake. The Department of Natural Resources, Division of Wildlife Resources (DNR-WR) has been tracking the migratory patterns of these birds for several years, using Argos sensors to monitor their movements. The data produced by the Argos sensors are intended to help DNR-WR understand the various factors that contribute to the health of the pelican population. However, data quality issues from the Argos sensors have posed a problem for DNR-WR and prevented meaningful analysis of the dataset. The objective of this undertaking is to provide DNR-WR with a process for quality assuring the sensor output and a meaningful platform for analyzing data from sensors tracking the birds through three-dimensional space and over time. DNR-WR plans to launch a new sensor tracking program to continue monitoring these birds, so both the quality assurance process and the platform must be able to ingest the information regardless of the sensor technology used.
The primary focus of this project is developing a meaningful tool for analyzing pelican movements through space and time. To achieve this objective, data processing procedures had to be developed to identify outliers and other data errors from the sensors. DNR-WR requested that, in addition to the analytical tool, an automated process be developed to normalize and quality assure (QA) incoming data.
Data QA
To provide users with reliable data, a script was developed to automate the QA of incoming sensor data. DNR-WR stipulated that the script be applicable to any data source, in the event they wished to deploy it on different sensor data or for different species.
An evaluation of the sensor data was conducted to identify common anomalies, and from this evaluation a collection of QA parameters was developed. Data errors included impossible elevations, sensor readings from beneath the earth's surface, and locations well outside of pelican habitat (see Figure 2). A number of fields also contained incomplete or inaccurate records reported from the sensor, such as negative altitudes or timestamps dated decades before the sensors were deployed. Once a comprehensive list of data issues was compiled, it was presented to DNR-WR for validation and feedback. That feedback was incorporated, and a final document detailing the QA parameters the script would address was provided to the DNR-WR group.
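The exploratory checks that produced this list can be reproduced with a short pandas script. The sketch below is illustrative only: the file name, column names (lat, lon, altitude_m, timestamp_utc), deployment date, and study-area bounds are assumptions rather than the actual DNR-WR schema.

import pandas as pd

# Load the raw sensor export (hypothetical file and column names).
raw = pd.read_csv("pelican_argos_export.csv", parse_dates=["timestamp_utc"])

DEPLOYMENT_START = pd.Timestamp("2015-01-01")   # placeholder deployment date

# Count each class of anomaly described above.
summary = {
    "negative_altitude": int((raw["altitude_m"] < 0).sum()),
    "pre_deployment_timestamp": int((raw["timestamp_utc"] < DEPLOYMENT_START).sum()),
    "outside_study_area": int((~raw["lat"].between(30, 50)
                               | ~raw["lon"].between(-125, -100)).sum()),
    "incomplete_record": int(raw[["lat", "lon", "altitude_m", "timestamp_utc"]]
                             .isna().any(axis=1).sum()),
}
print(summary)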
Following the formalization of the QA parameters, methods were developed for isolating and removing outliers and data errors from the dataset. These methods covered how to identify location errors and data inconsistencies, as well as the format for flagging bad data. Python was chosen as the language for the automated QA script in order to leverage ArcPy and several of its geoprocessing tools, such as the point-to-line process (Figure 3). The code was developed using ModelBuilder in ArcGIS Pro and tested and debugged in Jupyter Notebook. Data sources and input parameters were defined as variables, rather than hard coded, to allow DNR-WR to apply the logic of the script to other datasets; a sketch of this workflow follows the rule sets below. Figure 4 illustrates the field errors identified by the completed script.
Rule Sets:

Solving for Location Errors
  Geoprocess: Point to Line - subset of pelican data (20% of points), ordered by timestamp
  Geoprocess: Buffer - 32,000 meters around the new line feature
  New field: Boolean flag
    0 = inside buffer
    1 = outside buffer

Incomplete Data Entries
  New field: Boolean flag
    0 = complete entry
    1 = incomplete entry

Data Collection Errors
  Altitude:
    New field: display altitude - normalize data type (integer), set negative values to 0
    New field: Boolean flag
      0 = numeric value
      1 = text/error value
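As a rough illustration of how these rule sets translate into ArcPy, the sketch below strings together the Points To Line, Buffer, and Select Layer By Location tools with an update cursor that combines the incomplete-entry and altitude checks. The feature class and field names (pelican_points, tag_id, timestamp_utc, altitude) are hypothetical placeholders, not the delivered script.

import arcpy

arcpy.env.overwriteOutput = True

# Hypothetical inputs; in the delivered script these are parameters, not constants.
points = "pelican_points"
flight_line = "flight_line"
flight_buffer = "flight_buffer"

# Rule set 1: location errors.
# Build a reference flight path from time-ordered points, one line per bird.
arcpy.management.PointsToLine(points, flight_line,
                              Line_Field="tag_id", Sort_Field="timestamp_utc")
# Buffer the path by 32,000 meters to define the plausible corridor.
arcpy.analysis.Buffer(flight_line, flight_buffer, "32000 Meters",
                      dissolve_option="ALL")
# Flag points outside the corridor (1) versus inside (0).
arcpy.management.AddField(points, "loc_error", "SHORT")
arcpy.management.CalculateField(points, "loc_error", "1")
points_lyr = arcpy.management.MakeFeatureLayer(points, "points_lyr")
arcpy.management.SelectLayerByLocation(points_lyr, "WITHIN", flight_buffer)
arcpy.management.CalculateField(points_lyr, "loc_error", "0")

# Rule sets 2 and 3: incomplete entries and altitude collection errors.
arcpy.management.AddField(points, "alt_display", "LONG")
arcpy.management.AddField(points, "alt_error", "SHORT")
with arcpy.da.UpdateCursor(points, ["altitude", "alt_display", "alt_error"]) as cursor:
    for altitude, _, _ in cursor:
        try:
            clean = max(int(altitude), 0)            # floor negative altitudes at 0
            cursor.updateRow([altitude, clean, 0])   # 0 = numeric value
        except (TypeError, ValueError):
            cursor.updateRow([altitude, None, 1])    # 1 = text/error or missing value

Because the inputs are plain variables, the same block can be pointed at a different feature class or re-run with a different buffer distance to tune how much data is excluded.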
A training session was conducted with the DNR-WR point person to explain the script in detail. This involved a step-by-step walkthrough of each section of code, with attention to the comments that explain how to alter the code as needed. Several rounds of feedback and edits followed this training session to adapt the QA code for use in the DNR-WR's environment. Documentation of all sources and the reasoning behind the QA parameters was provided in a data-issues document and pseudo-code documentation.
Web-based platform for viewing and analyzing sensor data
At the outset of this project, the DNR-WR was undecided about which platform the web application should be developed in. The first meeting with the project partners provided a general idea of the issues with the current web application and of which components of the data the group wished to analyze. Following this meeting, research was conducted to determine the most effective web-based analytical platform for the DNR-WR's objectives. From this research, two options emerged as the best choices:
Google Earth Engine (GEE) Code Editor & App Builder
ArcGIS Online & Esri's ArcGIS API for JavaScript
A list of features and capabilities was compiled for review by the DNR-WR group. This review led to the decision to pursue GEE, as DNR-WR planned to house similar migratory-animal projects in other Google-based platforms. The DNR-WR does not yet have a Google Cloud Storage environment in which to publish the cleaned data; however, the publishing process, along with the steps to import the data into the GEE application, was developed and documented for use once that framework is in place.
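As a minimal sketch of that import step: once the cleaned table has been ingested as an Earth Engine asset (for example, with the earthengine upload table command pointed at a Cloud Storage file), the application can load it as a FeatureCollection. The asset path below is a hypothetical placeholder.

import ee

ee.Initialize()

# Hypothetical asset path created during ingestion from Cloud Storage.
pelicans = ee.FeatureCollection("projects/dnr-wr/assets/pelican_points_clean")

# Sanity check that the ingest succeeded: count the imported readings.
print(pelicans.size().getInfo())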
The DNR-WR group identified several components of the pelican data they wished to address in the web application: the location of each sensor reading and the altitude and timestamp collected with it. It was important to develop meaningful and applicable symbology for these components of the dataset. To accomplish this, individual sensor readings are represented as point data on the map in the application. To illustrate change over time, a color scheme was created to reflect the four seasons of the year, and each point was assigned a color from this scheme based on when it was collected. In addition to the feature symbology, an interactive component was added that allows the user to change the underlying satellite imagery to match the season of the data being reviewed (see Figure 6). Rather than symbolizing altitude directly on the map, a chart reflects altitude based on user input; this chart can be exported as is, or the data can be exported for use in other charting software (see Figure 7).
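The season assignment itself is straightforward to express against the Earth Engine API. The application was built in the GEE JavaScript Code Editor; the Python-API sketch below is an equivalent illustration only, and the asset path and timestamp property name are assumptions.

import ee

ee.Initialize()

pelicans = ee.FeatureCollection("projects/dnr-wr/assets/pelican_points_clean")

def tag_season(feature):
    # Derive a season index from the reading's month:
    # 0 = winter (Dec-Feb), 1 = spring, 2 = summer, 3 = fall.
    month = ee.Date(feature.get("timestamp_utc")).get("month")
    season = month.mod(12).divide(3).floor()
    return feature.set("season", season)

# Each point can then be styled by its "season" property in the app.
pelicans_by_season = pelicans.map(tag_season)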
In trying to determine the best way to illustrate the three main components of this dataset, it was decided that an app gallery might be the best approach. While there was not sufficient time to develop multiple applications under this project, a roadmap was provided for future projects.
Throughout the project, the web application was launched and its capabilities demonstrated for DNR-WR representatives. Feedback from each of these demonstrations was incorporated, where possible, into the application's functionality or approach.
This project started with only a general overview of the DNR-WR's frustration with the quality of their sensor data, the time it was taking to QA it, and their dissatisfaction with the current platform for viewing the pelican data. That openness allowed for a great deal of research and experimentation in arriving at a viable solution to the DNR-WR's pain points. As a result, a product has been developed that can evolve beyond the original parameters and potentially allow for collaboration with other research, such as the University of Montana's work identifying habitats important to migratory bird networks.
GEE is a powerful tool for analyzing and visualizing large datasets, especially when incorporating raster datasets that are publicly available through its data catalog. Because all storage and computation happen on Google's servers, it is easy to bring raster data covering very large areas into a web application and to share research with others. The code behind an application can also be made public, which allows others to replicate the research conducted in these applications.
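For example, a public raster from the data catalog can be combined with the pelican points in a few lines. The SRTM asset ID below is a real catalog entry, while the pelican asset path and the altitude comparison are illustrative assumptions only.

import ee

ee.Initialize()

# 30 m global elevation model from the public GEE data catalog.
elevation = ee.Image("USGS/SRTMGL1_003")
pelicans = ee.FeatureCollection("projects/dnr-wr/assets/pelican_points_clean")

# Sample the DEM under each reading, e.g. to compare terrain height against
# the altitude reported by the sensor.
sampled = elevation.sampleRegions(collection=pelicans, scale=30)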
There are some limitations to working with the JavaScript API in GEE. JavaScript is not the most powerful language for analyzing data, so some actions that are relatively simple in R or Python become limited or unnecessarily complicated. The way GEE handles server-side and client-side data in its customizable widgets also creates confusion when developing applications and trying to harness the outputs of certain functions.
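That confusion stems from the fact that computed values live on the server until they are explicitly pulled back to the client, with evaluate() in the Code Editor's widgets or getInfo() in scripts. A minimal Python-API illustration, reusing the hypothetical asset path from above:

import ee

ee.Initialize()

pelicans = ee.FeatureCollection("projects/dnr-wr/assets/pelican_points_clean")

count = pelicans.size()   # a server-side ee.Number, not a Python int
print(type(count))        # <class 'ee.ee_number.Number'>
print(count.getInfo())    # getInfo() transfers the actual value to the client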
Development of the QA script also led to insights into simplifying a process that initially seemed very complex. Initial analysis of the sensor data produced a long list of errors that would need to be flagged or corrected in some manner. The most challenging were the location errors generated when a sensor cannot get an accurate satellite reading and records points well outside the bird's flight path. A manual process in ArcGIS Pro, using a series of attribute selections to isolate individual birds, handpicking points to feed the Point to Line geoprocess, and finally excluding data from the original selection based on a buffer and Select by Location, was successful but not readily replicable (see Figure 1). However, by combining ModelBuilder in ArcGIS Pro to test the ArcPy processes with Jupyter Notebook to allow the process to live outside the Esri environment, a workflow was created that can easily be adapted to quickly QA data from other sensors or for other species. The output can be adjusted to include or exclude more data based on the model parameters, and the process can be run as many times as needed to determine the most accurate settings.
This project was initiated to solve two particular problems the DNR-WR was experiencing: the time involved in correcting data errors from their pelican tracking sensors, and dissatisfaction with their current analytical platform, PeliTrak. Because there was no concrete direction for solving these problems, a great deal of flexibility was available in searching for the right solution.
This search led to the development of a useful and adaptable data QA script which can be deployed on the pelican dataset from the current sensors and any future sensor output, and which can also be applied to other migratory animals or to other datasets involving movement through time. In addition to the QA script, this project enabled an in-depth exploration of the GEE platform and other Google products to arrive at a viable solution to the DNR-WR's lack of an analytical platform. This exploration demonstrated the capability of Google's freely available platforms to handle complex spatial analysis and to support sophisticated, dynamic web platforms for sharing this information. The seamless integration with Google's other platforms allows the data to be used easily in different settings, a valuable feature when dealing with datasets across an organization as large and as modular as the DNR.
While very labor intensive, this project also enabled the development and improvement of several skill sets, including increased expertise in Python (especially the ArcPy library), JavaScript and the various Earth Engine functions, and navigating Google Cloud Storage and other software in the Google suite. In the GIS field, open source solutions are quickly becoming viable alternatives to proprietary software such as Esri's GIS tools, and it is important to understand the capabilities of these platforms and their potential to complement or augment other software.
GIS Analysis
Spatial Data and Algorithms
GIS Workflow
Model Building
Cartography and Graphic Design
Project Design
Project Management
Communication Skills
Basic Programming and Scripting