Making Geospatial Data Accessible Through Community Collaboration

Geospatial data has become a vast, interconnected body of information with the power to inform and predict many aspects of our daily lives.

Geospatial data is used to map places, things and events relative to where they are on the planet. Mapping temporal patterns of land and water on Earth, predicting climate patterns and identifying power grid attributes are just a few examples of how this data is applied. It has revolutionized how we collect and disseminate large spatial datasets with high accuracy and fine resolution.

The research community in particular relies heavily on this type of data to inform analytical models that extend the understanding of conditions and results across large temporal and spatial scales. However, with great power come great challenges, and geospatial data has no shortage of its own.

ACEP research associate professor Erin Trochim is part of a team working on the Awesome GEE Community Catalog, or AGEECC, a collection of community-sourced geospatial datasets made available to the larger Google Earth Engine community and shared publicly as Earth Engine assets. The effort was led by a team of developers and researchers, including Trochim, who uses AGEECC to host many of her curated infrastructure datasets, which include Alaska.

Trochim, along with co-developers Valerie Pasquarella, a Boston University research assistant professor, and Samapriya Roy, solutions architect at Planet, traveled to the annual Google for Good Summit to present on the AGEECC platform and to announce the launch of the new website.

“The core goal of this project is to make datasets more accessible and discoverable in order to positively benefit communities around the world through science,” said Roy.

The idea for AGEECC came when the team realized that researchers who were publishing papers using geospatial datasets had no place to host them publicly after publishing. They were hosting them either on private internet portals or on cloud-based drives. When these cited datasets and their associated information are not publicly available, the research community is at a huge disadvantage.

AGEECC offers a hub where anyone can search for datasets and have access to associated data scripts with all the information needed to take the data and use it immediately without all the normally required preprocessing. Researchers can find datasets not currently publicly available, interact with the data curators, upload their own datasets, contribute code examples and report issues with datasets that they find.

Trochim, who works with circumpolar datasets related to her energy research, emphasized the importance of better documenting issues and bugs with global datasets, as these datasets are often used to inform large-scale global decisions. AGEECC addresses this need through automation capabilities within the platform, specifically around version update requests.

“No data is perfect,” said Trochim. “To know how and when this happens is super valuable, especially at the beginning and end of a project.” 

Roy emphasized that “as rewarding as it has been to see AGEECC serve as a valuable data hosting platform, it has been equally rewarding to see it grow as a space that promotes conversation and collaboration amongst the users, which will strengthen the quality of the data being hosted.”

Watch the presentation in its entirety here: https://earthoutreachonair.withgoogle.com/events/geoforgood22/watch?talk=day2-track1-talk3

 

A screenshot of AGEECC's HydroLAKES database landing page.