Ecological Data Services, LLC
Delivering Data Focused Solutions for Agricultural and Ecological Research

The Gridded National Soil Survey Geographic Database & Microsoft's Planetary Computer (part 1)

The Gridded National Soil Survey Geographic Database (gNATSGO) is a USDA-NRCS Soil & Plant Science Division (SPSD) composite database containing soils information for all areas of the US and Island Territories. It combines data from three sources in the USDA’s Ag Data Commons:

  1. SSURGO – Soil Survey Geographic Database
  2. STATSGO2 – State Soil Geographic Database
  3. RSS – Raster Soil Survey

Microsoft’s Planetary Computer

Microsoft’s Planetary Computer, available in preview by request, provides access to a diverse collection of public earth-science data through the Planetary Computer Hub. The Hub provides developer workspaces pre-configured with APIs, compute, and storage resources for accessing and processing that data. Workspace environments are available for Python, R, TensorFlow, and QGIS. I used the Python notebook workspace to create the following notebook, which lists the tables and columns in the gNATSGO collection.

gNATSGO Data Dictionary (Python Notebook Code)

GitHub Repository

# Import required libraries
import planetary_computer
import numpy as np
import pandas as pd
import rioxarray
import xarray
import pystac_client
from IPython.display import display

# Create planetary computer client
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Search catalog for gnatsgo table collection
search = catalog.search(
    collections=["gnatsgo-tables"]
)

# Get a list of tables
# (older pystac-client releases use search.get_items() instead)
collection_tables = list(search.items())

# Empty data frame with the data dictionary columns
df_dictionary = pd.DataFrame(columns=['name', 'type', 'description', 'table_name'])

# Populate data frame with table metadata, one row per column of each table
for i in collection_tables:
    df_i = pd.DataFrame(i.properties["table:columns"])
    df_i["table_name"] = i.id
    df_dictionary = pd.concat([df_dictionary, df_i], ignore_index=True, sort=False)

# Show full column descriptions, left-aligned
pd.set_option('display.max_colwidth', 1000)
display(df_dictionary.style.set_properties(**{'text-align': 'left'}))

# Save data frame as .csv
df_dictionary.to_csv('gNATSGO Data Dictionary.csv')

---
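
The loop in the notebook concatenates onto the data frame once per table. A minimal sketch of the same step as a reusable function is shown here, assuming each STAC item exposes `id` and a `properties["table:columns"]` list of dictionaries, as it does in the notebook. The `build_data_dictionary` name and the `SimpleNamespace` stand-in items (with placeholder types and descriptions) are illustrative only and are not part of the Planetary Computer API.

```python
from types import SimpleNamespace

import pandas as pd


def build_data_dictionary(items):
    """Return one row per column, tagged with the table it belongs to."""
    frames = []
    for item in items:
        # "table:columns" holds a list of dicts describing each column
        frame = pd.DataFrame(item.properties["table:columns"])
        frame["table_name"] = item.id
        frames.append(frame)
    if not frames:
        # No tables found: return an empty frame with the expected columns
        return pd.DataFrame(columns=["name", "type", "description", "table_name"])
    # A single concat avoids growing the data frame inside the loop
    return pd.concat(frames, ignore_index=True, sort=False)


# Hypothetical stand-ins for STAC items; types and descriptions are placeholders
sample_items = [
    SimpleNamespace(
        id="mapunit",
        properties={"table:columns": [
            {"name": "mukey", "type": "int32", "description": "placeholder"},
            {"name": "muname", "type": "string", "description": "placeholder"},
        ]},
    ),
    SimpleNamespace(
        id="component",
        properties={"table:columns": [
            {"name": "cokey", "type": "int32", "description": "placeholder"},
        ]},
    ),
]

sample_dictionary = build_data_dictionary(sample_items)
print(sample_dictionary)
```

In the notebook, the same function could be called with `collection_tables` in place of `sample_items`.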

Download the resulting metadata file: gNATSGO Data Dictionary.csv
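
Once saved, the file can be read back with pandas for quick lookups. The sketch below is only an illustration: it reads a small in-memory CSV with placeholder rows instead of the real file, and the variable names are my own. For the real file, the filename `gNATSGO Data Dictionary.csv` would replace the `io.StringIO` object; `index_col=0` is used because the notebook writes the data frame index as the first column.

```python
import io

import pandas as pd

# Placeholder rows standing in for 'gNATSGO Data Dictionary.csv';
# the first unnamed column is the index that to_csv writes by default
csv_text = """,name,type,description,table_name
0,mukey,int32,placeholder,mapunit
1,muname,string,placeholder,mapunit
2,cokey,int32,placeholder,component
3,mukey,int32,placeholder,component
"""

data_dictionary = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Number of columns documented for each table
columns_per_table = data_dictionary.groupby("table_name")["name"].count()

# Tables that list a given column name, a hint at possible join keys
tables_with_mukey = sorted(
    data_dictionary.loc[data_dictionary["name"] == "mukey", "table_name"]
)

print(columns_per_table)
print(tables_with_mukey)
```

Listing the tables that share a column name is one simple way to start looking for candidate keys between tables.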


Planetary Computer gNATSGO Table Collection


Next up

Part 2: Exporting Table Data from the gNATSGO Collection and defining table relationships

Part 3: Blackland Prairie Crop Yields (Power BI Visualizations)

Dec 7, 2022