Ecological Data Services, LLC
Delivering Data Focused Solutions for Agricultural and Ecological Research

The Gridded National Soil Survey Geographic Database & Microsoft's Planetary Computer (part 2)

This is a continuation of my previous post The Gridded National Soil Survey Geographic Database & Microsoft’s Planetary Computer (part 1) Part one shows how to access meta data to create a data dictionary for gNATSGO tabular datasets.

Using the data dictionary to identify tables of interest I created the following function and python notebook code to extract tabular data from the following tables to analyze crop yield in the Blackland Prairie region of texas.

sacatalog
legend
mapunit
mucropyield

Reading and exporting data from gNATSGO tabular data sets (Python Notebook Code)

View and download code from my Github Repository

# Import required libraries import planetary_computer
import numpy as np
import pandas as pd
import rioxarray
import xarray
import pystac_client

# Create planetary computer client
catalog = pystac_client.Client.open(
https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)

# Empty Dataframe
df_dictionary = pd.DataFrame(columns = [‘name’,’type’,’description’,’table_name’])

# Search catalog for gnatsgo table collection
search = catalog.search(
collections=[“gnatsgo-tables”]
)

# Function to read and export specified table and column list from gNATSGO data sets
def getTableDataAsCsv(table_name, table_key, column_list, csv_file_name):

    catalog_items = catalog.get_collection("gnatsgo-tables").get_item(table_name)
    
    table_data = pd.read_parquet(
        catalog_items.assets["data"].href,
        columns= column_list,
        storage_options= catalog_items.assets["data"].extra_fields["table:storage_options"],
        engine="pyarrow",
    )
    if table_key > "":
        table_data = table_data.set_index(table_key)
    
    df = pd.DataFrame(table_data)
    pd.set_option('display.max_colwidth', 500)

    df.to_csv(csv_file_name)

# Example function use for mapunit table
columns = [“mukey”, “lkey”, “musym”, “muname”, “mukind”, “mustatus”, “muacres”, “mapunitlfw_l”, “mapunitlfw_r”, “mapunitlfw_h”, “mapunitpfa_l”, “mapunitpfa_r”, “mapunitpfa_h”, “farmlndcl”, “muhelcl”, “muwathelcl”, “muwndhelcl”, “interpfocus”, “invesintens”, “iacornsr”, “nhiforsoigrp”, “nhspiagr”, “vtsepticsyscl”, “mucertstat”]

getTableDataAsCsv("mapunit", "mukey", columns, "mapunit.csv")

After using the above notebook to extract data from the tables of interest, the data was loaded to Power BI and the following relationships were created.

gNATSGO Map Unit Crop Yield Relationships

Part 1: The Gridded National Soil Survey Database - Reading meta-data to create a data dictionary

Part 3: Blackland Prairie Crop Yields (Power BI Visualizations)

Planetary Computer gNATSGO Table Collection


, , — Jan 13, 2023