The Gridded National Soil Survey Geographic Database & Microsoft's Planetary Computer (part 2)

This is a continuation of my previous post The Gridded National Soil Survey Geographic Database & Microsoft’s Planetary Computer (part 1) Part one shows how to access meta data to create a data dictionary for gNATSGO tabular datasets.

Using the data dictionary to identify tables of interest I created the following function and python notebook code to extract tabular data from the following tables to analyze crop yield in the Blackland Prairie region of texas.

sacatalog
legend
mapunit
mucropyield

Reading and exporting data from gNATSGO tabular data sets (Python Notebook Code)

View and download code from my Github Repository


  # Import required libraries
      import planetary_computer  

      import numpy as np  

      import pandas as pd    

      import rioxarray  

      import xarray  

      import pystac_client  



   # Create planetary computer client
      catalog = pystac_client.Client.open(
          “https://planetarycomputer.microsoft.com/api/stac/v1",
          modifier=planetary_computer.sign_inplace,
      )
   # Empty Dataframe
      df_dictionary = pd.DataFrame(columns = [‘name’,’type’,’description’,’table_name’])
  # Search catalog for gnatsgo table collection
      search = catalog.search(
          collections=[“gnatsgo-tables”]
      )
  # Function to read and export specified table and column list from gNATSGO data sets
    def getTableDataAsCsv(table_name, table_key, column_list, csv_file_name):
    catalog_items = catalog.get_collection("gnatsgo-tables").get_item(table_name)
    
    table_data = pd.read_parquet(
        catalog_items.assets["data"].href,
        columns= column_list,
        storage_options= catalog_items.assets["data"].extra_fields["table:storage_options"],
        engine="pyarrow",
    )
    if table_key > "":
        table_data = table_data.set_index(table_key)
    
    df = pd.DataFrame(table_data)
    pd.set_option('display.max_colwidth', 500)

    df.to_csv(csv_file_name)

  # Example function use for mapunit table
    columns = [“mukey”, “lkey”, “musym”,  “muname”,  “mukind”,  “mustatus”,  “muacres”,  “mapunitlfw_l”,  “mapunitlfw_r”,  “mapunitlfw_h”,  “mapunitpfa_l”,  “mapunitpfa_r”,  “mapunitpfa_h”,  “farmlndcl”,  “muhelcl”,  “muwathelcl”,  “muwndhelcl”,  “interpfocus”,  “invesintens”,  “iacornsr”,  “nhiforsoigrp”,  “nhspiagr”,  “vtsepticsyscl”,  “mucertstat”]
getTableDataAsCsv("mapunit", "mukey", columns, "mapunit.csv")

After using the above notebook to extract data from the tables of interest, the data was loaded to Power BI and the following relationships were created.

gNATSGO Map Unit Crop Yield Relationships