Ecological Data Services, LLC
Data Engineering and Analytics Consulting

iNaturalist Datasource to MS Fabric Data Lakehouse

Lakehouse Data Storage (OneLake)

The following Fabric resources were created following the above architecture guideline.

  1. ADLS Gen2 file folder for raw JSON files (Bronze Medallion Level)
  2. Delta tables for tabular data from JSON transformations (Silver Medallion Level)
  3. SQL analytical data mart with star schema and conformed dimensions (Gold Medallion Level)

Synapse Pipeline

  1. Web Activity to get results per page and total results
    
  2. For Each Container

    • Calculate pages to iterate
    • Copy Data - API Response JSON to Data Lake folder
    • Save data to Lakehouse file storage, appending page number to the destination file name.
  3. PySpark Notebook
    
    • Read data from Lakehouse file storage
    • Transform JSON into a dataframe
    • Write dataframe to Delta table
  4. Copy Data – Delta table to SQL Landing table
    
    • Configure Endpoints
    • Map Columns
  5. Execute Stored Procedures for incremental load to SQL Dimension and Fact tables
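
The page calculation in steps 1 and 2 can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual expressions: the function names, the `observations` base name, and the zero-padded file name pattern are hypothetical, and the inputs stand for the total results and results-per-page values returned by the Web Activity.

```python
from typing import List, Tuple


def pages_to_iterate(total_results: int, per_page: int) -> int:
    """Number of pages needed to cover total_results (ceiling division)."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return (total_results + per_page - 1) // per_page


def destination_file_name(base_name: str, page: int) -> str:
    """Append the page number to the destination file name (hypothetical pattern)."""
    return f"{base_name}_page_{page:04d}.json"


def plan_pages(total_results: int, per_page: int,
               base_name: str = "observations") -> List[Tuple[int, str]]:
    """List of (page number, destination file name) pairs for the For Each loop."""
    last_page = pages_to_iterate(total_results, per_page)
    return [(page, destination_file_name(base_name, page))
            for page in range(1, last_page + 1)]


if __name__ == "__main__":
    for page, file_name in plan_pages(total_results=450, per_page=200):
        print(page, file_name)
```

Integer ceiling division avoids floating-point rounding, and a partially filled final page still gets its own file.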
    
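
The notebook transformation in step 3 might look roughly like the sketch below. It is written under assumptions rather than taken from the actual notebook: the selected fields (`id`, `observed_on`, `place_guess`, `taxon`) and the output column names are illustrative, the `results` key is assumed to hold the observations in each saved page, and `write_rows_to_delta` expects an existing Spark session such as the one a Fabric notebook provides.

```python
from typing import Any, Dict, List


def flatten_observation(obs: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one nested observation record into a single tabular row."""
    taxon = obs.get("taxon") or {}  # taxon may be missing or null
    return {
        "observation_id": obs.get("id"),
        "observed_on": obs.get("observed_on"),
        "place_guess": obs.get("place_guess"),
        "taxon_id": taxon.get("id"),
        "taxon_name": taxon.get("name"),
    }


def flatten_page(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten every observation in one saved API response page."""
    return [flatten_observation(obs) for obs in payload.get("results", [])]


def write_rows_to_delta(spark, rows: List[Dict[str, Any]], table_name: str) -> None:
    """Append rows to a Delta table; table_name is a hypothetical Silver table.

    Schema is inferred from the rows here; an explicit schema is safer
    when a column can be null in every row.
    """
    df = spark.createDataFrame(rows)
    df.write.format("delta").mode("append").saveAsTable(table_name)


if __name__ == "__main__":
    sample_page = {
        "results": [
            {"id": 1, "observed_on": "2024-11-01",
             "taxon": {"id": 5, "name": "Danaus plexippus"}},
            {"id": 2, "taxon": None},
        ]
    }
    for row in flatten_page(sample_page):
        print(row)
```

Keeping the flattening logic in plain functions lets it be checked without a Spark cluster; only the final write depends on the notebook's session.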

Here is the resulting pipeline:

Fabric Pipeline

Notes:

iNaturalist APIs
iNaturalist Observations - Power BI
Copy Data Pagination Support

Nov 14, 2024