iNaturalist Datasource to MS Fabric Data Lakehouse
Initially, I used Power BI to analyze iNaturalist data by connecting directly to the API data source and performing JSON transformations of the API response in Power Query (see Related Content). This was a good way to prototype analysis and visualization of iNaturalist data. Now I want to incorporate this data into a Fabric / Synapse data lakehouse: saving the data as raw JSON files, transforming it into Delta tables, and then loading it into a SQL star schema data mart. This follows the OneLake Medallion Lakehouse Architecture.
Lakehouse Data Storage (OneLake)
The following Fabric resources were created following the above architecture guideline.
- ADLS Gen 2 file folder for raw JSON files (Bronze Medallion Level)
- Delta tables for tabular data from JSON transformations (Silver Medallion Level)
- SQL analytical data mart with star schema and conformed dimensions (Gold Medallion Level)
Synapse Pipeline
Web Activity to get results per page and total results
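The Web Activity issues one request and reads the paging fields from the response body. As a minimal Python sketch of the equivalent probe, assuming the public v1 observations endpoint and an illustrative per_page value:

```python
import requests

# One probe call to read paging metadata; the pipeline's Web Activity
# performs the equivalent request. per_page value is illustrative.
BASE_URL = "https://api.inaturalist.org/v1/observations"

resp = requests.get(BASE_URL, params={"per_page": 200}, timeout=30)
resp.raise_for_status()
meta = resp.json()

print(meta["total_results"], meta["per_page"])  # these drive the page loop
```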
ForEach Container
- Calculate pages to iterate (see the sketch after this list)
- Copy Data - API Response JSON to Data Lake folder
- Save data to Lakehouse file storage, appending page number to the destination file name.
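Expressed as plain Python rather than pipeline expressions, the loop body amounts to a ceiling division over the totals followed by one request per page; the endpoint, per_page value, and output path below are assumptions for illustration:

```python
import json
import math
import requests

BASE_URL = "https://api.inaturalist.org/v1/observations"
PER_PAGE = 200  # illustrative; the v1 API caps per_page at 200
OUT_DIR = "/lakehouse/default/Files/inaturalist/raw"  # hypothetical Files path

# Totals come from the Web Activity in the real pipeline.
meta = requests.get(BASE_URL, params={"per_page": PER_PAGE}, timeout=30).json()
pages = math.ceil(meta["total_results"] / PER_PAGE)

for page in range(1, pages + 1):
    resp = requests.get(
        BASE_URL, params={"per_page": PER_PAGE, "page": page}, timeout=30
    )
    resp.raise_for_status()
    # Append the page number to the destination file name, as the
    # Copy Data activity does on each ForEach iteration.
    with open(f"{OUT_DIR}/observations_page_{page}.json", "w") as f:
        json.dump(resp.json(), f)
```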
PySpark Notebook
- Read the raw JSON from Lakehouse file storage
- Transform the JSON in a dataframe
- Write the dataframe to a Delta table (sketched below)
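A minimal PySpark sketch of the notebook, assuming the raw file path and a handful of illustrative observation fields (the `spark` session is provided by the Fabric notebook runtime):

```python
from pyspark.sql import functions as F

# Read every raw page file the pipeline saved; path is illustrative.
raw = spark.read.option("multiline", "true").json(
    "Files/inaturalist/raw/observations_page_*.json"
)

# Each API response wraps the observations in a `results` array;
# explode it so each observation becomes one row.
observations = raw.select(F.explode("results").alias("obs")).select(
    F.col("obs.id").alias("observation_id"),
    F.col("obs.observed_on").alias("observed_on"),
    F.col("obs.taxon.name").alias("taxon_name"),  # nested fields flatten via dot paths
    F.col("obs.place_guess").alias("place_guess"),
)

# Overwrite the Silver-level Delta table on each run; append or merge
# are alternatives for incremental runs.
observations.write.format("delta").mode("overwrite").saveAsTable(
    "inaturalist_observations"
)
```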
Copy Data – Delta table to SQL Landing table
- Configure Endpoints
- Map Columns
Execute Stored Procedures for incremental load to SQL Dimension and Fact tables
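Each stored procedure upserts from the landing table into its target table. Here is a minimal T-SQL sketch for one dimension, with hypothetical table and column names: an update of changed attributes followed by an insert of new keys.

```sql
CREATE OR ALTER PROCEDURE dbo.LoadDimTaxon
AS
BEGIN
    -- Refresh attributes for taxa already in the dimension.
    UPDATE d
    SET    d.taxon_name = l.taxon_name
    FROM   dbo.DimTaxon AS d
    INNER JOIN dbo.LandingObservations AS l
        ON l.taxon_id = d.taxon_id
    WHERE  d.taxon_name <> l.taxon_name;

    -- Add taxa not yet present in the dimension.
    INSERT INTO dbo.DimTaxon (taxon_id, taxon_name)
    SELECT DISTINCT l.taxon_id, l.taxon_name
    FROM   dbo.LandingObservations AS l
    WHERE  NOT EXISTS (
        SELECT 1 FROM dbo.DimTaxon AS d WHERE d.taxon_id = l.taxon_id
    );
END;
```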
Here is the resulting pipeline:

Notes:
- While some API paging features are built into the Copy Data activity, the iNaturalist API's paging implementation required dynamically setting the URL in a ForEach loop. This requires some string concatenation to build the URL with the correct query parameter string for each API request (see the expression sketch below).
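For example, the relative URL for each iteration can be assembled with a dynamic content expression along these lines, where the endpoint and query parameters are assumptions about the query setup:

```
@concat('https://api.inaturalist.org/v1/observations?per_page=200&page=', string(item()))
```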
Related Content
iNaturalist APIs
iNaturalist Observations - Power BI
Copy Data Pagination Support
Datalake, Fabric, Lakehouse Architecture, iNaturalist — Nov 14, 2024