projects

Tools, frameworks, and experiments built while solving real data engineering problems in production.

delta-merge-toolkit (active)

Reusable PySpark library for incremental MERGE operations with UPSERT and SCD Type 2 support. Battle-tested on Delta tables with 50M+ rows in Databricks.
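As a rough illustration of the UPSERT case, the sketch below builds a Delta Lake `MERGE INTO` statement from a list of key columns. The function name `build_upsert_sql` and the table names are hypothetical and are not taken from the toolkit; SCD Type 2 handling is left out.

```python
def build_upsert_sql(target: str, source: str, keys: list[str]) -> str:
    """Build a Delta Lake MERGE statement that upserts `source` into `target`.

    Hypothetical helper for illustration; not the toolkit's actual API.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    # The join condition matches rows on every key column.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


# Table and view names below are made up for the example.
sql = build_upsert_sql("silver.customers", "staged_updates", ["customer_id"])
print(sql)
```

In a Databricks notebook the resulting string could be passed to `spark.sql(sql)`, assuming `staged_updates` is registered as a temporary view.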

PySpark · Delta Lake · Databricks
adf-cdc-templates (active)

Parameterized Azure Data Factory pipeline templates for watermark-based CDC ingestion. Handles soft deletes, schema drift, and retry logic out of the box.
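The watermark idea can be sketched in plain Python: pick up only rows modified since the last run, then advance the watermark. The `modified_at` and `is_deleted` column names are illustrative assumptions, and this is not the templates' actual pipeline definition.

```python
from datetime import datetime


def incremental_batch(rows, last_watermark):
    """Return (changed_rows, new_watermark) for rows modified after last_watermark.

    Assumes each row is a dict with a `modified_at` datetime; the column name
    is an illustrative assumption.
    """
    changed = [r for r in rows if r["modified_at"] > last_watermark]
    # Advance the watermark only if something changed, so an empty run is a no-op.
    new_watermark = max((r["modified_at"] for r in changed), default=last_watermark)
    return changed, new_watermark


rows = [
    {"id": 1, "modified_at": datetime(2024, 1, 1), "is_deleted": False},
    {"id": 2, "modified_at": datetime(2024, 1, 3), "is_deleted": True},
    {"id": 3, "modified_at": datetime(2024, 1, 5), "is_deleted": False},
]
changed, watermark = incremental_batch(rows, datetime(2024, 1, 2))
```

Because a soft delete updates the row's timestamp along with its flag, the deleted row (id 2) arrives in the batch like any other change and can be handled downstream.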

ADF · Azure SQL · ADLS Gen2
unity-catalog-bootstrap (in progress)

Terraform + Python scripts to provision a Unity Catalog metastore, catalogs, schemas, and permissions from a declarative YAML config. Reduces governance setup from days to hours.
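To give a feel for the declarative approach, here is a minimal sketch that turns a config into Unity Catalog SQL statements. It emits SQL rather than Terraform resources, which is a simplification for the example, and the config layout (`catalogs`, `schemas`, `grants`) is hypothetical. The config is written as a Python dict, standing in for what a YAML loader would return.

```python
def render_statements(config: dict) -> list[str]:
    """Turn a declarative config into Unity Catalog SQL statements.

    The config layout is a hypothetical example, not the project's schema.
    """
    statements = []
    for catalog in config["catalogs"]:
        name = catalog["name"]
        statements.append(f"CREATE CATALOG IF NOT EXISTS {name}")
        for schema in catalog.get("schemas", []):
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {name}.{schema}")
        for grant in catalog.get("grants", []):
            # Principals are backtick-quoted so names with hyphens stay valid.
            statements.append(
                f"GRANT {grant['privilege']} ON CATALOG {name} TO `{grant['principal']}`"
            )
    return statements


# Stand-in for a small parsed YAML file; all names are made up.
config = {
    "catalogs": [
        {
            "name": "analytics",
            "schemas": ["bronze", "silver"],
            "grants": [{"privilege": "USE CATALOG", "principal": "data-engineers"}],
        }
    ]
}
statements = render_statements(config)
```

Keeping the rendering step a pure function makes it easy to review the generated statements before anything is applied.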

Unity Catalog · Terraform · Python
data-contract-validator (in progress)

CLI tool that validates incoming datasets against a YAML-defined data contract before Bronze ingestion. Catches schema drift and null violations at the source.
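The kind of check involved can be sketched without any dependencies. The version below does not use Great Expectations, and the contract layout (column name mapped to `type` and `nullable`) is an illustrative assumption rather than the tool's actual contract format.

```python
def validate(rows, contract):
    """Return a list of readable violations; an empty list means the data passes.

    `contract` maps column name -> {"type": <python type>, "nullable": <bool>}.
    This layout is an illustrative assumption.
    """
    violations = []
    expected = set(contract)
    for i, row in enumerate(rows):
        actual = set(row)
        # Schema drift: columns missing from, or unexpected in, the incoming row.
        for col in sorted(expected - actual):
            violations.append(f"row {i}: missing column '{col}'")
        for col in sorted(actual - expected):
            violations.append(f"row {i}: unexpected column '{col}'")
        for col in sorted(expected & actual):
            value, rule = row[col], contract[col]
            if value is None:
                if not rule["nullable"]:
                    violations.append(f"row {i}: null in non-nullable column '{col}'")
            elif not isinstance(value, rule["type"]):
                violations.append(f"row {i}: column '{col}' is not {rule['type'].__name__}")
    return violations


contract = {
    "order_id": {"type": int, "nullable": False},
    "email": {"type": str, "nullable": True},
}
rows = [
    {"order_id": 1, "email": None},
    {"order_id": None, "email": "a@example.com", "coupon": "X1"},
]
violations = validate(rows, contract)
```

Here the first row passes, while the second is flagged twice: once for the unexpected `coupon` column and once for the null `order_id`.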

Python · Great Expectations · YAML
lakehouse-monitoring-dashboard (archived)

Streamlit dashboard connected to Databricks system tables and ADF run history. Shows pipeline SLA, row volume trends, and data freshness per table.
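A per-table freshness check of the sort such a dashboard might display could look like the sketch below. The function name, table names, and the 24-hour SLA are assumptions for illustration; the timestamps are hard-coded here rather than read from Databricks system tables.

```python
from datetime import datetime, timedelta


def freshness_report(last_updated, sla_hours, now):
    """Map each table to its lag in hours and whether it breaches the SLA."""
    report = {}
    for table, ts in last_updated.items():
        # Dividing two timedeltas yields a float number of hours.
        lag_hours = (now - ts) / timedelta(hours=1)
        report[table] = {"lag_hours": lag_hours, "stale": lag_hours > sla_hours}
    return report


# Hypothetical tables and timestamps.
now = datetime(2024, 1, 2, 12, 0)
last_updated = {
    "silver.orders": datetime(2024, 1, 2, 9, 0),
    "silver.customers": datetime(2024, 1, 1, 6, 0),
}
report = freshness_report(last_updated, sla_hours=24, now=now)
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the calculation deterministic and easy to test.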

Streamlit · Python · Databricks SQL