Started June 1, 2025

Completed

Data Quality Auditing Framework

Implemented automated data validation using Great Expectations and Apache Airflow.

Tags: Data Validation, Great Expectations, Airflow, SQL Server, Data Quality

Overview

Built a data auditing pipeline that detects missing, duplicated, or schema-breaking data in critical financial datasets. Used Great Expectations for test definitions and Apache Airflow for orchestrated runs across historical and incremental loads.

The project demonstrates:

  • Data validation on financial systems
  • Airflow DAG design
  • Automated QA workflows
  • Column-level profiling and anomaly detection
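
The three failure classes named above (missing, duplicated, and schema-breaking data) can be illustrated with a small plain-Python sketch. This is not the project's Great Expectations suite; the table layout, the column names in EXPECTED_COLUMNS, and the audit_rows helper are all hypothetical and exist only to show what each check looks for.

```python
from collections import Counter

# Hypothetical expected schema for an illustrative "transactions" table.
EXPECTED_COLUMNS = {"txn_id", "account_id", "amount", "posted_at"}


def audit_rows(rows, key="txn_id", required=("account_id", "amount")):
    """Return findings for schema drift, missing values, and duplicate keys."""
    findings = {"schema": [], "missing": [], "duplicates": []}
    for i, row in enumerate(rows):
        cols = set(row)
        # Schema-breaking: the row's columns differ from the expected set.
        if cols != EXPECTED_COLUMNS:
            findings["schema"].append({
                "row": i,
                "missing_columns": sorted(EXPECTED_COLUMNS - cols),
                "unexpected_columns": sorted(cols - EXPECTED_COLUMNS),
            })
        # Missing: a required column is absent or NULL.
        for col in required:
            if row.get(col) is None:
                findings["missing"].append({"row": i, "column": col})
    # Duplicated: the same key value appears on more than one row.
    counts = Counter(row.get(key) for row in rows)
    findings["duplicates"] = sorted(
        k for k, n in counts.items() if k is not None and n > 1
    )
    return findings
```

In the real pipeline each of these checks would correspond to one or more expectations in a suite rather than hand-written loops.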

Architecture

  • Airflow schedules and monitors validation DAGs
  • Great Expectations performs column-level assertions
  • SQL Server acts as the primary data source
  • Slack API sends alerts on validation failures
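
The alerting leg of this architecture could look roughly like the sketch below. It assumes a Slack incoming webhook, which accepts a JSON body with a "text" field; the project may use a different Slack API method. The build_slack_request helper, the suite name, and the webhook URL are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url, suite_name, failed_checks):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    lines = [f"Validation failed: {suite_name} ({len(failed_checks)} check(s))"]
    lines += [f"- {check}" for check in failed_checks]
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is one extra call, for example from a task failure callback:
# urllib.request.urlopen(build_slack_request(url, suite, failures), timeout=10)
```

Keeping request construction separate from sending makes the message format easy to test without touching the network.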

Status

Audits are now integrated into every ETL run. Alerting reduced QA turnaround time by 80%.

Challenges

  • False positives on NULL fields due to evolving schema
  • Synchronizing audit time windows with ETL cutoffs
  • Managing GE expectations across staging vs prod
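
One way to approach the time-window challenge is to derive the audit window from the ETL cutoff itself, so the audit never inspects rows the load has not finished writing. The sketch below is an assumption about how that could be done, not the project's implementation; audit_window, in_window, and the lookback and grace parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def audit_window(etl_cutoff, lookback=timedelta(days=1), grace=timedelta(0)):
    """Return the half-open [start, end) window an audit should cover.

    The end is pinned to the ETL cutoff, minus an optional grace period
    for late-arriving rows.
    """
    if etl_cutoff.tzinfo is None:
        raise ValueError("etl_cutoff must be timezone-aware")
    end = etl_cutoff - grace
    return end - lookback, end


def in_window(ts, window):
    """True if ts falls inside the half-open window."""
    start, end = window
    return start <= ts < end
```

A half-open window means consecutive incremental runs neither overlap nor leave gaps, and requiring timezone-aware timestamps avoids silent offsets between the scheduler and the database.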

Lessons Learned

  • Keep expectations versioned and environment-scoped
  • Null tolerance must match data freshness expectations
  • Use Data Docs for stakeholder visibility
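
The first two lessons can be combined in a small sketch: null thresholds stored per environment and applied by a single check. It is similar in spirit to the mostly argument that Great Expectations expectations accept, but it is plain Python, not the GE API. The NULL_TOLERANCE values, the column names, and the check_nulls helper are hypothetical.

```python
# Hypothetical thresholds: minimum fraction of non-null values per column.
# In practice this mapping would live in version control next to the suites.
NULL_TOLERANCE = {
    "staging": {"settled_at": 0.80, "amount": 1.00},
    "prod": {"settled_at": 0.95, "amount": 1.00},
}


def non_null_fraction(values):
    """Fraction of values that are not None; an empty column counts as 1.0."""
    values = list(values)
    if not values:
        return 1.0
    return sum(v is not None for v in values) / len(values)


def check_nulls(env, column, values):
    """Compare a column's non-null fraction with its environment threshold."""
    threshold = NULL_TOLERANCE[env][column]
    fraction = non_null_fraction(values)
    return {
        "env": env,
        "column": column,
        "fraction": fraction,
        "threshold": threshold,
        "success": fraction >= threshold,
    }
```

A column that fills in late, such as a settlement timestamp, can then carry a looser threshold in staging than in prod without forking the check logic.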

Related Services

  • Data quality monitoring
  • ETL validation systems
  • Great Expectations onboarding
  • Airflow integration for audits

Recent Activity

Wired Slack alerts to Great Expectations failures in Airflow; automated QA coverage now live.

Airflow, Great Expectations

Launched column-level data validation suite using Great Expectations against core financial tables.

SQL, Great Expectations