Problem: POS keyed customers by email, loyalty by card number, and e-commerce by UUID. Joining the three was a manual weekly task, and analysts spent 60% of their time fixing data rather than analysing it.
Solution: Airbyte extracted from all 6 sources into BigQuery raw tables every 45 minutes. A deterministic identity resolution model normalised email, phone, and loyalty card per source and assigned a canonical customer ID via priority-ordered merge. dbt built staging → intermediate → mart layers. The customer 360 mart refreshed every 47 minutes. Elementary sent Slack alerts on freshness or row-count test failures.
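The identity resolution step can be sketched roughly as follows. This is a minimal Python illustration, not the production model, which per the pattern name lives in dbt. The field names, the normalisation rules, the priority order (loyalty card, then email, then phone) and the hashed ID format are all assumptions made for the example.

```python
import hashlib
import re
from typing import Optional

# Assumed priority: the first identifier type that hits a known customer wins.
KEY_PRIORITY = ("loyalty_card", "email", "phone")


def normalise_email(value: Optional[str]) -> Optional[str]:
    """Trim and lowercase; blanks become None."""
    cleaned = (value or "").strip().lower()
    return cleaned or None


def normalise_phone(value: Optional[str]) -> Optional[str]:
    """Keep digits only (country-code handling is out of scope here)."""
    digits = re.sub(r"\D", "", value or "")
    return digits or None


def normalise_card(value: Optional[str]) -> Optional[str]:
    """Drop spaces and hyphens, then uppercase."""
    cleaned = re.sub(r"[\s-]", "", value or "").upper()
    return cleaned or None


NORMALISERS = {
    "loyalty_card": normalise_card,
    "email": normalise_email,
    "phone": normalise_phone,
}


def resolve(records: list) -> list:
    """Attach a canonical_customer_id to each record in a single pass.

    Records are processed in input order, so the same input always yields
    the same IDs. This sketch does not retroactively merge two existing
    customers if a later record links them.
    """
    key_to_id = {}
    resolved = []
    for record in records:
        # Normalised (type, value) pairs, already in priority order.
        keys = []
        for name in KEY_PRIORITY:
            value = NORMALISERS[name](record.get(name))
            if value is not None:
                keys.append((name, value))

        # Reuse the ID of the highest-priority identifier seen before.
        canonical = next((key_to_id[k] for k in keys if k in key_to_id), None)
        if canonical is None:
            # New customer: derive a stable ID from the best available key,
            # falling back to the source's own record ID.
            if keys:
                seed = f"{keys[0][0]}:{keys[0][1]}"
            else:
                seed = f"{record['source']}:{record['source_id']}"
            canonical = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:16]

        # Register any identifiers not yet claimed by another customer.
        for key in keys:
            key_to_id.setdefault(key, canonical)
        resolved.append({**record, "canonical_customer_id": canonical})
    return resolved
```

In this sketch a POS row with email " Ana@Example.com " and a loyalty row with email "ana@example.com" resolve to the same canonical ID, because both normalise to the same key.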
Technology: BigQuery · dbt · Airbyte · Airflow · Elementary · Python
Optimisation pattern: per-source-manual-join-to-deterministic-identity-resolution-in-dbt
Outcomes:
Monday analyst task eliminated, recovering 4 hours/week. CRM team adopted the mart within 6 weeks. ML churn model precision improved 23% when trained on the unified 18-month history. 94.7% of customers matched across 2+ sources on first run.
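A cross-source match rate like the one above could be computed along these lines. The source does not say how the figure was derived, so this sketch assumes the denominator is all canonical customers and the numerator is those linked to at least two distinct sources. The function name and input shape are hypothetical.

```python
from collections import defaultdict


def multi_source_match_rate(rows) -> float:
    """Share of canonical customers that appear in 2+ distinct sources.

    rows: iterable of (canonical_customer_id, source) pairs.
    """
    sources_by_customer = defaultdict(set)
    for customer_id, source in rows:
        sources_by_customer[customer_id].add(source)
    if not sources_by_customer:
        return 0.0
    matched = sum(1 for s in sources_by_customer.values() if len(s) >= 2)
    return matched / len(sources_by_customer)
```

Repeated rows from the same source count once per customer, because sources are collected into a set.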