Problem: 14 distinct schema versions over 3 years. Rigid TimescaleDB ingest pipeline silently dropped rows it could not parse. Nobody noticed because new data looked correct. 40% of historical data unqueryable. Soil moisture analysis the agronomist team needed was impossible.
Solution: Phase 1: schema archaeology — every firmware version in Git history analysed to extract payload shape; 14 versions identified. Phase 2: versioned parsers processed raw MQTT payloads from S3 backup through version-specific normalisation into a recovery table. Checksum on device ID + timestamp deduplicated against existing table. Phase 3: new ingestion pipeline validates payload against declared firmware version field. Unknown versions written to quarantine topic — no silent drops.
Technology: TimescaleDB · MQTT · Python · S3 · dbt · EMQX
Optimisation pattern: rigid-schema-ingest-to-version-aware-parser-with-quarantine-topic
Outcomes:
Historical data queryable: 60% → 98.3%. Silent data loss: eliminated. 3-year soil moisture analysis completed. Firmware schema change process formalised — version field mandatory in all future builds.