Problem: A job-board affiliate aggregating XML feeds from Adzuna, Talroo, Monster, and 12 other providers needed to parse, deduplicate, and delta-sync listings to Elasticsearch inside a 30-minute feed window. The existing Java StAX/SAX pipeline consumed the entire window on parsing alone. Switching to MySQL LOAD XML changed the numbers.
Solution: Moved XML parsing into the MySQL engine itself using LOAD XML LOCAL INFILE, with a separate staging table schema for each provider. After each load, a stored procedure ran a three-way merge (insert/update/soft-delete) against the live table. Only delta rows, flagged by a sync_status column, were forwarded to Elasticsearch via the bulk API. Feed window utilisation dropped from >95% to ~22%.
Technology: MySQL · Elasticsearch · Java · Go
Optimisation pattern: java-stax-to-mysql-loadxml
Outcomes:
Feed ingest: 28–34 min → under 5 min per country. Delta sync to Elasticsearch: under 90 seconds. Feed window headroom now supports 3× more providers.
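
Illustrative sketch: the delta-forwarding step in the Solution could look roughly like the Go below, which turns rows flagged by sync_status into an Elasticsearch bulk request body (one action line per row, plus a document line for index actions). This is not the production code. The DeltaRow fields, the status values "new", "updated" and "deleted", the index name, and the choice to map soft-deleted rows to bulk delete actions are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// DeltaRow is a hypothetical shape for one row flagged for sync.
// The real table columns are not given in the case study.
type DeltaRow struct {
	ID         string
	SyncStatus string // assumed values: "new", "updated", "deleted"
	Title      string
	Company    string
}

// jobDoc holds the fields sent to Elasticsearch (illustrative only).
type jobDoc struct {
	Title   string `json:"title"`
	Company string `json:"company"`
}

// BuildBulkBody renders delta rows as newline-delimited JSON in the
// format the Elasticsearch bulk API expects: an action line for every
// row, followed by a source line for index actions only.
func BuildBulkBody(index string, rows []DeltaRow) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes a trailing "\n" after each value

	for _, r := range rows {
		meta := map[string]string{"_index": index, "_id": r.ID}

		switch r.SyncStatus {
		case "new", "updated":
			// Index action: creates the document or replaces it by _id.
			if err := enc.Encode(map[string]map[string]string{"index": meta}); err != nil {
				return "", err
			}
			if err := enc.Encode(jobDoc{Title: r.Title, Company: r.Company}); err != nil {
				return "", err
			}
		case "deleted":
			// Assumption: a soft-deleted row is removed from the search index.
			if err := enc.Encode(map[string]map[string]string{"delete": meta}); err != nil {
				return "", err
			}
		default:
			return "", fmt.Errorf("row %s: unknown sync_status %q", r.ID, r.SyncStatus)
		}
	}
	return buf.String(), nil
}
```

The returned string would be sent as the body of a POST to the _bulk endpoint with Content-Type application/x-ndjson. Because unchanged rows never reach this step, the request size tracks the number of changes in a feed rather than the size of the feed.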