ArgoCD + Argo Rollouts: 200-service GitOps canary delivery

Problem: Manual kubectl apply meant the applied manifest might differ from what was in Git. No record of who deployed what or when. Rollbacks required finding the previous image tag in Slack. Change failure rate 18% (industry benchmark ~15% for this scale).

Solution: Service-by-service ArgoCD migration over 10 weeks. Direct kubectl access to production revoked — GitOps sync became the only deployment path. Caught 14 config drift incidents in the first 2 weeks. Argo Rollouts: 5% canary for 10 minutes, Prometheus analysis on error rate and P99 latency, automatic abort on threshold breach. Slack message with analysis results win or lose.

Technology: ArgoCD · Argo Rollouts · Kubernetes · Prometheus · Helm · GitHub

Optimisation pattern: manual-kubectl-to-argocd-gitops-with-prometheus-gated-canary-rollout

Outcomes:

Change failure rate: 18% → 3%. Config drift incidents: eliminated — ArgoCD self-heals within 3 minutes. 100% of deployments are PRs with reviewer and timestamp. MTTR on bad deploy: 22 min → 4 min.

Your privacy, your call

GitOps for 200 microservices: change failure rate 18% → 3%, manual kubectl retired