Dubai, UAE · 16 years in production

Guilherme Jaccoud

Platform Engineer / SRE

Systems engineer and platform architect. I build the reliability substrate that other engineers build on top of.

Sixteen years across fintech, hyperscale e-commerce, and crypto, specializing in platform engineering, distributed systems, and developer experience.

Deep AWS, GCP, and Cloudflare experience, paired with financial-grade reliability and incident response.

I write Rust for the parts that matter, Python for the parts that don't, and Bash for the parts I'd rather not admit.

99.88% · Sustained availability
900 nodes · Peak fleet operated
$12M/yr · Infra cost removed
30 min · Median MTTR
2021 – 2026

Kraken

Senior Site Reliability Engineer
  • Maintained 60+ Rust backend services, contributing to source code, configuration, metrics, pipelines, dashboards, alerts, and infra-as-code specifications, while serving on the Layer-1 on-call rotation.
  • Owned the API gateway end-to-end — code, configuration, release pipeline, and weekly production deployments — accountable for every release and incident on that surface.
  • Designed ephemeral, production-parity AWS environments provisioned from CI/CD pipelines — giving product teams isolated sandboxes to validate features, and powering the regulatory demonstrations that secured Kraken's 2024 qualified-custody approval.
  • Provisioned the ingress infrastructure behind Kraken's Travel Rule anti-money laundering (AML) program, supporting the exchange's compliance with FATF obligations.
  • Participated in the SRE council, building internal tooling and engineering standards adopted org-wide; wrote an Atlantis-style GitOps automation system for Nomad.
Rust Nomad Vault Consul Redis Elasticsearch Prometheus Grafana nginx HAProxy Terraform AWS
2020 – 2021

Delivery Hero

Staff Platform Engineer
  • Partnered with product teams to surface and eliminate infrastructure friction — promoting platform adoption and improving the overall developer experience.
  • Architected an active-active multi-region Kubernetes deployment with 50/50 traffic distribution via Cloudflare Load Balancing and automatic cluster failover, ensuring zero downtime through regional incidents.
  • Designed a GitOps strategy with ArgoCD as the sole deployment authority and Teleport's Just-In-Time access as a break-glass mechanism — eliminating ad-hoc access to production.
  • Implemented a daily load-testing pipeline on Terraform and K3s that spins up ephemeral infrastructure, runs developer-authored K6 scripts, and pushes results to New Relic, surfacing scalability headroom and bottlenecks.
  • Introduced chaos engineering practices, running regular Chaos Monkey exercises to validate disaster recovery scenarios.
Rust TypeScript Kubernetes ArgoCD Teleport Chaos Monkey K3s K6 New Relic Cloudflare AWS
2019 – 2020

Delivery Hero

Senior Platform Engineer
  • Led the SRE team through Talabat's post-acquisition migration, building the cloud-native platform behind its explosive regional growth and consolidation as MENA's largest food-delivery app.
  • Designed the overall platform migration to AWS: introduced immutable-infrastructure concepts, IaC best practices, and Atlantis for self-service changes, and established an AWS Direct Connect link between the datacenter and AWS for a gradual, zero-disruption migration.
  • Architected the complete Kubernetes platform (networking, storage, and ingress) and hardened it with service-to-service mTLS backed by Vault, plus OPA Gatekeeper policies enforcing security, resource, and naming guardrails.
  • Partnered with development teams on the migration from legacy .NET Framework to .NET Core on Kubernetes; introduced the Serverless Framework for event-driven (SQS/Lambda) workloads.
  • Evolved the observability stack, introducing Vector as a unified logs and metrics collection pipeline routing telemetry to New Relic.
Terraform Atlantis Kubernetes Traefik Vault cert-manager Gatekeeper Vector New Relic AWS
2016 – 2019

Symphony

DevOps Engineer
  • Led the Infrastructure-as-Code migration from CloudFormation to multi-cloud Terraform, following Google's strategic investment in Symphony; authored reusable modules covering networking, persistence, and orchestration layers, including GKE and EKS.
  • Orchestrated infrastructure provisioning with Jenkins, making every environment change auditable and reproducible and giving development teams a self-service UI to spin up, update, and tear down development and production environments without SRE intervention.
  • Operated a heterogeneous persistence tier (HBase, Hadoop, MongoDB, Solr, Elasticsearch, and Kafka) baked into immutable images with Packer and run on auto-scaled instances across tenant environments.
Python Kubernetes Terraform Packer MongoDB Elasticsearch Kafka Jenkins Groovy GCP AWS
2010 – 2016

Tropicloud

Founder / DevOps Engineer
  • Architected a managed WordPress platform on NGINX, PHP-FPM, and MariaDB — static assets via S3 / CloudFront CDN and Redis full-page cache — achieving 10x faster loading than traditional shared hosting.
  • Designed per-tenant isolation on AWS: each customer site was provisioned with its own VPC, ALB, Auto Scaling group, and RDS cluster behind Cloudflare WAF and DDoS protection, guaranteeing resource isolation and scalability under heavy traffic.
  • Rebuilt the entire platform on Kubernetes in 2014 — among the earliest production deployments in Latin America — reducing onboarding from hours to minutes and enabling zero-downtime rolling deploys.
Kubernetes Rancher WordPress nginx PHP MariaDB Redis Varnish Cloudflare AWS
Feb 2026

Orphic

  • Designed a fully managed AI agent platform for OpenClaw, Hermes, and Paperclip on Cloudflare Workers and GKE Autopilot, with automatic provisioning, managed isolated runtimes, secrets management, and persistent memory.
  • Implemented on-demand Cloudflare Tunnel access to per-tenant agent dashboards, routing wildcard session URLs through a tenant-scoped proxy that validates short-lived sessions and forwards only to private in-namespace agent services.
  • Implemented management access via a Tailscale subnet router with Just-In-Time access, group-based approvals, audit logging, and Kubernetes Network Policies, securing access to private GKE services under a deny-by-default posture.
Rust TypeScript OpenClaw Hermes Honcho Vault Terraform Tailscale Cloudflare GCP
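As an illustration of the kind of check a tenant-scoped tunnel proxy performs, here is a minimal Python sketch. It assumes a token made of base64url-encoded JSON claims plus an HMAC-SHA256 signature, a wildcard host whose first label names the tenant, and a private service called `agent-dashboard` inside each tenant's namespace. The secret, TTL, token layout, and service name are all hypothetical, not the production design.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical shared secret; a real deployment would load it from a secrets manager.
SECRET = b"example-secret"
SESSION_TTL_SECONDS = 300  # assumption: five-minute sessions


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    # Restore the padding stripped by _b64 before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_session(tenant: str, now: Optional[float] = None) -> str:
    """Mint a short-lived token whose signed claims bind it to one tenant."""
    now = time.time() if now is None else now
    claims = {"tenant": tenant, "exp": int(now) + SESSION_TTL_SECONDS}
    payload = json.dumps(claims).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(sig)}"


def upstream_for(host: str, token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the in-namespace upstream URL, or None if the request must be rejected."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload, sig = _unb64(payload_b64), _unb64(sig_b64)
    except ValueError:  # wrong number of parts or undecodable base64
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # forged or tampered token
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None  # session expired
    # Wildcard host such as "acme.agents.example.com": first label is the tenant.
    tenant = host.split(".", 1)[0]
    if claims["tenant"] != tenant:
        return None  # token was issued for a different tenant
    # Forward only to a fixed private service in that tenant's namespace (hypothetical name).
    return f"http://agent-dashboard.{tenant}.svc.cluster.local"
```

Binding the tenant inside the signed claims, instead of trusting the hostname alone, is what stops a valid session for one tenant from being replayed against another tenant's subdomain.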
May 2026
  • Architected a serverless stem splitter on Cloudflare Workers that offloads GPU source separation to Modal L4 instances via HMAC-signed dispatch, serving the SPA and /api from a single origin with no servers to maintain.
  • Trained PyTorch audio-separation models for instrument extraction using the Music Source Separation Training (MSST) framework on RunPod GPUs.
  • Instrumented real-time inference progress without a queue or WebSocket by class-patching tqdm inside the separation loop, throttling callbacks to 1 Hz, and writing them as signed updates to Cloudflare Workers KV.
TypeScript Python PyTorch MSST Hono Vite React Modal RunPod Cloudflare Terraform
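The throttle-and-sign step behind the stem splitter's progress reporting can be sketched with the Python standard library alone. This is a hypothetical illustration, not the shipped code: the signing key, record layout, and `sink` callback (standing in for a Workers KV write) are assumptions, and the tqdm class-patching itself is left out.

```python
import hashlib
import hmac
import json
import time
from typing import Callable

# Hypothetical signing key shared with the Worker that reads progress records.
SECRET = b"example-secret"


class ThrottledProgress:
    """Forward progress at most once per `interval` seconds, signed with HMAC-SHA256."""

    def __init__(self, job_id: str, sink: Callable[[str, bytes], None],
                 interval: float = 1.0, clock: Callable[[], float] = time.monotonic):
        self.job_id = job_id
        self.sink = sink          # hypothetical writer, e.g. a PUT to a KV store
        self.interval = interval  # 1.0 s gives a 1 Hz ceiling
        self.clock = clock        # injectable for testing
        self._last = float("-inf")

    def update(self, done: int, total: int, force: bool = False) -> bool:
        """Return True if this tick was written, False if the throttle dropped it."""
        now = self.clock()
        if not force and now - self._last < self.interval:
            return False  # too soon since the last write: drop this tick
        self._last = now
        body = json.dumps({"job": self.job_id, "done": done, "total": total}, sort_keys=True)
        sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
        record = json.dumps({"body": body, "sig": sig}).encode()
        self.sink(f"progress:{self.job_id}", record)
        return True
```

To wire it into tqdm, a subclass could override `update(self, n=1)`, call the parent, then report `self.n` and `self.total`; passing `force=True` on the final tick keeps the completed state from being dropped by the throttle.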

Tools are commodities. The discipline of how you operate them is the real artifact.

Orchestration & Platform

Infrastructure & Cloud

Observability & Incident

Languages & Data

01

Distributed systems reliability

Failure modelling, consensus, partition behaviour, dependency contracts.

02

Kubernetes platform engineering

Multi-tenant clusters, operator design, golden paths.

03

Multi-cloud infrastructure

AWS, GCP, Azure, Cloudflare, Railway, Modal, RunPod.

04

High-availability architecture

Cell-based designs, active-active topologies, traffic shifting.

05

Observability engineering

SLI/SLO design, OpenTelemetry pipelines, sampling economics.

06

Resilience & disaster recovery

RPO/RTO architecture, chaos engineering programs, regional failover rehearsals.

07

Infrastructure automation

IaC at scale, drift detection, policy-as-code.

08

Platform scalability

Capacity modelling, performance work in Go and Rust, autoscaling control loops.

06 · Get in touch

For work, write to hello@guigo2k.com.