Case Study · 2024

Weekend Migration Under Pressure: Scaling a Live Commerce Platform

When thousands of concurrent viewers threatened to crash the platform, I executed a rapid weekend database migration and improved database index efficiency by roughly 350%.

Client

Tilt

Date

2024-05

Service

Database Optimization, AWS Infrastructure, Performance Engineering

Key Results

  • Executed a rapid weekend migration from Postgres to AWS RDS Aurora to handle thousands of concurrent stream viewers.
  • Improved database index efficiency by ~350% through query optimization and index restructuring.
  • Cut Docker build times by 55% through multi-stage build optimization and layer caching.
  • Built load-testing infrastructure that identified critical bottlenecks before they hit production.

The Challenge

Tilt, a live commerce platform, was preparing for a high-stakes seller live stream. The problem: their database couldn’t handle the expected traffic spike from thousands of concurrent viewers all hitting the platform simultaneously.

Load testing revealed critical bottlenecks. The existing Postgres setup was buckling under simulated load, and with the live stream date approaching fast, they needed someone who could diagnose the issues and ship a fix quickly.

The clock was ticking. A failed live stream meant lost revenue, damaged seller relationships, and a reputation hit they couldn’t afford.

What I Did

Load Testing & Diagnosis

  • Built a comprehensive load-testing setup to simulate the expected viewer spike (a minimal sketch follows this list)
  • Identified critical database bottlenecks that would have caused cascading failures
  • Mapped out the exact query patterns causing the most strain
  • Prioritized fixes by impact and urgency
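
As a rough illustration of that setup (the tool, endpoint, stage sizes, and thresholds here are placeholders rather than the actual harness), a k6-style script that ramps virtual users toward a viewer-spike level might look like this:

```typescript
// Illustrative load-test sketch only: the tool (k6), endpoint, stage sizes,
// and thresholds are assumptions, not the production harness.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 500 },   // warm up
    { duration: '5m', target: 5000 },  // approximate the concurrent-viewer spike
    { duration: '2m', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95th percentile under 500 ms
    http_req_failed: ['rate<0.01'],    // keep error rate under 1%
  },
};

export default function () {
  // Hypothetical endpoint a viewer would hit while watching a live stream.
  const res = http.get('https://staging.example.com/api/streams/live');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```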

The Weekend Migration

When it became clear the existing setup couldn’t scale, I executed a rapid weekend migration:

  • Migrated from Postgres to AWS RDS Aurora for better read scaling and failover
  • Managed the entire infrastructure change with Terraform for reproducibility
  • Coordinated the switchover during off-peak hours to minimize downtime
  • Validated the migration with load tests before going live (a cutover sanity-check sketch follows this list)
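
As one example of the kind of pre-cutover check involved (the tables, connection strings, and tooling below are assumptions for illustration, not the actual runbook), a small node-postgres script can compare row counts between the source Postgres database and the Aurora target before traffic is switched:

```typescript
// Hypothetical pre-cutover check: compare row counts for critical tables
// between the source Postgres database and the Aurora target.
// Connection strings and table names are placeholders.
import { Client } from 'pg';

const TABLES = ['users', 'streams', 'orders']; // assumed table names

async function rowCounts(connectionString: string): Promise<Record<string, number>> {
  const client = new Client({ connectionString });
  await client.connect();
  const counts: Record<string, number> = {};
  try {
    // Table names are hard-coded constants above, so interpolation is safe here.
    for (const table of TABLES) {
      const res = await client.query(`SELECT count(*)::bigint AS n FROM ${table}`);
      counts[table] = Number(res.rows[0].n);
    }
  } finally {
    await client.end();
  }
  return counts;
}

async function main() {
  const [source, target] = await Promise.all([
    rowCounts(process.env.SOURCE_DATABASE_URL!),
    rowCounts(process.env.AURORA_DATABASE_URL!),
  ]);
  for (const table of TABLES) {
    const drift = Math.abs(source[table] - target[table]);
    console.log(`${table}: source=${source[table]} target=${target[table]} drift=${drift}`);
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```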

Performance Optimization

Beyond the migration, I tackled the underlying performance issues:

  • Improved database index efficiency by ~350% through query analysis and index restructuring
  • Rewrote the most expensive queries to use proper indexing strategies
  • Implemented connection pooling and query caching where appropriate (see the pooling sketch after this list)
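
The pooling piece is the easiest to show in isolation. A minimal node-postgres sketch (pool sizes, timeouts, and the query are illustrative, not the production values) looks roughly like this:

```typescript
// Minimal connection-pool sketch using node-postgres.
// Pool sizing, timeouts, and the query are illustrative placeholders.
import { Pool } from 'pg';

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,                        // cap concurrent connections per service instance
  idleTimeoutMillis: 30_000,      // release idle connections
  connectionTimeoutMillis: 2_000, // fail fast instead of queueing forever under load
});

// Hypothetical hot-path query: fetch a live stream by id.
// With an index on streams(id), this stays an index scan even under heavy load.
export async function getStream(streamId: string) {
  const { rows } = await pool.query(
    'SELECT id, title, viewer_count FROM streams WHERE id = $1',
    [streamId],
  );
  return rows[0] ?? null;
}
```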

CI/CD Improvements

While in the codebase, I also fixed their slow deployment pipeline:

  • Cut Docker build times by 55% by optimizing multi-stage builds
  • Implemented proper layer caching in GitLab CI
  • Reduced deploy friction so the team could ship faster

Frontend & Backend Work

Beyond the infrastructure crisis, I contributed to ongoing development:

  • Led the front-end refactor of the user onboarding flow using Vue 3 and Tailwind CSS
  • Developed NestJS microservices deployed on AWS Fargate and Lambda
  • Built new features using GraphQL via Hasura v2 (example sketch below)
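
For flavor, a stripped-down NestJS service method calling Hasura's GraphQL endpoint might look like the sketch below; the schema, field names, and environment variables are hypothetical.

```typescript
// Hypothetical NestJS service calling a Hasura v2 GraphQL endpoint.
// The query, field names, and environment variables are illustrative only.
import { Injectable } from '@nestjs/common';

@Injectable()
export class StreamsService {
  private readonly endpoint = process.env.HASURA_GRAPHQL_URL!;

  async getLiveStreams() {
    const query = /* GraphQL */ `
      query LiveStreams {
        streams(where: { is_live: { _eq: true } }) {
          id
          title
          viewer_count
        }
      }
    `;

    const res = await fetch(this.endpoint, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-hasura-admin-secret': process.env.HASURA_ADMIN_SECRET!,
      },
      body: JSON.stringify({ query }),
    });

    const { data, errors } = await res.json();
    if (errors) throw new Error(JSON.stringify(errors));
    return data.streams;
  }
}
```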

The Tech Stack

  • AWS RDS Aurora (Postgres-compatible)
  • Terraform (IaC)
  • NestJS + GraphQL (Hasura v2)
  • Vue 3 + Tailwind CSS
  • AWS Fargate + Lambda
  • GitLab CI/CD
  • Docker (multi-stage builds)

The Results

  • 350% — Database Index Efficiency Improvement
  • 55% — Faster CI Build Times
  • 1 weekend — Full Database Migration
  • Thousands — Concurrent Viewers Supported

The live stream went off without a hitch. The platform handled the traffic spike, sellers made their sales, and the engineering team had a database architecture that could scale for future events.

More importantly, the performance work and CI improvements continued to pay dividends long after the immediate crisis was resolved.

Ready to scale your startup?

Let's discuss how we can help transform your MVP into enterprise-grade infrastructure.