Event-Driven Microservices (Uber / Netflix Style)
In the early days of microservices, the industry leaned heavily on synchronous REST APIs. However, as organizations like Uber and Netflix scaled to millions of concurrent users, they hit the "Distributed Monolith" wall. In a synchronous system, if Service A calls Service B, which calls Service C, every hop adds latency, and a failure in Service C cascades upstream, potentially bringing down the entire platform. This tight coupling creates a fragile ecosystem where the blast radius of any single failure is unacceptably large.
To solve this, industry leaders shifted toward Event-Driven Architecture (EDA). In this paradigm, services do not command each other to perform actions. Instead, they emit "facts" about what has happened (e.g., OrderCreated, PaymentAuthorized). Other services subscribe to these facts and react accordingly. This inversion of control decouples the availability of services, allowing Netflix to stream video even if the "viewing history" service is momentarily lagging, or Uber to process a ride request even if the "promotion engine" is down.
Requirements
Designing an event-driven system for a high-scale environment like a ride-sharing platform requires balancing strict data consistency with extreme availability.
To support global scale, we first estimate capacity for a system handling millions of rides daily. Note that peaks dwarf the averages: 2 million rides per day is only about 23 rides per second on average, so the 5,000 TPS peak reflects how sharply demand concentrates during surges, and we must provision for the peak, not the mean.
| Metric | Daily Volume | Peak Load |
|---|---|---|
| Active Users | 10 Million | 1 Million (Concurrent) |
| Rides Created | 2 Million | 5,000 TPS |
| Location Updates | 5 Billion | 200,000 TPS |
| Event Storage | 10 TB | N/A |
High-Level Architecture
The core of an Uber-style event-driven system is the Event Bus (typically Apache Kafka or Pulsar). Services interact through topics, ensuring that the producer of an event has no knowledge of its consumers.
In this architecture, when a user requests a ride, the Order Service persists the request to its local database and emits a RideRequested event to Kafka. The Driver Matching Service consumes this event to find nearby drivers, while the Analytics Service consumes it to update demand heatmaps. Neither service blocks the other.
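To make this concrete, here is a minimal producer-side sketch. It assumes the segmentio/kafka-go client, a broker at localhost:9092, a ride-requested topic name, and the RideRequested field names; none of these are prescribed by the architecture. Keying each message by user ID routes all of one user's events to the same partition:

package main

import (
	"context"
	"encoding/json"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// RideRequested is the "fact" the Order Service emits; field names are assumed.
type RideRequested struct {
	RideID      string    `json:"ride_id"`
	UserID      string    `json:"user_id"`
	RequestedAt time.Time `json:"requested_at"`
}

func publishRideRequested(ctx context.Context, w *kafka.Writer, evt RideRequested) error {
	payload, err := json.Marshal(evt)
	if err != nil {
		return err
	}
	// The message key determines the partition, so all of one user's
	// events stay in order relative to each other.
	return w.WriteMessages(ctx, kafka.Message{
		Key:   []byte(evt.UserID),
		Value: payload,
	})
}

func main() {
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"), // assumed broker address
		Topic:    "ride-requested",            // assumed topic name
		Balancer: &kafka.Hash{},               // hash-by-key partitioning
	}
	defer w.Close()

	evt := RideRequested{RideID: "r-123", UserID: "u-456", RequestedAt: time.Now()}
	if err := publishRideRequested(context.Background(), w, evt); err != nil {
		log.Fatal(err)
	}
}

The producer knows nothing about the Driver Matching or Analytics services; adding a new consumer later requires no change to this code.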
Detailed Design: The Idempotent Consumer
A major challenge in event-driven systems is "at-least-once" delivery. Kafka guarantees that a message will be delivered at least once, but producer retries and consumer offset-commit failures mean the same message can be processed more than once. To prevent a user from being charged twice, we must implement idempotent consumers.
The following Go snippet demonstrates a production-grade idempotent event handler using a relational database to track processed message IDs.
package payments

import (
	"database/sql"
	"log"
)

// PaymentEvent's fields are inferred from how the handler uses them.
type PaymentEvent struct {
	ID     string
	UserID string
	Amount int64 // amount in minor units (e.g., cents)
}

func HandlePaymentEvent(db *sql.DB, event PaymentEvent) error {
	// Start a transaction so the balance update and the idempotency
	// record commit (or roll back) atomically.
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	// Rollback is a no-op if the transaction was already committed.
	defer tx.Rollback()

	// 1. Check whether the event has already been processed (idempotency key).
	//    In production, event_id should also carry a UNIQUE constraint so two
	//    concurrent deliveries cannot both pass this check.
	var exists bool
	query := `SELECT EXISTS(SELECT 1 FROM processed_events WHERE event_id = $1)`
	if err := tx.QueryRow(query, event.ID).Scan(&exists); err != nil {
		return err
	}
	if exists {
		log.Printf("Event %s already processed, skipping", event.ID)
		return nil
	}

	// 2. Perform the business logic.
	updateQuery := `UPDATE accounts SET balance = balance - $1 WHERE user_id = $2`
	if _, err := tx.Exec(updateQuery, event.Amount, event.UserID); err != nil {
		return err
	}

	// 3. Record the event ID so a redelivery is skipped at step 1.
	insertLog := `INSERT INTO processed_events (event_id, processed_at) VALUES ($1, NOW())`
	if _, err := tx.Exec(insertLog, event.ID); err != nil {
		return err
	}
	return tx.Commit()
}

Database Schema
In a distributed system, we often use a "Database per Service" pattern. For the Order Service, we need a schema that supports both the current state and the outbox pattern to ensure events are published reliably.
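A minimal sketch of that write path follows. The outbox table name and columns are assumptions, as is a separate relay process (not shown) that polls the outbox and publishes its rows to Kafka:

package orders

import "database/sql"

// SaveOrderWithOutbox writes the order and its OrderCreated event atomically.
// A crash between "save order" and "publish event" can no longer lose the
// event, because both rows commit or roll back together.
func SaveOrderWithOutbox(db *sql.DB, orderID, userID string, payload []byte) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// 1. Persist the current state (matches the orders schema below).
	if _, err := tx.Exec(
		`INSERT INTO orders (id, user_id, status, created_at) VALUES ($1, $2, 'CREATED', NOW())`,
		orderID, userID,
	); err != nil {
		return err
	}

	// 2. Stage the event in the same transaction; the relay sees it only if
	//    the order itself commits, so we never publish a phantom event.
	if _, err := tx.Exec(
		`INSERT INTO outbox (event_type, aggregate_id, payload, created_at)
		 VALUES ('OrderCreated', $1, $2, NOW())`,
		orderID, payload,
	); err != nil {
		return err
	}
	return tx.Commit()
}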
To handle massive scale, the ORDERS table should be partitioned by user_id or created_at. In PostgreSQL, we can use declarative partitioning:
CREATE TABLE orders (
id UUID NOT NULL,
user_id UUID NOT NULL,
status VARCHAR(20),
created_at TIMESTAMP NOT NULL,
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- Each time range needs a concrete partition before rows can be inserted, e.g.:
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE INDEX idx_orders_user_id ON orders (user_id);

Scaling Strategy
Scaling from 1,000 to 1,000,000 users requires moving from a single message queue to a partitioned log architecture. Kafka partitions are the unit of parallelism.
As load increases, we increase the number of partitions and consumers. Because events for a specific user_id always go to the same partition, we maintain strict ordering for that user while processing different users in parallel.
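A consumer-group sketch, again assuming segmentio/kafka-go with illustrative broker, topic, and group names; running more copies of this process (up to the partition count) rebalances partitions across them automatically:

package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// All instances sharing a GroupID split the topic's partitions among them.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"}, // assumed broker address
		GroupID: "driver-matching",
		Topic:   "ride-requested",
	})
	defer r.Close()

	for {
		// With a GroupID set, ReadMessage blocks for the next message and
		// commits the offset automatically after returning it.
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Printf("read error: %v", err)
			return
		}
		// Within a partition (i.e., per user), messages arrive in order.
		log.Printf("partition=%d offset=%d key=%s", msg.Partition, msg.Offset, msg.Key)
	}
}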
Failure Modes and Resilience
In an event-driven world, failures are not a matter of "if" but "when." We must design for partial failures using Circuit Breakers and Dead Letter Queues (DLQs).
If the Payment Service is down, the RideRequested events will simply buffer in Kafka. Once the service recovers, it will resume processing from the last committed offset. This "temporal decoupling" is the greatest strength of EDA.
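Buffering only helps with transient outages, though; a malformed "poison" message will fail on every retry and stall its partition. The sketch below retries a bounded number of times and then parks the message on a DLQ topic. The retry count, helper names, and process callback are illustrative, not a standard API:

package consumer

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

const maxAttempts = 3

func consumeWithDLQ(ctx context.Context, r *kafka.Reader, dlq *kafka.Writer,
	process func(kafka.Message) error) error {
	for {
		// FetchMessage does NOT auto-commit, so the offset only advances
		// once the message is handled or parked in the DLQ.
		msg, err := r.FetchMessage(ctx)
		if err != nil {
			return err
		}

		var procErr error
		for attempt := 1; attempt <= maxAttempts; attempt++ {
			if procErr = process(msg); procErr == nil {
				break
			}
		}
		if procErr != nil {
			// Park the poison message so the partition keeps moving.
			log.Printf("moving offset %d to DLQ: %v", msg.Offset, procErr)
			if err := dlq.WriteMessages(ctx, kafka.Message{Key: msg.Key, Value: msg.Value}); err != nil {
				return err // DLQ unavailable: stop rather than drop the message
			}
		}
		if err := r.CommitMessages(ctx, msg); err != nil {
			return err
		}
	}
}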
Conclusion
Building event-driven microservices at the scale of Uber or Netflix requires a fundamental shift in how we think about state and consistency. We trade the simplicity of ACID transactions for the scalability of eventual consistency and the resilience of asynchronous streams.
Key takeaways for production-grade design:
- Embrace Idempotency: Every consumer must be able to handle the same message multiple times without side effects.
- The Outbox Pattern: Never update a database and send a message in two separate, non-atomic steps. Use a local outbox table so the event is published at least once; paired with idempotent consumers, this yields effectively-once processing.
- Partitioning is King: Use meaningful keys (like user_id or ride_id) to ensure related events are processed in order while allowing horizontal scaling.
- Observability: In a decoupled system, tracing a request requires a correlation ID passed through every event header to reconstruct the flow in tools like Jaeger or Honeycomb (see the sketch after this list).
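A minimal sketch of that propagation, assuming kafka-go message headers and correlation-id as the (conventional, not standardized) header key:

package tracing

import "github.com/segmentio/kafka-go"

const correlationHeader = "correlation-id"

// WithCorrelationID copies the correlation ID from an inbound message onto an
// outbound one, so the whole event chain shares a single trace identifier.
func WithCorrelationID(in kafka.Message, out kafka.Message) kafka.Message {
	for _, h := range in.Headers {
		if h.Key == correlationHeader {
			out.Headers = append(out.Headers, kafka.Header{Key: correlationHeader, Value: h.Value})
			return out
		}
	}
	return out
}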
While EDA introduces complexity in testing and debugging, it provides the only viable path to building truly global, fault-tolerant systems that can survive the failure of entire data centers.