Cost-Aware Architecture (FinOps Scenario)
In the early stages of a startup, the mantra is "growth at all costs." Engineering teams prioritize velocity, shipping features to find market fit while treating cloud infrastructure as an infinite, albeit expensive, resource. However, as a system matures to the scale of Uber, Netflix, or Stripe, the infrastructure bill transitions from an operational overhead to a primary architectural constraint.
FinOps, the practice of bringing financial accountability to the variable spend model of cloud computing, has traditionally been the domain of finance teams and cloud economists. But for a Staff Engineer, cost is a first-class architectural metric, much like latency or availability. At scale, a distributed system must be "Cost-Aware," capable of making real-time trade-offs between performance and expenditure. This post explores the design of a Cost-Aware Request Router: a system that dynamically shifts workloads based on spot instance availability, regional egress pricing, and tenant-level budgets.
Requirements
To design a production-grade cost-aware system, we must define the boundaries where financial logic meets technical execution. We aren't just looking for a dashboard; we are building a control plane that influences request lifecycle decisions.
Capacity Estimation
At scale, the router sits in the hot path of every request and must not become a bottleneck. Two billion daily requests spread over 86,400 seconds average roughly 23,000 RPS; we provision for peaks of four to five times that.
| Metric | Estimated Value |
|---|---|
| Total Daily Requests | 2 Billion |
| Average RPS | 23,000 |
| Peak RPS | 100,000 |
| Routing Metadata Size | 1 KB per request |
| Cost Engine Latency Budget | < 5ms |
| Data Ingestion (Telemetry) | 500 GB / Day |
High-Level Architecture
The system follows a "Control Plane / Data Plane" split. The Data Plane (implemented as a sidecar or an edge gateway) intercepts traffic, while the Control Plane (the Cost Engine) provides the intelligence on where that traffic should go based on current cloud pricing and budget health.
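One minimal way to express this split in code is an interface boundary: the data plane only knows how to ask for a routing decision, while everything about pricing lives behind it. The sketch below is a hypothetical shape, assuming only the standard library's context package; the name and signature are illustrative, not a prescribed API:

```go
import "context"

// Decider is the contract the data plane sees. The Cost Engine (control
// plane) implements it; the sidecar or edge gateway only calls it.
// Hypothetical sketch: the name and signature are illustrative.
type Decider interface {
	// Route returns the chosen destination ID for a request, or an error
	// if the control plane cannot answer within its latency budget.
	Route(ctx context.Context, tenantID string, candidates []string) (string, error)
}
```

Keeping the boundary this narrow is what lets the data plane stay fast and "dumb": it can cache decisions, batch them, or fall back to a default without knowing anything about pricing.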
Detailed Design: The Cost-Weighted Routing Algorithm
The core of this system is a weighted routing algorithm that factors in the "Cost Score" of a target destination. Unlike standard round-robin or least-connections algorithms, our router calculates a TargetWeight based on three inputs:
- Current Instance Price: On-demand vs. Spot.
- Egress Penalty: The cost of moving data from the router to the destination.
- Budget Remaining: If a tenant is nearing their monthly spend, they are routed to lower-cost tiers.
Here is a production-grade implementation of the decision logic in Go:
```go
import "sync"

// Destination describes a candidate backend for a routed request.
type Destination struct {
	ID           string
	Region       string
	BaseCost     float64 // Relative per-request cost of this region/tier
	IsSpot       bool
	LatencyScore float64 // Normalized 0-1; higher is faster
}

// CostEngine holds pricing state that is refreshed out of band
// (e.g., by a spot-price watcher), hence the RWMutex.
type CostEngine struct {
	mu             sync.RWMutex
	SpotMultiplier float64 // How aggressively to prefer spot capacity
}

func (ce *CostEngine) CalculateWeight(dest Destination, tenantBudgetRemaining float64) float64 {
	ce.mu.RLock()
	defer ce.mu.RUnlock()

	// Base weight starts with performance
	weight := dest.LatencyScore * 100

	// Apply financial pressure: if the tenant's budget is low,
	// steer traffic toward cheaper capacity (threshold is illustrative)
	if tenantBudgetRemaining < 1000.0 {
		if dest.IsSpot {
			weight *= ce.SpotMultiplier // Prefer spot instances
		} else {
			weight *= 0.5 // Penalize on-demand
		}
	}

	// Factor in regional cost differences
	// (e.g., us-east-1 is often cheaper than af-south-1)
	costPenalty := 1.0 / (dest.BaseCost + 1.0)
	return weight * costPenalty
}
```

This logic allows the system to be elastic. During peak hours, when spot instances are reclaimed by the provider (e.g., AWS), a SpotTracker component updates the IsSpot status and the router seamlessly shifts traffic back to stable, albeit more expensive, on-demand clusters.
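To turn these weights into an actual routing decision, the data plane can sample destinations proportionally to weight, so cheaper-but-slower regions still absorb a share of traffic rather than flipping all-or-nothing. The helper below is a minimal sketch living alongside the engine; PickDestination is an illustrative name, not part of the engine above:

```go
import "math/rand"

// PickDestination chooses a target with probability proportional to its
// cost-aware weight. Illustrative sketch: a production router would also
// handle health checks and empty candidate lists.
func PickDestination(ce *CostEngine, dests []Destination, budgetRemaining float64) Destination {
	weights := make([]float64, len(dests))
	total := 0.0
	for i, d := range dests {
		weights[i] = ce.CalculateWeight(d, budgetRemaining)
		total += weights[i]
	}
	// Weighted-random draw; assumes at least one destination.
	r := rand.Float64() * total
	for i, w := range weights {
		r -= w
		if r <= 0 {
			return dests[i]
		}
	}
	return dests[len(dests)-1] // Guard against floating-point drift
}
```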
Database Schema
We require a schema that supports high-frequency updates for spot prices and budget increments. We use a combination of PostgreSQL for relational budget data and a time-series store for cost metrics.
To optimize performance, the BUDGET_QUOTA table should be partitioned by tenant_id using hash partitioning. Since budget checks occur on every request, we cache the current_spend in Redis, using an asynchronous write-behind pattern to update the primary SQL database.
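The write-behind loop itself is small. Below is a minimal sketch of the pattern, with an in-process map standing in for the Redis counters (in production you would use an atomic Redis increment such as INCRBYFLOAT); the flush callback is where the UPDATE against BUDGET_QUOTA would happen. SpendTracker and its shape are assumptions for illustration:

```go
import (
	"sync"
	"time"
)

// SpendTracker accumulates per-tenant spend in memory (standing in for
// Redis counters) and flushes deltas to the SQL source of truth on a
// timer, so no request ever blocks on the database.
type SpendTracker struct {
	mu      sync.Mutex
	pending map[string]float64                    // tenantID -> unflushed spend
	flush   func(tenantID string, delta float64)  // e.g., UPDATE budget_quota ...
}

func NewSpendTracker(flush func(string, float64), interval time.Duration) *SpendTracker {
	t := &SpendTracker{pending: make(map[string]float64), flush: flush}
	go func() {
		for range time.Tick(interval) {
			t.flushAll()
		}
	}()
	return t
}

// Record is called on the hot path; it only touches memory.
func (t *SpendTracker) Record(tenantID string, cost float64) {
	t.mu.Lock()
	t.pending[tenantID] += cost
	t.mu.Unlock()
}

func (t *SpendTracker) flushAll() {
	t.mu.Lock()
	batch := t.pending
	t.pending = make(map[string]float64)
	t.mu.Unlock()
	for id, delta := range batch {
		t.flush(id, delta) // Write-behind: DB lags cache by at most one interval
	}
}
```

The trade-off is bounded staleness: the database can lag the cache by up to one flush interval, which is exactly the eventual consistency the conclusion argues is acceptable for billing.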
Scaling Strategy
Scaling a cost-aware system introduces a paradox: the system itself must be cost-efficient. We use a multi-tiered caching strategy, sketched after the list below, to ensure the router doesn't become the most expensive part of the stack.
As we scale from 1,000 to 1,000,000 users:
- 1K Users: Single instance of the Cost Engine; direct DB queries.
- 100K Users: Introduction of Redis for budget tracking; sidecar-based routing.
- 1M+ Users: Regional Cost Engines with localized state; Kafka for global budget synchronization; Edge-based routing using Cloudflare Workers or AWS Lambda@Edge to reduce latency.
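The budget check on the hot path illustrates the tiering. Below is a hypothetical sketch of the read path, with an in-process TTL cache in front of Redis and the SQL source of truth; the Redis and SQL lookups are abstracted as function values rather than real client calls:

```go
import (
	"sync"
	"time"
)

// TieredBudget answers "how much budget remains?" from the cheapest tier
// that has an answer. Hypothetical sketch of the read path, not a client.
type TieredBudget struct {
	mu        sync.Mutex
	local     map[string]cachedEntry
	ttl       time.Duration
	fromRedis func(tenantID string) (float64, bool) // L2: shared cache
	fromSQL   func(tenantID string) float64         // L3: source of truth
}

type cachedEntry struct {
	remaining float64
	expires   time.Time
}

func (b *TieredBudget) Remaining(tenantID string) float64 {
	b.mu.Lock()
	e, ok := b.local[tenantID]
	b.mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		return e.remaining // L1: in-process, no network hop
	}
	v, ok := b.fromRedis(tenantID)
	if !ok {
		v = b.fromSQL(tenantID) // Cold path only
	}
	b.mu.Lock()
	b.local[tenantID] = cachedEntry{remaining: v, expires: time.Now().Add(b.ttl)}
	b.mu.Unlock()
	return v
}
```

A short TTL (hundreds of milliseconds) keeps the local tier honest while absorbing the vast majority of reads at 100K peak RPS.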
Failure Modes and Resilience
In a FinOps scenario, a failure in the cost engine should never bring down the business logic. We follow the "Fail-Open" principle. If the cost engine is unavailable or exceeds its latency budget, the router defaults to a "Latency-First" mode, routing to the geographically closest healthy region regardless of cost.
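In code, fail-open falls out naturally from deadlines. A minimal sketch, reusing the hypothetical Decider interface from earlier; nearestHealthy stands in for whatever latency-first fallback the gateway already has:

```go
import (
	"context"
	"time"
)

// routeFailOpen consults the cost engine but never waits beyond the 5ms
// latency budget. On timeout or error, cost optimization degrades
// gracefully while traffic keeps flowing.
func routeFailOpen(ctx context.Context, d Decider, tenantID string, candidates []string) string {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Millisecond)
	defer cancel()

	dest, err := d.Route(ctx, tenantID, candidates)
	if err != nil {
		// Fail-open: latency-first routing, ignoring cost.
		// nearestHealthy is a hypothetical helper, not defined here.
		return nearestHealthy(candidates)
	}
	return dest
}
```

The worst case of this design is a temporarily higher bill, never a dropped request, which is the correct failure ordering for a revenue-serving system.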
Conclusion
Designing for FinOps requires a shift in the CAP theorem trade-offs. In a cost-aware architecture, we often favor Availability and Partition Tolerance over Consistency. It is acceptable if a tenant’s budget is slightly exceeded due to eventual consistency in the billing pipeline, provided the user experience remains uninterrupted.
Key patterns to remember:
- Decouple the Decision from the Execution: The Data Plane should be "dumb" and fast, while the Control Plane handles the complex financial math.
- Asynchronous Accounting: Never block a request to update a budget. Use atomic counters in distributed caches and sync to the source of truth later.
- Spot-First Mentality: Design your workloads to be interruptible. This allows the router to aggressively use spot instances, which can reduce compute costs by up to 90%.
By treating cost as an engineering challenge rather than an accounting task, we build systems that are not only technically robust but also economically sustainable.