user@techdebt:~/blog$_
$ cd ..

The Balancer


the problem

One server handles 1,000 requests per second. You need to handle 10,000. You could buy a bigger server (vertical scaling), but there is a ceiling. At some point, you need multiple servers.

Now you have a new problem: which server handles which request? If you send everything to server 1, you have not scaled at all. If you split traffic unevenly, some servers are overloaded while others sit idle. A load balancer sits in front of your servers and makes this decision for every incoming request.

round-robin

The simplest algorithm. Requests go to servers in order: S1, S2, S3, S4, S1, S2, S3, S4, and so on. No state to track, no computation, perfect distribution assuming all servers are identical and all requests cost the same.

The problem: requests are not equal. A lightweight health check and a heavy database query both count as one request. Round-robin sends the heavy query to S3 even if S3 is already handling three heavy queries while S1 is idle. It distributes requests evenly but not load.

Round-robin works well when requests have similar cost and servers have similar capacity. For homogeneous, stateless API servers behind a gateway, it is often good enough.
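The rotation can be sketched in a few lines; the only state is a counter (server names here are placeholders):

```python
class RoundRobin:
    """Rotate through servers in fixed order; no per-server state needed."""

    def __init__(self, servers):
        self.servers = servers
        self.i = 0

    def pick(self):
        # modulo wraps the counter back to the first server
        server = self.servers[self.i % len(self.servers)]
        self.i += 1
        return server

rr = RoundRobin(["s1", "s2", "s3", "s4"])
print([rr.pick() for _ in range(8)])
# ['s1', 's2', 's3', 's4', 's1', 's2', 's3', 's4']
```

Note there is nothing about the servers in the decision at all, which is exactly why it distributes requests evenly but not load.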

least connections

Instead of rotating blindly, track how many active connections each server has. Send the next request to the server with the fewest. This adapts to reality: if S1 finishes fast, it gets more traffic. If S3 is stuck on a slow query, it gets less.

Least-connections requires the balancer to track connection state, which adds a small overhead. But the improvement in load distribution is significant for workloads with variable request latency.

Most production load balancers support least-connections or a variant of it: Nginx has least_conn, HAProxy has leastconn, and AWS ALB offers a comparable least-outstanding-requests mode.
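The bookkeeping is small: a counter per server, incremented when a request starts and decremented when it finishes. A minimal sketch:

```python
class LeastConnections:
    """Track active connections per server; route to the least loaded."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # min() breaks ties in insertion order, like a fixed server list
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # call when the request completes so counts reflect reality
        self.active[server] -= 1

lb = LeastConnections(["s1", "s2", "s3"])
a = lb.acquire()   # 's1' (all tied at zero)
b = lb.acquire()   # 's2'
lb.release(a)      # s1 finished fast...
c = lb.acquire()   # ...so it wins the next tie: 's1' again, s3 still idle
```

The release call is the part that makes this adaptive: a server stuck on a slow query keeps its count high and stops receiving traffic until it catches up.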

weighted routing

Not all servers are equal. A 16-core machine can handle more than a 4-core machine. Weighted routing assigns a weight to each server and distributes traffic proportionally.

With weights S1=1, S2=2, S3=3, S4=1, a batch of 7 requests splits: S1 gets 1, S2 gets 2, S3 gets 3, S4 gets 1. The beefy server handles more traffic because it can.

Weights are also useful for canary deployments. Set the canary server’s weight to 1 while a production server has weight 99, and the canary gets roughly 1% of traffic (with more production servers, its share shrinks proportionally). If it performs well, gradually increase its weight.

Setting a weight to 0 removes a server from rotation without marking it unhealthy. Useful for graceful draining during deployments.
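Naively expanding weights into a repeated list sends bursts to the heavy server (S3, S3, S3, ...). One way to interleave the picks is the smooth weighted round-robin that Nginx uses; a sketch with the weights from the example above:

```python
class SmoothWeighted:
    """Smooth weighted round-robin: each round, every server's credit
    grows by its weight; the richest server is picked and pays back the
    total, so heavy servers win often but never in long bursts."""

    def __init__(self, weights):
        # weight 0 drops a server from rotation entirely
        self.weights = {s: w for s, w in weights.items() if w > 0}
        self.credit = {s: 0 for s in self.weights}

    def pick(self):
        total = sum(self.weights.values())
        for s, w in self.weights.items():
            self.credit[s] += w
        best = max(self.credit, key=self.credit.get)
        self.credit[best] -= total
        return best

wrr = SmoothWeighted({"s1": 1, "s2": 2, "s3": 3, "s4": 1})
print([wrr.pick() for _ in range(7)])
# ['s3', 's2', 's1', 's3', 's4', 's2', 's3'] — each server appears
# exactly as many times as its weight, but spread out
```

Filtering `w > 0` in the constructor is what makes weight-0 draining work: the server simply never enters the rotation.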

health checks

A load balancer is only as good as its health checks. Sending traffic to a dead server means failed requests for users.

Active health checks: The balancer periodically pings each server (HTTP GET /health, TCP connect). If a server fails N consecutive checks, it is removed from the pool. When it passes again, it is added back.

Passive health checks: The balancer monitors actual traffic. If a server returns too many 5xx errors or times out too often, it is marked unhealthy. No extra traffic needed, but slower to detect failures.

Most production setups use both. Active checks catch servers that are completely down. Passive checks catch servers that are up but misbehaving.
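The active-check state machine is simple enough to sketch; the threshold of 3 consecutive failures is illustrative (real balancers make it configurable):

```python
class HealthChecker:
    """Active health-check state: N consecutive failed probes remove a
    server from the pool; a passing probe adds it back and resets the count."""

    FAIL_LIMIT = 3  # illustrative threshold

    def __init__(self, servers):
        self.consecutive_failures = {s: 0 for s in servers}
        self.healthy = set(servers)

    def record_probe(self, server, ok):
        if ok:
            self.consecutive_failures[server] = 0
            self.healthy.add(server)
        else:
            self.consecutive_failures[server] += 1
            if self.consecutive_failures[server] >= self.FAIL_LIMIT:
                self.healthy.discard(server)

hc = HealthChecker(["s1", "s2"])
for _ in range(3):
    hc.record_probe("s2", ok=False)
print(sorted(hc.healthy))   # ['s1'] — s2 removed after 3 failures
hc.record_probe("s2", ok=True)
print(sorted(hc.healthy))   # ['s1', 's2'] — back in the pool
```

Requiring consecutive failures, rather than reacting to a single one, is what keeps a transient network blip from flapping a healthy server out of the pool.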

L4 vs L7

Load balancers operate at different layers of the network stack.

L4 (transport layer): Routes based on IP addresses and TCP/UDP ports. Fast because it does not inspect the request payload. Cannot make routing decisions based on HTTP headers, URLs, or cookies.

L7 (application layer): Routes based on HTTP headers, URL paths, cookies, and request content. Can do path-based routing (/api goes to API servers, /static goes to CDN), header-based routing (mobile vs desktop), and cookie-based sticky sessions.

L4 is faster. L7 is smarter. Most modern load balancers (Nginx, Envoy, AWS ALB) operate at L7 by default.
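To make the difference concrete, here is the path- and header-based routing from the examples above as code. The pool names are made up; the point is that every branch inspects data an L4 balancer never sees:

```python
def route_l7(path, headers):
    """Toy L7 routing table. An L4 balancer sees only IPs and ports,
    so none of these decisions are available to it."""
    if path.startswith("/api"):
        return "api-pool"
    if path.startswith("/static"):
        return "cdn"
    if "mobile" in headers.get("User-Agent", "").lower():
        return "mobile-pool"
    return "desktop-pool"

print(route_l7("/api/users", {}))                               # api-pool
print(route_l7("/index.html", {"User-Agent": "MobileSafari"}))  # mobile-pool
```

The cost of that flexibility is parsing: the balancer must terminate the connection and read the HTTP request before it can decide, which is exactly why L4 is faster.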

where it shows up

  • Nginx: The most widely deployed reverse proxy and load balancer. Supports round-robin, least-connections, IP hash. Runs as L7 by default, can do L4 with stream module.
  • HAProxy: High-performance TCP/HTTP load balancer. Known for reliability and battle-tested configurations. Powers many high-traffic sites.
  • AWS ALB/NLB: ALB is L7 (HTTP routing, path-based, host-based). NLB is L4 (TCP/UDP, ultra-low latency). ELB Classic is the older combined version.
  • Envoy: Modern L7 proxy designed for microservice architectures. Used as the data plane in service meshes (Istio). Supports advanced features like circuit breaking, retries, and observability.