Amdahl’s law says adding processors helps less and less. Gunther noticed something worse: sometimes adding processors makes things slower. The extra nodes spend so much time synchronizing state with each other — invalidating caches, acquiring locks, exchanging messages — that throughput actually drops.
The Universal Scalability Law (USL) adds a quadratic coherency penalty to Amdahl’s model. It predicts the peak: the point where your system is doing its best, and every node you add after that makes it worse.
formula
\[C(N) = \frac{N}{1 + \sigma(N - 1) + \kappa N(N - 1)}\]
where:
- \(C(N)\) — relative capacity (throughput normalized so that \(C(1) = 1\))
- \(N\) — number of processors, threads, or load generators
- \(\sigma\) — contention coefficient (\(0 \leq \sigma < 1\)). The fraction of time spent waiting for shared resources; with \(\kappa = 0\), \(\sigma\) is the serial fraction in Amdahl’s law.
- \(\kappa\) — coherency coefficient (\(\kappa \geq 0\)). The cost of maintaining data consistency across processors. This is the term that causes retrograde throughput.
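For reference, the formula drops straight into code. This is a minimal sketch; the function name `usl_capacity` and the coefficients below are illustrative, not taken from any particular tool.

```python
def usl_capacity(n, sigma, kappa):
    """Relative capacity C(N) = N / (1 + sigma*(N-1) + kappa*N*(N-1))."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Illustrative coefficients: 5% contention, 0.2% coherency cost.
for n in (1, 8, 16, 32, 64):
    print(n, round(usl_capacity(n, sigma=0.05, kappa=0.002), 2))
# Capacity rises, peaks around N ≈ 22, then declines: retrograde throughput.
```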
Peak scalability occurs at:
\[N_{\max} = \sqrt{\frac{1 - \sigma}{\kappa}}\]
Beyond \(N_{\max}\), throughput decreases. Adding resources makes things worse.
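A matching sketch for the peak, using the same illustrative coefficients as above; in practice you would round \(N_{\max}\) to a whole node count.

```python
import math

def usl_peak(sigma, kappa):
    """N at which C(N) is maximized; only meaningful when kappa > 0."""
    return math.sqrt((1 - sigma) / kappa)

print(usl_peak(sigma=0.05, kappa=0.002))  # ≈ 21.8, so peak throughput near 22 nodes
```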
Special cases:
- \(\kappa = 0\): reduces to Amdahl’s law (monotonically increasing, asymptotically bounded)
- \(\sigma = 0, \kappa = 0\): linear (ideal) scalability, \(C(N) = N\)
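Both special cases are easy to check numerically with the `usl_capacity` sketch above (the coefficients here are again arbitrary):

```python
# kappa = 0: Amdahl's law, monotonic and bounded by 1/sigma.
print(usl_capacity(1000, sigma=0.05, kappa=0.0))  # ≈ 19.6, approaching the 1/0.05 = 20 asymptote
# sigma = 0, kappa = 0: ideal linear scaling, C(N) = N.
print(usl_capacity(1000, sigma=0.0, kappa=0.0))   # 1000.0
```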
notes
USL is a superset of Amdahl. The \(\kappa\) term is what makes it more general. Amdahl’s law can never model retrograde throughput — the phenomenon where adding resources makes performance worse. USL can.
Retrograde throughput is a diagnostic signal. If your measured throughput drops as load increases, coherency costs dominate. That tells you where to look: cache invalidation, lock protocols with mutual dependencies, cross-node state synchronization.
Not a curve-fitting exercise. Gunther emphasizes that the point isn’t making the USL curve pass through every data point. The point is extracting \(\sigma\) and \(\kappa\) to understand the nature of the bottleneck — contention vs. coherency.
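One way to extract \(\sigma\) and \(\kappa\) from measurements is a bounded nonlinear least-squares fit. The sketch below uses scipy.optimize.curve_fit; the load levels and throughput numbers are invented purely to show the mechanics, and the starting guess and bounds are assumptions, not recommendations.

```python
import numpy as np
from scipy.optimize import curve_fit

def usl(n, sigma, kappa):
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Hypothetical steady-state measurements: load level N and throughput X(N).
n = np.array([1, 2, 4, 8, 16, 32])
x = np.array([998, 1932, 3584, 6060, 8270, 8180])

# Normalize to C(N) = X(N) / X(1) so that C(1) = 1, then fit sigma and kappa.
c = x / x[0]
(sigma, kappa), _ = curve_fit(usl, n, c, p0=(0.1, 0.001), bounds=(0, [1, 1]))
print(f"sigma={sigma:.4f} kappa={kappa:.6f} peak at N ≈ {np.sqrt((1 - sigma) / kappa):.1f}")
```

Normalizing by \(X(1)\) keeps the fit in the \(C(1) = 1\) form; some practitioners instead fit raw throughput with an extra scale parameter.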
Notation varies. Some sources use \(\alpha/\beta\) instead of \(\sigma/\kappa\). Same parameters: \(\sigma = \alpha\) (contention), \(\kappa = \beta\) (coherency).
Needs steady-state data. Collect data points under stable conditions with sufficient warm-up. At least six data points at different load levels for a meaningful regression.