What is the minimum energy to compute?

The Computation Floor

Speed vs energy — prime bounce and reversible computing approach the same Landauer limit

JIM’S OVERSIMPLIFICATION

Every time a computer erases a bit, it pays a tax to the universe in heat. One bit costs kT ln 2 joules. You cannot avoid this. This is why your laptop gets hot. The universe charges for forgetting.

K IN THIS DOMAIN

K here is Landauer's limit. Each bit erasure costs kT ln 2 of coupling to the heat bath. Computation IS managed decoupling.

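Your computer is a space heater that accidentally does math. This is not a joke. The M4 chip in a Mac Mini uses 35 watts to compute at 3.7 trillion operations per second. Dividing those two figures gives roughly 9.5 picojoules per operation; the measured figure reported here is about 60 picojoules per operation, and the rest of this page uses it. The theoretical minimum, the absolute floor set by physics, is about 0.000003 femtojoules (roughly 3 × 10⁻²¹ joules) per bit erased. Current silicon is about 21 billion times above the floor.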

Why? Because every time a computer erases a bit, the universe charges a tax in heat. One bit erased costs at least kT ln(2) joules. At room temperature, that is a breathtakingly tiny number: about 3 × 10⁻²¹ joules. Real chips pay billions of times more than that per operation, and doing it trillions of times per second is how you get 35 watts of space heater.

There are two ways to approach the floor. Go faster or go cheaper.

Go faster (prime bounce): Pack more useful computation into the same energy budget. By dispatching GPU work at prime-numbered intervals — avoiding pipeline collisions with the hardware’s natural rhythm — you get 9.12x more useful work per joule. Same power consumption. Nine times the output. Effective cost per useful operation drops from 60 pJ to 6.6 pJ.

Go cheaper (reversible computing): Run the computation forward, get your answer, then run it backward to undo the scratch work. Zero erasure, zero tax. Theoretically perfect. In practice, you trade speed for energy. The adiabatic sweet spot on a modeled 7nm chip: 408x below current dissipation. That is still roughly 50 million times above the floor. But 408x is a lot of savings.

Combine both? 9.12 times 408 = 3,721x improvement. And you are still roughly 5.6 million times above Landauer. The floor is VERY far down.

The honest kicker: the combined number assumes both gains are independent. They might not compose linearly. We have not built a reversible prime-bounce circuit. This is a design idea, not a result. But the two roads both point toward the same floor — and that convergence is physics, not speculation.

TWO DIRECTIONS, ONE FLOOR

PRIME BOUNCE (Shape Computing):
  Maximize useful computation per unit TIME
  Prime-spaced dispatch: 2, 3, 5, 7 × 2 × 3
  Result: 9.12x throughput gain (428K → 3.9M/sec)
  Direction: pushing computation UP toward the speed ceiling

REVERSIBLE COMPUTING (Energy Floor):
  Minimize ENERGY per useful computation
  Adiabatic sweet spot at 840 ps: 408x energy reduction
  Direction: pushing energy DOWN toward the Landauer floor
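
Why a sweet spot exists at all can be sketched with a textbook-style model. This is not the 7nm model behind the 408x figure. It assumes adiabatic switching loss falls as (RC/T)·C·V² as the charging ramp T gets longer, while leakage loss grows as I_leak·V·T. The resistance, capacitance, voltage, and leakage current below are made-up illustrative values, not tuned to reproduce 840 ps or 408x.

```python
import math

# All four parameters are hypothetical, chosen only to make the shape visible.
R_OHMS = 10e3        # effective switch resistance
C_FARADS = 1e-15     # load capacitance per node
V_VOLTS = 0.7        # supply swing
I_LEAK_AMPS = 1e-9   # leakage current while the ramp is in progress


def conventional_energy() -> float:
    """Energy dissipated by an abrupt (non-adiabatic) switch: C * V^2 / 2."""
    return 0.5 * C_FARADS * V_VOLTS ** 2


def adiabatic_energy(ramp_time_s: float) -> float:
    """Switching loss (falls with ramp time) plus leakage loss (grows with it).

    The switching term is only a good approximation when ramp_time_s >> R*C.
    """
    switching = (R_OHMS * C_FARADS / ramp_time_s) * C_FARADS * V_VOLTS ** 2
    leakage = I_LEAK_AMPS * V_VOLTS * ramp_time_s
    return switching + leakage


def sweet_spot_ramp_time() -> float:
    """Ramp time that minimizes a/T + b*T, which is sqrt(a/b)."""
    a = R_OHMS * C_FARADS ** 2 * V_VOLTS ** 2
    b = I_LEAK_AMPS * V_VOLTS
    return math.sqrt(a / b)


t_best = sweet_spot_ramp_time()
saving = conventional_energy() / adiabatic_energy(t_best)
print(f"sweet spot: {t_best * 1e9:.2f} ns, {saving:.0f}x below abrupt switching")
```

Ramping slower than the sweet spot loses to leakage; ramping faster loses to switching resistance. A real process would put the minimum somewhere else entirely.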

THE FLOOR:
  kT ln(2) per bit erased = 2.87 × 10⁻²¹ J at 300 K
  This is the irreducible cost of destroying one bit of information.
  Landauer (1961). Experimentally verified (Bérut et al. 2012).
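
The floor itself is a one-line calculation. A minimal sketch, using the exact SI value of the Boltzmann constant; the function name is just an illustration:

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum heat dissipated when one bit is erased at the given temperature."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


floor_300k = landauer_limit_joules(300.0)
print(f"{floor_300k:.3e} J per bit erased at 300 K")
```

The result scales linearly with temperature, so a colder heat bath lowers the floor in direct proportion.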

THE CONVERGENCE

Both approaches are navigating the same tradeoff space. Prime bounce says: given that each bit erasure costs at least kT ln(2), pack as many USEFUL erasures as possible into each clock cycle. Reversible computing says: given a fixed computation, minimize the number of bit erasures to approach the Landauer floor.

Current M4 position:
  Energy per operation: ~60 pJ (measured)
  Landauer floor: ~0.000003 fJ per bit (2.87 × 10⁻²¹ J)
  Gap: ~21,000,000,000x above floor

Prime bounce contribution:
  9.12x more useful work per joule (same energy, more throughput)
  Effective: 6.6 pJ per useful operation

Reversible contribution:
  408x less energy per operation (adiabatic sweet spot)
  Effective: 0.15 pJ per operation

Combined (theoretical):
  9.12 × 408 = 3,721x improvement
  Still roughly 5,600,000x above Landauer. The floor is VERY far down.
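
The ladder above can be recomputed from its inputs. A rough sketch: the 60 pJ, 9.12x, and 408x figures are taken from the text, the floor is kT ln(2) at 300 K, and the combined row assumes the two gains simply multiply, which has not been verified.

```python
import math

K_B = 1.380649e-23                       # Boltzmann constant, J/K
LANDAUER_J = K_B * 300.0 * math.log(2)   # floor per bit erased at 300 K

MEASURED_J_PER_OP = 60e-12   # ~60 pJ per operation (from the text)
PRIME_BOUNCE_GAIN = 9.12     # measured throughput gain (from the text)
REVERSIBLE_GAIN = 408.0      # modeled energy reduction (from the text)


def gap_above_floor(energy_j: float) -> float:
    """How many times above the Landauer floor a given energy sits."""
    return energy_j / LANDAUER_J


# Assumes the two gains are independent and multiply; this is unverified.
combined_gain = PRIME_BOUNCE_GAIN * REVERSIBLE_GAIN

stages = [
    ("current M4", MEASURED_J_PER_OP),
    ("prime bounce", MEASURED_J_PER_OP / PRIME_BOUNCE_GAIN),
    ("reversible", MEASURED_J_PER_OP / REVERSIBLE_GAIN),
    ("combined", MEASURED_J_PER_OP / combined_gain),
]

for name, energy_j in stages:
    gap = gap_above_floor(energy_j)
    print(f"{name:>12}: {energy_j * 1e12:9.4f} pJ, {gap:.2e}x above floor")
```

The comparison follows the text in setting energy per operation against the per-bit floor. A real operation erases many bits, so the gap per bit erased is smaller than these ratios suggest.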

THE CROSS-PREDICTION

Could prime bounce dispatch improve the efficiency of reversible circuits?

The idea: Reversible circuits waste energy on idle cycles (charge leakage). Prime bounce computes on prime-indexed cycles and prefetches on composite cycles. If you ran a reversible circuit on a prime bounce schedule, the composite cycles could be used for adiabatic charge recovery instead of idle leakage. Fewer wasted energy cycles = closer to Landauer.

This is a design idea, not a result. We haven't built it. But the math suggests the trampoline pattern (compute → recover → compute) maps naturally onto adiabatic switching (charge → discharge → charge).
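
The proposed schedule is easy to write down as a toy. This is an illustration of the idea only, not the dispatch code behind the 9.12x measurement: the function names are hypothetical, cycles are numbered from 1, and cycle 1 (neither prime nor composite) is treated as a recovery cycle by assumption.

```python
from typing import List


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small cycle indices."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def bounce_schedule(num_cycles: int) -> List[str]:
    """Label cycles 1..num_cycles: prime index computes, all others recover."""
    return [
        "compute" if is_prime(cycle) else "recover"
        for cycle in range(1, num_cycles + 1)
    ]


schedule = bounce_schedule(12)
for cycle, phase in enumerate(schedule, start=1):
    print(cycle, phase)
```

Because primes thin out as the index grows, a schedule like this spends a growing share of cycles in recovery. Whether that helps or hurts a real adiabatic circuit is exactly the open question.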

HONEST LIMITS

The connection is conceptual.
We have not built a reversible prime-bounce circuit.
The 408x is from a model, not hardware.
The 9.12x is measured on M4 GPU, not on reversible hardware.
The 3,721x combined figure assumes both gains are independent —
they may not compose linearly in practice.

What's real: both approaches converge on kT ln(2).
That convergence is physics, not speculation.
