Paste your train and test losses to detect grokking: the phase transition from memorization to understanding. 224,000× cheaper than memorization, measured in Landauer bits.
LEARNING: both losses are high; the model hasn’t started learning.
MEMORIZED: train loss is low but test loss is still high, and the gap is widening.
GROKKING: test loss suddenly drops and the gap closes. This is the phase transition.
UNDERSTOOD: both losses are low and the gap is stable. Understanding achieved.
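The four states above can be sketched as a simple rule-based classifier over a pair of loss curves. This is only an illustration, not the tool's actual method: the names `LOW_LOSS`, `DROP_RATIO`, `WINDOW`, `classify_phase`, and `classify_run` are hypothetical, and the thresholds are assumed values chosen for the example. It also simplifies the MEMORIZED rule by not checking that the gap is widening.

```python
from typing import List

# Hypothetical thresholds: assumptions for illustration only,
# not the cutoffs used by the tool described on this page.
LOW_LOSS = 0.1    # a loss below this counts as "low"
DROP_RATIO = 0.5  # test loss under half its earlier value counts as a "sudden drop"
WINDOW = 3        # how many steps back to compare against


def classify_phase(train: List[float], test: List[float], i: int) -> str:
    """Label step i with one of the four states described above."""
    train_low = train[i] < LOW_LOSS
    test_low = test[i] < LOW_LOSS

    if train_low and test_low:
        return "UNDERSTOOD"  # both low
    if train_low:
        # Train loss is low, test loss is not: either still memorized,
        # or test loss is in the middle of a sharp drop.
        earlier_test = test[max(0, i - WINDOW)]
        if test[i] < DROP_RATIO * earlier_test:
            return "GROKKING"
        return "MEMORIZED"
    return "LEARNING"  # train loss still high


def classify_run(train: List[float], test: List[float]) -> List[str]:
    """Label every step of a run; both curves must have the same length."""
    if len(train) != len(test):
        raise ValueError("train and test must have the same number of steps")
    return [classify_phase(train, test, i) for i in range(len(train))]


if __name__ == "__main__":
    # Made-up loss values, purely to exercise the rules.
    train = [2.0, 1.0, 0.05, 0.05, 0.05, 0.05, 0.05]
    test = [2.0, 1.9, 1.8, 1.8, 0.6, 0.05, 0.05]
    for step, label in enumerate(classify_run(train, test)):
        print(step, label)
```

Fixed thresholds keep the sketch readable, but real loss curves are noisy and vary in scale, so a practical detector would likely smooth the curves and set the cutoffs relative to the initial loss.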
Your data stays on your machine.