How do you know if a mutation is harmful?

Mutation Scanner

written 2026-04-11 · last edited 2026-06-02

Variant pathogenicity + GoF/LoF regime detection — 1,594 variants, 15 proteins, zero training

JIM’S OVERSIMPLIFICATION

Some amino acids are bridges holding two parts of a protein together. Mutate the bridge, the protein falls apart. But some mutations do not break anything — they flip switches. Knowing whether the mutation is breaking a wall or flipping a switch is the whole game. We built a detector that figures out which ruler to use before scoring.

WHAT THIS DOES

You have a genetic mutation. You want to know: does it matter? The scanner answers by asking two things. First: how structurally important is this position? (Fiedler damage — how much the protein's connectivity collapses when you pull one node.) Second: has evolution preserved this position? If every species from fish to humans has the same amino acid here, it is probably important.
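
One way to make "Fiedler damage" concrete: treat the protein as a contact graph, take the Fiedler value (the second-smallest eigenvalue of the graph Laplacian, a standard measure of how well connected a graph is), and measure how far it drops when one residue is removed. The sketch below is an illustration under stated assumptions, not the scanner's own code: the function names, the relative-drop normalization, and the toy seven-node graph are all hypothetical.

```python
import numpy as np

def fiedler_value(adj):
    """Second-smallest Laplacian eigenvalue (algebraic connectivity)."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    return float(np.linalg.eigvalsh(laplacian)[1])  # eigvalsh sorts ascending

def fiedler_damage(adj, node):
    """Relative drop in the Fiedler value when `node` is removed.

    Assumed definition for illustration; the scanner may normalize differently.
    """
    base = fiedler_value(adj)
    keep = [i for i in range(adj.shape[0]) if i != node]
    reduced = adj[np.ix_(keep, keep)]
    return (base - fiedler_value(reduced)) / base

# Toy contact graph: two triangles (0-1-2 and 4-5-6) joined only through node 3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = np.zeros((7, 7))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

bridge_damage = fiedler_damage(adj, 3)  # removing the bridge disconnects the graph
corner_damage = fiedler_damage(adj, 0)  # removing a triangle corner leaves it connected
print(f"bridge: {bridge_damage:.3f}, corner: {corner_damage:.3f}")
```

Removing the bridge node splits the graph, so the Fiedler value falls to zero and the damage reaches its maximum of 1. Removing a corner leaves the graph connected, so its damage is smaller.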

But there is a deeper problem. Loss-of-function mutations follow one rule (structural damage predicts disease). Gain-of-function mutations follow the opposite rule (disease mutations hit control sites, not load-bearing walls). If you do not know which regime you are in, your scores are meaningless. The regime detector solves this: 8 out of 10 correct, zero training data.

HOW WELL IT WORKS

0.74 AUC across 6 disease proteins (leave-one-gene-out). That matches the top of SIFT's range (2001, 0.69–0.74) and falls well short of AlphaMissense (0.94–0.96 AUC, trained on 100M sequences). The value is not raw accuracy: every score is traceable to a physical mechanism. Fiedler network damage alone achieves 0.82 AUC as a single feature. One number. No training. Pure graph theory.

THE NUMBERS

Regime-aware scorer:
Within-gene AUC: 0.74 (mean across 6 proteins, leave-one-gene-out)

Per-gene:
EGFR: 0.92 | RET: 0.85 | BRAF: 0.75 | p53: 0.66 | MTHFR: 0.63 | AR: 0.60
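
For reference, AUC is the probability that a randomly chosen pathogenic variant scores higher than a randomly chosen benign one, and the headline 0.74 is the mean of the six per-gene values above. A minimal sketch of both calculations follows; the `auc` helper and the toy scores are illustrative only, and the per-gene numbers are copied from this page.

```python
def auc(pathogenic_scores, benign_scores):
    """Fraction of (pathogenic, benign) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))

# Made-up scores for illustration only (not real scanner output).
toy_auc = auc([0.9, 0.6, 0.4], [0.5, 0.2])

# Per-gene AUCs as reported above.
per_gene = {"EGFR": 0.92, "RET": 0.85, "BRAF": 0.75,
            "p53": 0.66, "MTHFR": 0.63, "AR": 0.60}
mean_auc = sum(per_gene.values()) / len(per_gene)
print(f"toy AUC = {toy_auc:.3f}, mean per-gene AUC = {mean_auc:.3f}")
```

The mean of the six reported values is 0.735, which rounds to the quoted 0.74.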

Novel feature:
Fiedler damage: 0.82 AUC as single feature under LOGO cross-validation

Speed:
Online: 30 variants/sec | Precomputed: 216,211 variants/sec

Previous claims (94.7%, 83.4%) were inflated by gene-level confounders and in-sample weight optimization; both problems were found and corrected in an audit. The current 0.74 is validated under strict leave-one-gene-out with no learned weights.

WHAT IT CATCHES

Variant	Disease	Score	Mechanism
KRAS G12D	Lung/pancreatic cancer	0.545	GTPase P-loop disruption
p53 R175H	Cancer (#1 hotspot)	0.254	Metal site + charge loss
HBB E6V	Sickle cell	0.570	Surface hydrophobic patch
SOD1 A4V	ALS	0.180	Buried packing change

Benign variants correctly identified:

p53 P72R (AF=0.72): 0.000 (gnomAD filter)
BRCA1 P871L (AF=0.36): 0.000 (gnomAD filter)
HBB E6D (conservative): 0.000 (same charge, similar size)

VS THE FIELD

Tool	AUC	Basis (features / training data)
SIFT (2001)	0.69–0.74	Conservation
PolyPhen-2 (2010)	0.75–0.81	Conservation + structure
CADD v1.6 (2019)	0.82–0.87	Genome-wide meta-predictor
REVEL (2016)	0.90–0.94	ClinGen-calibrated ensemble
AlphaMissense (2023)	0.94–0.96	AlphaFold + 100M sequences
GUMP (2026)	0.74	Fiedler damage + MSA + physics

GoF/LoF REGIME DETECTOR

The two-ruler problem: a universal scorer that treats all genes the same will score GoF genes backwards. The detector identifies which ruler to use before scoring.

Loss-of-Function (LoF):
  Pathogenic mutations break the structural core.
  Damage correlates with pathogenicity.
  Examples: TP53, BRCA1, ATM, FBN1

Gain-of-Function (GoF):
  Pathogenic mutations hijack functional control sites.
  High-coupling, LOW-damage sites = pathogenic.
  Examples: BRAF, EGFR, AR, PIK3CA

Gene	Detected	Expected	Confidence	K-damage correlation	n (variants)
TP53	LoF	LoF	0.48	+0.056	1,331
ATM	LoF	LoF	1.00	+0.165	32
FBN1	LoF	LoF	1.00	+0.011	34
EGFR	GoF	GoF	1.00	-0.392	23
AR	GoF	GoF	0.81	+0.053	57
PIK3CA	GoF	GoF	1.00	-0.913	6
BRAF	GoF	GoF	0.29	+0.046	8

Accuracy: 8/10 correct on known-regime genes

BRCA1 misclassified: LoF through surface disruption looks like GoF to a structural-only detector (conf 0.40). Genuine limit.

THE SIGNALS

Verdict (2 signals):
  1. K: contact degree at position (size-normalized)
  2. Conservation: BLOSUM62 ortholog alignment (61–157 species)
  Filter: gnomAD population frequency (AF ≥ 1% → benign)
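
The population-frequency filter is the simplest part to sketch: any variant common enough in gnomAD is scored 0.000 regardless of structure. The function name and signature below are hypothetical; only the 1% threshold and the example allele frequencies come from this page.

```python
from typing import Optional

AF_BENIGN_THRESHOLD = 0.01  # AF >= 1% is treated as benign (threshold from the text)

def apply_gnomad_filter(raw_score: float, allele_frequency: Optional[float]) -> float:
    """Return 0.0 for common variants, otherwise pass the raw score through.

    `allele_frequency` is None when the variant is absent from the frequency index.
    """
    if allele_frequency is not None and allele_frequency >= AF_BENIGN_THRESHOLD:
        return 0.0
    return raw_score

# The raw score of 0.4 is a placeholder, not a real scanner output.
print(apply_gnomad_filter(0.4, 0.72))    # p53 P72R, AF = 0.72
print(apply_gnomad_filter(0.4, 0.36))    # BRCA1 P871L, AF = 0.36
print(apply_gnomad_filter(0.545, None))  # no frequency on record: score unchanged
```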

GoF/LoF detection (3 signals):
  1. K-damage correlation (strongest, weight 3x)
  2. Control knob fraction (high-K, low-damage pathogenic sites)
  3. Damage ratio (pathogenic vs random position damage)
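
The first detection signal can be illustrated with a plain Pearson correlation between contact degree K and structural damage. This sketch assumes the correlation is taken over a gene's pathogenic variant positions; the function name and the toy numbers are hypothetical, and the real detector combines this signal with the other two rather than thresholding it alone.

```python
import numpy as np

def k_damage_correlation(k_values, damage_values):
    """Pearson correlation between contact degree K and per-site damage."""
    k = np.asarray(k_values, dtype=float)
    d = np.asarray(damage_values, dtype=float)
    return float(np.corrcoef(k, d)[0, 1])

# Toy GoF-like gene: the highest-K pathogenic sites cause the least damage.
gof_like = k_damage_correlation([12, 10, 8, 6, 4], [0.05, 0.10, 0.20, 0.35, 0.50])

# Toy LoF-like gene: the highest-K pathogenic sites cause the most damage.
lof_like = k_damage_correlation([12, 10, 8, 6, 4], [0.50, 0.35, 0.20, 0.10, 0.05])

print(f"GoF-like: {gof_like:+.3f}, LoF-like: {lof_like:+.3f}")
```

A strongly negative value, as reported for EGFR and PIK3CA in the table above, points toward the GoF regime. Values near zero (TP53, AR, BRAF) are presumably where the other two signals decide the call.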

REPRODUCIBLE

pip install begump

from gump.foldwatch import profile_mutation, detect_regime

# Score a mutation
# KRAS is assumed here to hold the KRAS protein sequence, defined beforehand
r = profile_mutation(KRAS, 12, 'G', 'D', 'KRAS')
print(r['verdict'])  # PATHOGENIC
print(r['T_fold'])   # 0.545

# Detect GoF vs LoF regime
# variants and sequence must be defined by the caller before this call
result = detect_regime('TP53', variants, sequence=sequence)
print(result['regime'])  # 'LoF' or 'GoF'

HONEST LIMITS

Irreducible from sequence alone:
  Gain-of-function detail | DNA-contact mutations | Tetramer destabilization | Epistasis

GoF/LoF limits:
  BRCA1 misclassified | Small variant counts (BRAF: 8, PIK3CA: 6)
  Mixed-regime genes (RET: both MEN2A and Hirschsprung)

What IS solid:
  ✓ Fiedler damage: 0.82 AUC, single feature, zero training
  ✓ GoF/LoF: 8/10 correct, pure physics + structure
  ✓ Regime detection explains the AUC jump from 0.587 to 0.74

Jim McCandless, beGump LLC. All computation on Mac Mini M4, 16GB, 35W. No cloud. Test variants, ortholog data, gnomAD index, and validation script included with the package.
