Why Do Proteins Misfold?

Six diseases. One equation. One strategy. 4,500+ mutations screened on a $599 computer.

JIM’S OVERSIMPLIFICATION

Proteins are origami. Disease happens when the origami gets sticky and clumps, or when it is too floppy and wanders. The fix is almost comically simple: sticky things need electric charge to repel each other. Floppy things need grip to fold. Six diseases. Two rules. Same physics.

THE PATTERN

Every protein misfolding disease we examined follows one of two failure modes. Either the protein has an oily patch that makes copies stick to each other (aggregation), or it has no structure at all and drifts into toxic clumps (disorder). Both failures trace back to one variable, the hydrophobic core, pushed in opposite directions. The fix follows the same pattern.

We screened over 4,500 mutations across six disease proteins. The answer is the same every time. Too sticky? Add charge. Charged residues repel like same-pole magnets. Too floppy? Add anchors. Hydrophobic residues create a structural core from nothing. The engine reads the protein and prescribes the opposite of what is wrong.

THE TWO RULES

Hydrophobic core EXPOSED → ADD CHARGE → repel aggregation
Hydrophobic core MISSING → ADD ANCHORS → create fold

Same variable. Opposite direction. Six diseases.

Disease           | Protein | Problem       | Strategy | Best hit  | Effect
Alzheimer’s       | Aβ42    | KLVFF sticky  | Charge   | V18D      | ↓28% agg
Parkinson’s       | α-syn   | NAC oily      | Charge   | V70D      | ↓22% agg
Type 2 diabetes   | IAPP    | NFGAIL sticky | Charge   | L16K      | ↓50% agg
Alzheimer’s (tau) | Tau     | PHF6 sticky   | Charge   | I278D     | Hotspot gone
ALS (FUS)         | FUS LCD | 98% floppy    | Anchors  | T11V+T71V | Core formed
ALS (TDP-43)      | TDP-43  | Half/half     | Both     | I151D     | ↓11% agg
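
The percentage effects can be checked by hand from the residue counts reported in the per-protein sections: each one is the relative drop in residues flagged as aggregation-prone. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of the engine.

```python
def relative_reduction(before: int, after: int) -> float:
    """Percent drop in aggregation-prone residues, relative to wild type."""
    return 100.0 * (before - after) / before

# (wild-type count, mutant count) as reported in the per-protein sections
reported = {
    "Aβ42 V18D": (18, 13),
    "α-syn V70D": (32, 25),
    "IAPP L16K": (12, 6),
    "TDP-43 I151D": (56, 50),
}

for label, (wt, mut) in reported.items():
    print(f"{label}: {wt} → {mut} residues, ↓{relative_reduction(wt, mut):.0f}%")
```

Rounded to whole percentages, the four results agree with the figures quoted for these mutations.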

A vertex decouples. Edges weaken. The shape loses coherence. This is aggregation.


1. Alzheimer’s Disease — Amyloid-beta 42

42 amino acids. KLVFF motif. The sticky origami.

Aβ42 has a stretch in the middle — five amino acids called KLVFF — that is extremely sticky. Oily. Hydrophobic. It finds other copies and sticks to them, face to face, until you have a plaque that kills neurons.

We screened 798 mutations in 1.4 seconds. The top 10 mutations that reduce stickiness? All add electric charge. Zero add more oil. Ten out of ten. A drug in Phase 3 trials, ALZ-801 (a prodrug of tramiprosate), does exactly this: it carries a negative charge to the aggregation surface. We found the same answer from the math. Nobody told the engine about tramiprosate.

The Arctic mutation (E22G) proves it in reverse: removing charge near KLVFF causes Alzheimer’s in the 50s instead of the 70s. Our answer is literally the inverse of a known disease-causing mutation.

BEFORE (wild type): Aggregation: 18/42 (43%) | Helix: 31%
AFTER (V18D): Aggregation: 13/42 (31%) ↓28% | Helix: 45%

Controls:
  V18D (charged): ↓28% | V18I (hydrophobic): 0% change
  V18E (charged): ↓28% | V18L (hydrophobic): 0% change

External match: ALZ-801 (tramiprosate prodrug, Phase 3), same target, same strategy.
Arctic mutation E22G REMOVES charge → early-onset Alzheimer’s. We found the INVERSE.
Sequence: DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA (42 residues)
798 mutations screened in 1.4 seconds. 586 analyses/sec on Mac Mini M4.
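
The count of 798 is every possible single-residue substitution: 42 positions × 19 alternative amino acids. The sketch below enumerates them and applies a named mutation such as V18D. `apply_mutation` and `single_point_mutants` are illustrative helpers written for this page, not functions from the engine.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AB42 = "DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA"

def apply_mutation(seq: str, mutation: str) -> str:
    """Apply a mutation written like 'V18D' (1-based position)."""
    wild, pos, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[pos - 1] != wild:
        raise ValueError(f"expected {wild} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + new + seq[pos:]

def single_point_mutants(seq: str):
    """Yield (name, mutated sequence) for every single substitution."""
    for i, wild in enumerate(seq, start=1):
        for new in AMINO_ACIDS:
            if new != wild:
                name = f"{wild}{i}{new}"
                yield name, apply_mutation(seq, name)

mutants = dict(single_point_mutants(AB42))
print(len(mutants))      # 42 positions × 19 substitutions each
print(mutants["V18D"])   # the V18D sequence used in the screen
```

The wild-type check in `apply_mutation` catches off-by-one numbering mistakes before a mutant is analyzed.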

2. Parkinson’s Disease — Alpha-Synuclein

140 amino acids. NAC region. The fix nobody has tried.

Alpha-synuclein has a greasy stretch (NAC region, residues 61-95) that clumps into Lewy bodies, killing dopamine neurons. The wild-type protein is already borderline — low risk, net charge -8.9. The famous A53T familial mutation barely moves the numbers. This protein is one nudge from disaster.

V70D — charge in the heart of the NAC core — drops aggregation 22%. 7 out of 10 top stabilizing mutations add charge. Zero add hydrophobic. No existing drug targets this position. Levodopa replaces lost dopamine but does nothing about the clumping.

V70D: Aggregation 32/140 → 25/140 ↓22%
  Sheet: 20% → 16% | Charge: -8.9 → -9.9

Controls:
  V70D (charged): ↓22% | V70I (hydrophobic): 0% change

New finding: No existing drug targets V70 of the NAC core.
152 mutations across the NAC aggregation core (residues 61-95).
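
The charge shift reported for V70D (-8.9 → -9.9) is what a simple integer-charge model predicts: swapping a neutral valine for an aspartate adds one negative charge. The rough sketch below assumes side-chain charges of +1 for K/R and -1 for D/E near neutral pH and ignores histidine and the termini. The engine's own charge model may differ.

```python
# Assumed integer side-chain charges near pH 7; every other residue counts as 0
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def charge_delta(mutation: str) -> int:
    """Change in net charge caused by a mutation written like 'V70D'."""
    wild, new = mutation[0], mutation[-1]
    return SIDE_CHAIN_CHARGE.get(new, 0) - SIDE_CHAIN_CHARGE.get(wild, 0)

wild_type_charge = -8.9  # net charge reported for wild-type alpha-synuclein
for m in ("V70D", "V70I"):
    delta = charge_delta(m)
    print(f"{m}: delta {delta:+d}, net charge {wild_type_charge + delta:.1f}")
```

The hydrophobic control V70I leaves the net charge untouched, which is consistent with it producing no change in aggregation.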

3. Type 2 Diabetes — IAPP Amyloid

37 amino acids. NFGAIL motif. Helix from nothing.

IAPP clumps in the pancreas, killing beta cells. 90% of Type 2 diabetes patients have amyloid deposits at autopsy. L16K — positive charge at position 16 — cuts aggregation 50% and spontaneously creates 27% helix from zero. The charge does not just prevent clumping — it tells the protein how to fold.

L16K: Aggregation 12/37 → 6/37 ↓50%
  Helix: 0% → 27% (created from nothing)
  Sheet: 35.1% → 16.2%

Controls:
  L16K (charged): ↓50% | L16I (hydrophobic): 0% change

Gap: Pramlintide REPLACES IAPP but doesn’t stop the aggregation.
703 mutations screened in 1.0 seconds. 9/10 top stabilizing mutations add charge.

4. Tau Tangles — The Other Half of Alzheimer’s

441 amino acids. PHF6 motif. One change eliminates the hotspot.

Tau is the brain’s rebar. It holds microtubules together. When tau detaches and clumps (PHF6: VQIVYK at positions 274-280), the scaffolding collapses. Tau tangles correlate more strongly with cognitive decline than amyloid plaques do.

I278D — one charged amino acid in the middle of VQIVYK — eliminates the entire 7-residue aggregation hotspot. The single largest sticky stretch in the whole 441-amino-acid protein, gone with one change.

Wild type: 7 aggregation regions, PHF6 (274-280) = largest hotspot
I278D: 7 → 6 regions. PHF6 hotspot: ELIMINATED
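
One way to see why a single aspartate can erase the hotspot is to score the motif on a standard hydropathy scale. The sketch below uses the Kyte-Doolittle scale as a stand-in. It is not the engine's spectral method, only a simple separate check that replacing the isoleucine with aspartate flips VQIVYK from net hydrophobic to net hydrophilic.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle score over a peptide."""
    return sum(KD[aa] for aa in peptide) / len(peptide)

phf6 = "VQIVYK"         # wild-type motif
phf6_i_to_d = "VQDVYK"  # isoleucine replaced by aspartate (I278D in this page's numbering)

print(f"wild type: {mean_hydropathy(phf6):+.2f}")
print(f"I to D:    {mean_hydropathy(phf6_i_to_d):+.2f}")
```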

P301L (famous FTD mutation):
  Hydrophobicity: 0.282 → 0.293 (increased — directionally correct)
  Prolines: 42 → 41 (removes helix breaker in repeat domain)
  Sequence-level tool sees it directionally but not dramatically. Honest limit.

Tau 2N4R: 410 residues processed. 73% coil (IDP). No hydrophobic core. Charge at PHF6 makes it 4/4 amyloid proteins responding to the same strategy.

5. ALS — FUS Prion Domain

163 amino acids. 98% coil. The opposite problem, the opposite fix.

FUS is the opposite of every other disease on this page. It has no structure at all. 98% floppy noodle. No skeleton. No core. Without structure, it drifts into clumps that kill motor neurons. There is no FDA-approved drug.

For the sticky proteins, charge worked. For FUS, charge does nothing (explicitly tested). The answer is reversed: add hydrophobic anchors to give the floppy chain a reason to fold. 9 out of 10 top stabilizing mutations add hydrophobic residues. Two anchors (T11V + T71V) are the minimum effective dose — they create a structural core that never existed, while preserving the flexibility FUS needs for RNA processing. A brace, not a cast.

2 anchors (T11V + T71V) — MINIMUM EFFECTIVE DOSE:
  Hydrophobic core: FALSE → TRUE (core FORMED)
  Burial: -0.072 → -0.004 (approaching positive)
  Coil: 98.2% → 93.9% | Sheet: 1.8% → 6.1%
  Rg: 25.17Å → 24.37Å (3% compaction)

5 anchors (full dose):
  Rg: 25.2Å → 23.4Å (7% compaction)
  Burial: -0.072 → +0.235 (core fully formed)
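
Rg here is the radius of gyration: the root-mean-square distance of the residues from their common centroid, so a smaller value means a more compact chain. Below is a minimal sketch of the standard unweighted formula, run on made-up coordinates rather than engine output. How the engine weights or places its points is not specified here.

```python
import math

def radius_of_gyration(coords):
    """RMS distance of (x, y, z) points from their centroid, in the input units."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(squared / n)

def compaction_percent(rg_before: float, rg_after: float) -> float:
    """Relative shrinkage of Rg, as a percentage of the starting value."""
    return 100.0 * (rg_before - rg_after) / rg_before

# Toy example: six points one unit from the origin along each axis
toy = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(radius_of_gyration(toy))

# Compaction for the two-anchor result (25.17 Å to 24.37 Å)
print(f"{compaction_percent(25.17, 24.37):.1f}%")
```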

Control: T11D + T71D (charged) → core = False. Charge does NOT work for FUS.

External match: Tafamidis (FDA 2019) stabilizes TTR by same principle.
627 mutations screened in 8.7 seconds. Engine-designed crosslinker: ~400-600 Da bivalent molecule spanning T11-T71 (~24Å). A pharmacological brace.

6. ALS — TDP-43

414 amino acids. Found in 97% of ALS patients. Two problems in one protein.

TDP-43 is half structured, half disordered. The structured half (RRM domains) breaks the Alzheimer’s way — charge at I151 reduces aggregation 11%. The floppy tail (LCD) breaks the FUS way — it probably needs hydrophobic anchors. TDP-43 is a protein with an identity crisis. It needs both strategies simultaneously.

I151D (structured RRM region):
  Aggregation: 56/322 → 50/322 ↓11%
  Helix: 48.1% → 53.1%

Charge universality:
  454 stabilizing charge mutations across 322 residues
  2 destabilizing (both REVERSE an existing positive charge: K264D, K264E)

THE HYBRID:
  Structured RRMs → CHARGE (Alzheimer’s strategy)
  Disordered LCD → likely ANCHORS (FUS strategy)
  One drug for the core. One brace for the tail.

Honest limit: Analyzed 322/414 residues. Most ALS mutations (Q331K, M337V) are in the missing 92 residues. This is stated openly because it matters. 1,288 charge mutations scanned in 29.7 seconds.

THE CHARGE STRATEGY

Wherever a hydrophobic surface drives pathological aggregation, charge disrupts it. The engine finds the exact position. The physics does not change.

The mechanism (same across all 4 charge-responsive proteins):

  1. Electrostatic repulsion: charged monomers repel instead of stacking
  2. Increased solubility: charge keeps monomers dissolved
  3. Helix nucleation: charge favors α-helix over β-sheet (amyloid form)

The two rules, unified:
  Hydrophobic core EXISTS but is exposed → add charge (Alzheimer’s, Parkinson’s, diabetes, tau)
  Hydrophobic core MISSING entirely → add hydrophobic anchors (ALS-FUS)
  Both present → dual strategy (TDP-43)

The engine does not know which rule to apply in advance. It reads the protein and figures it out from the math. Every time.
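
One plausible way to read a rule off a finished screen is to tally what the top-ranked mutations add. The sketch below classifies each hit by its new residue and reports the dominant class. The residue groupings are an assumption (classifications vary), and this illustrates the idea only; it is not the engine's actual selection logic.

```python
from collections import Counter

CHARGED = set("DEKR")
HYDROPHOBIC = set("AVILMFW")  # assumed grouping; other schemes include more residues

def added_residue_class(mutation: str) -> str:
    """Classify a mutation like 'V18D' by the residue it introduces."""
    new = mutation[-1]
    if new in CHARGED:
        return "charge"
    if new in HYDROPHOBIC:
        return "hydrophobic"
    return "other"

def dominant_strategy(top_hits) -> str:
    """Most common class among a list of top-ranked mutations."""
    counts = Counter(added_residue_class(m) for m in top_hits)
    return counts.most_common(1)[0][0]

# Hits named on this page
print(dominant_strategy(["V18D", "V18E"]))  # Aβ42
print(dominant_strategy(["T11V", "T71V"]))  # FUS
```

Applied to a full top-10 list, the same tally would reproduce statements such as "9 out of 10 top stabilizing mutations add hydrophobic residues".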

HONEST LIMITS

What we cannot do:
  Molecular dynamics simulation (how aggregation happens in real time)
  Blood-brain barrier crossing prediction
  Clinical trial outcome prediction
  Multi-protein aggregation modeling (Lewy body formation)
  Hyperphosphorylation modeling (tau’s other pathology)
  Liquid-liquid phase separation (FUS, TDP-43)
  TDP-43: 92 residues unanalyzed where most ALS mutations cluster
  P301L detection is subtle at sequence level (structural dynamics missing)

What IS solid:
  4/4 amyloid proteins respond to charge at the aggregation core
  FUS responds to opposite strategy (hydrophobic anchors)
  TDP-43 hybrid nature confirmed
  Tramiprosate match found independently
  Arctic mutation inverse confirmed
  All controls pass (wrong strategy tested and confirmed weaker)

crystal_fold ships in pip v0.5.1. Aβ42 Rg: 12.28Å vs 12.1Å PDB (1.5% error).

COMPUTATION DETAILS

Hardware: Mac Mini M4, 10-core GPU, 16GB, $599, 35W
Engine: Fold Watch (gump.foldwatch) — spectral tension on amino acid interaction graph
Total mutations screened: 4,500+ across 6 proteins
Backtest: 23/25 proteins validated against PDB crystal structures (92%)
Software: pip install begump — open source, spectral math, not neural network

HOW TO REPRODUCE

pip install begump

from gump.foldwatch import analyze

def aggregated_residues(result):
    # Count residues covered by predicted aggregation regions (ends inclusive)
    return sum(r['end'] - r['start'] + 1 for r in result['aggregation_regions'])

# Alzheimer's (Aβ42): wild type vs V18D
ab42_wt = analyze("DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA")
ab42_v18d = analyze("DAEFRHDSGYEVHHQKLDFFAEDVGSNKGAIIGLMVGGVVIA")

# Diabetes (IAPP): wild type vs L16K
iapp_wt = analyze("KCNTATCATQRLANFLVHSSNNFGAILSSTNVGSNTY")
iapp_l16k = analyze("KCNTATCATQRLANFKVHSSNNFGAILSSTNVGSNTY")

# Compare aggregation for each wild-type/mutant pair
pairs = [("Aβ42 V18D", ab42_wt, ab42_v18d), ("IAPP L16K", iapp_wt, iapp_l16k)]
for label, wt, mut in pairs:
    print(f"{label} aggregation: {aggregated_residues(wt)} → {aggregated_residues(mut)} residues")

This is computational research, not medical advice. The engine identifies molecular strategies from sequence analysis. Clinical validation requires wet-lab experiments and regulatory approval. Drug matches are independent computational findings, not clinical endorsements.
