
Dr. ADK

Coupling-based drug discovery — 41,120 compounds — +0.055 AUC — corrected
JIM’S OVERSIMPLIFICATION

We thought we found signal. We added coupling features to simplified molecular descriptors and got +0.055 AUC. Then someone pointed out our baseline was weak — no Morgan fingerprints, no hyperparameter optimization, 30-minute vibe code. They were right. We re-ran with ECFP4 2048-bit Morgan fingerprints, 21 RDKit descriptors, grid-searched hyperparameters, and the full 41,120-compound dataset. Delta: −0.003 (p=0.50). Not significant. Morgan fingerprints already capture the topological information our coupling features were providing. This page is the correction.

K IN THIS DOMAIN

K here is molecular coupling. How tightly bonded the atom graph is. High K = robust connectivity = harder to fragment = different binding behavior. The Fiedler value measures this directly: the algebraic connectivity of the molecular graph.
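The sketch below computes the Fiedler value. It assumes the molecular graph has heavy atoms as nodes and bonds as unweighted edges, with bond order ignored. The page does not say how the original script built the graph or which eigensolver it called. networkx or numpy would be the usual choice; this version uses a small plain-Python Jacobi eigenvalue routine only so that it runs without dependencies. The helper names `laplacian`, `symmetric_eigenvalues` and `fiedler_value` are hypothetical.

```python
import math


def laplacian(n_atoms, bonds):
    """Graph Laplacian L = D - A for an undirected, unweighted atom graph."""
    lap = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def symmetric_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations, ascending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def fiedler_value(n_atoms, bonds):
    """Algebraic connectivity: second-smallest Laplacian eigenvalue (lambda_2)."""
    return symmetric_eigenvalues(laplacian(n_atoms, bonds))[1]


butane = fiedler_value(4, [(0, 1), (1, 2), (2, 3)])               # open 4-carbon chain
benzene = fiedler_value(6, [(i, (i + 1) % 6) for i in range(6)])  # 6-membered ring
print(round(butane, 4), round(benzene, 4))
```

On these two skeletons the closed ring scores higher (λ2 = 1.0) than the open four-carbon chain (λ2 = 2 − √2 ≈ 0.586). This matches the "harder to fragment" reading above. A disconnected graph gives λ2 = 0.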

THE CORRECTION

A valid criticism noted our original comparison used simplified descriptors without Morgan fingerprints or hyperparameter optimization. We re-ran with a proper cheminformatics pipeline: ECFP4 2048-bit Morgan fingerprints + 21 RDKit physicochemical descriptors (2,069 total features), grid search over 36 hyperparameter combinations (same search applied to all models), and the full 41,120-compound dataset with no subsampling. K/R/E/T adds −0.003 AUC-ROC (p=0.50) and −0.012 AUC-PR (p=0.14). Neither metric significant. The original +0.055 was real but only because the baseline was artificially weak. Someone did our MM12P for free. They were right on every point.

Comparison | Standard | Combined | Delta | p-value
Original (weak baseline) | 0.753 (16 simplified, 4,943 cpds) | 0.808 | +0.055 | <0.001
Proper (Morgan+RDKit) | 0.816 (2,069 features, 41,120 cpds) | 0.812 | −0.003 | 0.50
K/R/E/T only: 0.751 AUC-ROC — loses to standard by 0.065 (p=0.001)
Feature importance (proper model): Standard 98.1%, K/R/E/T 1.9%
K/R/E/T in top 20: 1/20 (K_wiener_norm at #5)
AUC-PR (imbalanced metric): Standard 0.476, Combined 0.464 — K/R/E/T adds nothing
Verdict: K/R/E/T does NOT add signal to a proper cheminformatics baseline

THE CHALLENGE

Given a molecule, predict whether it is active against HIV. Standard approach: compute molecular descriptors (weight, polarity, hydrogen-bond donors and acceptors) and feed them to a classifier. We asked: what happens if you also describe the molecule as a coupled system, measuring its connectivity structure, energy distribution, and internal tension?

THE DATASET

MoleculeNet HIV: 41,127 compounds from real screening data. 1,443 active (3.5%). Original analysis subsampled to 4,943 molecules. Corrected analysis uses the full 41,120 valid compounds (7 unparseable SMILES excluded) with no subsampling. Class imbalance handled via scale_pos_weight in XGBoost.
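XGBoost's parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical value for `scale_pos_weight`. The page does not state the value actually used, so this sketch only shows that heuristic. `positive_weight` is a hypothetical helper. The counts are the raw-file figures quoted above, taken before the 7 unparseable SMILES were dropped.

```python
def positive_weight(labels):
    """Negative-to-positive ratio, the usual starting point for scale_pos_weight."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive examples")
    return n_neg / n_pos


# Raw MoleculeNet HIV counts quoted on this page: 1,443 actives out of 41,127.
labels = [1] * 1443 + [0] * (41127 - 1443)
print(round(positive_weight(labels), 1))  # roughly 27.5 inactives per active
```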

PROPER RESULTS (MORGAN + RDKIT + HYPERPARAMETER OPTIMIZATION)

ECFP4 2048-bit Morgan fingerprints + 21 RDKit physicochemical descriptors. Grid search: n_estimators {100, 300, 500} x max_depth {3, 5, 7, 9} x learning_rate {0.01, 0.05, 0.1} = 36 combinations, 3-fold CV. Same search for all models. Final evaluation: stratified 5-fold CV on full dataset.

Model | Features | AUC-ROC (95% CI) | AUC-PR (95% CI)
A (Standard) | Morgan+RDKit (2,069) | 0.816 [0.795, 0.836] | 0.476 [0.432, 0.520]
B (K/R/E/T only) | K/R/E/T (26) | 0.751 [0.734, 0.768] | 0.234 [0.200, 0.267]
C (Combined) | Standard+K/R/E/T (2,095) | 0.812 [0.787, 0.838] | 0.464 [0.418, 0.510]
Delta C − A (AUC-ROC): −0.003  95% CI: [−0.015, +0.009]
  t = −0.74  p = 0.50
  Per-fold: −0.003, −0.018, −0.002, −0.003, +0.010
  Positive folds: 1/5

Delta C − A (AUC-PR): −0.012  95% CI: [−0.030, +0.006]
  t = −1.81  p = 0.14

Delta B − A (AUC-ROC): −0.065  95% CI: [−0.086, −0.044]
  t = −8.48  p = 0.001
  K/R/E/T alone loses to Morgan by a large, statistically significant margin.
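The reported t, p and CI values look consistent with a paired t-test over the five CV folds, which gives 4 degrees of freedom. That is our reading; the page does not state the test. The dependency-free sketch below uses the closed-form Student's t CDF for df = 4; scipy.stats would cover general df. `paired_fold_test` and `t_cdf_df4` are hypothetical names.

```python
import math
from statistics import mean, stdev

T_CRIT_DF4 = 2.776445105  # two-sided 95% critical value of Student's t, df = 4


def t_cdf_df4(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = t * t / (1.0 + t * t / 4.0)
    return 0.5 + 0.375 * (t / math.sqrt(1.0 + t * t / 4.0)) * (1.0 - u / 12.0)


def paired_fold_test(a_scores, b_scores):
    """Paired t-test on per-fold differences (b - a); valid only for exactly 5 folds."""
    deltas = [b - a for a, b in zip(a_scores, b_scores)]
    if len(deltas) != 5:
        raise ValueError("the closed-form CDF here only covers df = 4 (5 folds)")
    d_mean = mean(deltas)
    se = stdev(deltas) / math.sqrt(len(deltas))  # sample SD (n - 1) over folds
    t = d_mean / se
    p = 2.0 * (1.0 - t_cdf_df4(abs(t)))          # two-sided p-value
    ci = (d_mean - T_CRIT_DF4 * se, d_mean + T_CRIT_DF4 * se)
    return d_mean, t, p, ci


# Rounded per-fold AUC-ROC values from the cross-validation detail table.
model_a = [0.841, 0.811, 0.810, 0.796, 0.820]
model_b = [0.754, 0.744, 0.771, 0.732, 0.753]
d, t, p, (lo, hi) = paired_fold_test(model_a, model_b)
print(f"B - A: delta={d:.3f} t={t:.2f} p={p:.3f} CI=[{lo:.3f}, {hi:.3f}]")
```

On the rounded fold values, B − A comes out at delta −0.065, t = −8.48, p ≈ 0.001 and CI [−0.086, −0.044], matching the figures above. The listed C − A deltas give t ≈ −0.72 and p ≈ 0.51, close to the reported −0.74 and 0.50. The small gap may come from rounding in the per-fold values.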

CROSS-VALIDATION DETAIL (PROPER BASELINE)

Fold | A (Standard) | B (K/R/E/T) | C (Combined) | Delta (C−A)
1 | 0.841 | 0.754 | 0.838 | −0.003
2 | 0.811 | 0.744 | 0.793 | −0.018
3 | 0.810 | 0.771 | 0.808 | −0.002
4 | 0.796 | 0.732 | 0.794 | −0.003
5 | 0.820 | 0.753 | 0.830 | +0.010
Mean | 0.816 | 0.751 | 0.812 | −0.003

FEATURE IMPORTANCE (PROPER MODEL)

Top 10 features from Model C trained on the full dataset. K/R/E/T features account for 1.9% of total importance while making up 1.2% of the feature count (26 of 2,095). Only K_wiener_norm appears in the top 20.

# | Feature | Importance | Source
1 | morgan_1648 | 0.0051 | Standard
2 | morgan_1519 | 0.0051 | Standard
3 | morgan_577 | 0.0048 | Standard
4 | morgan_1254 | 0.0047 | Standard
5 | K_wiener_norm | 0.0042 | K/R/E/T
6 | morgan_1185 | 0.0041 | Standard
7 | morgan_800 | 0.0040 | Standard
8 | morgan_572 | 0.0040 | Standard
9 | morgan_758 | 0.0039 | Standard
10 | morgan_1991 | 0.0039 | Standard
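The sketch below computes a feature group's share of total importance from a name-to-importance mapping, for example the dict returned by XGBoost's `Booster.get_score`. The page does not say which importance type it used. The K/R/E/T name set has to be supplied explicitly, and the toy numbers are invented for illustration. `group_share` is a hypothetical helper.

```python
def group_share(importances, group):
    """Fraction of total importance held by the feature names listed in `group`."""
    total = sum(importances.values())
    in_group = sum(v for name, v in importances.items() if name in group)
    return in_group / total if total else 0.0


# Toy importances for illustration only, not the real model's values.
toy = {"morgan_1648": 0.6, "morgan_577": 0.3, "K_wiener_norm": 0.1}
kret_names = {"K_wiener_norm"}  # in practice: all 26 K/R/E/T column names

print(group_share(toy, kret_names))  # share of importance held by K/R/E/T
print(26 / 2095)                     # share of the feature count held by K/R/E/T
```

With the reported figures, 1.9% of importance against 26/2,095 ≈ 1.2% of features is roughly 1.5× the share expected from feature count alone.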

Compare to original: K_fiedler was #2. Against Morgan fingerprints, it does not even appear in the top 20. The topology it captured is already encoded in Morgan bits.


ORIGINAL RESULTS (WEAK BASELINE — PRESERVED FOR TRANSPARENCY)

These results compared K/R/E/T against 16 simplified descriptors from a minimal SMILES parser — no Morgan fingerprints, no ECFP, no MACCS keys, no hyperparameter optimization, subsampled to 4,943 compounds. The delta was real but the baseline was not competitive.

Model | Features | AUC (mean ± std)
A | Standard only (16 simplified) | 0.753 ± 0.008
B | K/R/E/T only (29) | 0.796 ± 0.013
C | Combined (45) | 0.808 ± 0.014
Delta C − A: +0.055 — artifact of weak baseline
These numbers are correct. The problem was what we compared against, not how we measured.

K/R/E/T FEATURE DEFINITIONS

K — Coupling (6 features):
  avg_degree — mean atom connectivity
  max_degree — highest-connected atom
  clustering — local triangle density
  density — edge fraction of complete graph
  Fiedler value — algebraic connectivity (λ2 of graph Laplacian)
  Wiener index (normalized) — mean shortest path length
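The following plain-Python sketch computes five of the six K features for a single-fragment heavy-atom graph. The page gives only one-line definitions, so several choices are assumptions:
  density is bonds divided by n(n−1)/2 atom pairs
  clustering is averaged over all atoms, with atoms of degree < 2 counted as 0 (the networkx `average_clustering` default)
  the normalized Wiener index is the mean shortest-path length over atom pairs
The Fiedler value is left out because it needs an eigensolver. `coupling_features` is a hypothetical name.

```python
from collections import deque
from itertools import combinations


def _adjacency(n_atoms, bonds):
    adj = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def _bfs_distances(adj, source):
    """Shortest-path length in bonds from source to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def coupling_features(n_atoms, bonds):
    """K descriptors for one connected heavy-atom graph (needs at least 2 atoms)."""
    adj = _adjacency(n_atoms, bonds)
    degrees = [len(nbrs) for nbrs in adj]
    n_pairs = n_atoms * (n_atoms - 1) / 2
    # Local clustering: share of an atom's neighbour pairs that are bonded to each other.
    local = []
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            local.append(0.0)
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        local.append(linked / (k * (k - 1) / 2))
    # Wiener index: sum of shortest-path lengths over all unordered atom pairs.
    wiener = 0
    for i in range(n_atoms):
        dist = _bfs_distances(adj, i)
        wiener += sum(dist[j] for j in range(i + 1, n_atoms))  # KeyError if disconnected
    return {
        "avg_degree": sum(degrees) / n_atoms,
        "max_degree": max(degrees),
        "clustering": sum(local) / n_atoms,
        "density": (sum(degrees) / 2) / n_pairs,
        "wiener_norm": wiener / n_pairs,
    }


print(coupling_features(4, [(0, 1), (1, 2), (2, 3)]))  # butane carbon chain
```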

R — Synchronization (6 features):
  EN variance — electronegativity spread
  EN mean diff — average pairwise EN difference
  EN max diff — most polar bond
  Element entropy — Shannon entropy of element distribution
  C/(N+O) ratio — carbon-to-heteroatom balance
  Mass CV — coefficient of variation of atomic masses
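The sketch below computes the R features on heavy atoms. All of the following are assumptions rather than statements from the page:
  Pauling electronegativities and standard atomic masses
  EN differences taken over bonded pairs ("pairwise" could also mean all atom pairs)
  element entropy measured in bits
  a guard of 1 in the C/(N+O) denominator for molecules with no N or O
`polarity_features` is a hypothetical helper.

```python
import math
from collections import Counter
from statistics import mean, pstdev, pvariance

# Pauling electronegativities and standard atomic masses for common organic elements.
PAULING_EN = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44, "F": 3.98, "S": 2.58, "Cl": 3.16}
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998, "S": 32.06, "Cl": 35.45}


def polarity_features(elements, bonds):
    """elements: element symbol per heavy atom; bonds: (i, j) index pairs."""
    en = [PAULING_EN[e] for e in elements]
    bond_diffs = [abs(en[i] - en[j]) for i, j in bonds] or [0.0]
    counts = Counter(elements)
    n = len(elements)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    masses = [ATOMIC_MASS[e] for e in elements]
    return {
        "en_var": pvariance(en),
        "en_mean_diff": mean(bond_diffs),
        "en_max_diff": max(bond_diffs),
        "element_entropy": entropy,
        # Assumed guard: hydrocarbons (no N or O) divide by 1.
        "c_to_no": counts["C"] / max(1, counts["N"] + counts["O"]),
        "mass_cv": pstdev(masses) / mean(masses),
    }


print(polarity_features(["C", "C", "O"], [(0, 1), (1, 2)]))  # ethanol heavy atoms
```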

E — Energy (4 features):
  Total bond energy — sum of estimated bond strengths
  Bond energy per atom — normalized energy density
  Bond energy variance — how uniform the bond distribution is
  Ring strain — estimated strain from small rings
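The sketch below computes three of the four E features. It assumes heavy-atom bonds with known bond orders and a lookup of rough textbook average bond enthalpies in kJ/mol. Published tables disagree by tens of kJ/mol (C=O especially), and the page does not list the values its script used, so treat the table as placeholder data. Ring strain is omitted. `energy_features` is a hypothetical helper.

```python
from statistics import pvariance

# Rough textbook average bond enthalpies (kJ/mol), keyed by (atom, atom, bond order).
# Placeholder values: tables differ, and the original script's table is not given.
BOND_ENERGY = {
    ("C", "C", 1): 348, ("C", "C", 2): 614, ("C", "C", 3): 839,
    ("C", "N", 1): 293, ("C", "N", 2): 615, ("C", "N", 3): 891,
    ("C", "O", 1): 358, ("C", "O", 2): 745,
    ("N", "N", 1): 163, ("N", "O", 1): 201, ("O", "O", 1): 146,
}


def bond_energy(a, b, order):
    """Look up a bond by alphabetically ordered element pair and bond order."""
    return BOND_ENERGY[(min(a, b), max(a, b), order)]


def energy_features(elements, bonds):
    """elements: symbol per heavy atom; bonds: (i, j, bond_order) triples."""
    energies = [bond_energy(elements[i], elements[j], order) for i, j, order in bonds]
    total = sum(energies)
    return {
        "total_bond_energy": total,
        "bond_energy_per_atom": total / len(elements),
        "bond_energy_var": pvariance(energies) if len(energies) > 1 else 0.0,
    }


print(energy_features(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]))  # ethanol skeleton
```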

T — Tension (3 features):
  Rotatable fraction — conformational flexibility
  Conformational entropy — log of rotamer count estimate
  Degree entropy — Shannon entropy of atom degree distribution
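The sketch below computes the T features under simplified assumptions, none of which comes from the page:
  a bond counts as rotatable if it is single, not in a ring, and joins two non-terminal heavy atoms (RDKit's own rotatable-bond SMARTS are stricter)
  the rotamer-count estimate assumes three staggered states per rotatable bond, so conformational entropy is n_rot · ln 3
  degree entropy is measured in bits
`tension_features` is a hypothetical name.

```python
import math
from collections import Counter, deque


def _connected_without(adj, u, v):
    """True if u can still reach v when the direct u-v bond is ignored (ring bond)."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if (x, y) in ((u, v), (v, u)) or y in seen:
                continue
            if y == v:
                return True
            seen.add(y)
            queue.append(y)
    return False


def tension_features(n_atoms, bonds):
    """bonds: list of (i, j, bond_order) over heavy atoms."""
    adj = [set() for _ in range(n_atoms)]
    for i, j, _ in bonds:
        adj[i].add(j)
        adj[j].add(i)
    # Simplified rotatable-bond rule: single, acyclic, both ends non-terminal.
    n_rot = sum(
        1 for i, j, order in bonds
        if order == 1 and len(adj[i]) > 1 and len(adj[j]) > 1
        and not _connected_without(adj, i, j)
    )
    counts = Counter(len(nbrs) for nbrs in adj)
    degree_entropy = -sum((c / n_atoms) * math.log2(c / n_atoms) for c in counts.values())
    return {
        "rotatable_fraction": n_rot / len(bonds) if bonds else 0.0,
        "conformational_entropy": n_rot * math.log(3),  # assumes 3 rotamers per bond
        "degree_entropy": degree_entropy,
    }


print(tension_features(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1)]))  # butane skeleton
```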

K-Lag Autocorrelation (7 features):
  mass_autocorr_lag{1-4} — atomic mass correlation at bond distances 1–4
  en_autocorr_lag{1-3} — electronegativity correlation at bond distances 1–3
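The page does not give the autocorrelation formula. As a stand-in, this sketch uses Moran's I, a standard topological autocorrelation descriptor. It takes the mean product of property deviations over atom pairs exactly `lag` bonds apart and divides by the property variance. Moreau–Broto autocorrelation, which sums raw products instead, is the other common choice. `moran_autocorrelation` is a hypothetical helper; lags with no atom pairs return 0.

```python
from collections import deque
from statistics import mean


def topological_distances(n_atoms, bonds):
    """All-pairs shortest-path lengths (in bonds) via BFS from every atom."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    dist = []
    for s in range(n_atoms):
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def moran_autocorrelation(values, n_atoms, bonds, lag):
    """Moran's I of an atomic property over atom pairs `lag` bonds apart."""
    dist = topological_distances(n_atoms, bonds)
    m = mean(values)
    dev = [v - m for v in values]
    variance = sum(x * x for x in dev) / n_atoms
    pairs = [(i, j) for i in range(n_atoms) for j in range(i + 1, n_atoms)
             if dist[i].get(j) == lag]
    if not pairs or variance == 0:
        return 0.0
    return (sum(dev[i] * dev[j] for i, j in pairs) / len(pairs)) / variance


en = [2.55, 2.55, 3.44]  # Pauling EN for ethanol's C-C-O skeleton
for lag in (1, 2, 3):
    print(lag, moran_autocorrelation(en, 3, [(0, 1), (1, 2)], lag))
```

For ethanol's C–C–O skeleton, lag 1 gives −0.25 and lag 2 gives −1.0. The lone oxygen is the outlier, so pairs that include it pull the correlation negative.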

HYPERPARAMETERS (GRID SEARCH RESULTS)

Grid searched (same for all models):
  n_estimators: {100, 300, 500}
  max_depth: {3, 5, 7, 9}
  learning_rate: {0.01, 0.05, 0.1}
  36 combinations, 3-fold CV on an 8,000-compound random subsample

Best for Model A (Standard): n_estimators=500, max_depth=7, lr=0.1
Best for Model B (K/R/E/T): n_estimators=500, max_depth=3, lr=0.01
Best for Model C (Combined): n_estimators=500, max_depth=9, lr=0.05

Note: Model B selected shallow trees (depth 3) with a low learning rate,
consistent with 26 features needing less model complexity
than 2,069 features.
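The grid above is a plain Cartesian product (3 × 4 × 3 = 36 settings). "Stratified" means each fold keeps roughly the 3.5% active rate. In practice scikit-learn's `ParameterGrid`/`GridSearchCV` and `StratifiedKFold` handle this. The dependency-free sketch below only illustrates the bookkeeping and leaves out fitting XGBoost at each grid point. `grid_points`, `stratified_folds` and `subsample` are hypothetical names.

```python
import random
from itertools import product

GRID = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1],
}


def grid_points(grid):
    """Every hyperparameter combination as a dict (Cartesian product of the lists)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def stratified_folds(labels, k, seed=0):
    """Fold index per sample; each class is shuffled, then dealt round-robin to folds."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for rank, i in enumerate(idx):
            folds[i] = rank % k
    return folds


def subsample(n_items, size, seed=0):
    """Random subset of indices, e.g. 8,000 of 41,120 compounds for the grid search."""
    return sorted(random.Random(seed).sample(range(n_items), size))


print(len(grid_points(GRID)))  # number of settings tried per model
```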

MM12P AUDIT (UPDATED)

Five original claims. The proper baseline test killed the main ones. Updated verdicts:

KILLED K/R/E/T features add +0.055 AUC

Against 16 simplified descriptors, yes. Against 2,069 Morgan+RDKit features with optimized hyperparameters on the full dataset: −0.003 (p=0.50). The +0.055 was an artifact of comparing against a baseline that did not capture molecular topology. Morgan fingerprints already encode the substructure patterns K_fiedler was providing. The critic who flagged this was correct on every point.

KILLED K_fiedler is the breakout coupling feature

K_fiedler looked important only because the simplified baseline missed topology. In the proper model, K/R/E/T features account for 1.9% of total importance. Standard features: 98.1%. K_fiedler does not appear in the top 20. K_wiener_norm (normalized mean path length) appears at #5, the only K/R/E/T feature in the top 20.

WEAKENED Permutation test confirms signal (p=0.00)

The permutation test confirmed that K/R/E/T features carry real signal vs. noise. That is still true — K/R/E/T alone gets 0.751 AUC, far above random. But the relevant question was never "do these features beat noise?" It was "do they beat Morgan fingerprints?" They do not.
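The page does not describe how the original permutation test was set up. As a generic illustration of why "p=0.00" overstates it: with N permutations, the usual add-one estimator (k + 1)/(N + 1) has a floor of 1/(N + 1). So 999 permutations can at best support p = 0.001. This sketch uses a difference in group means as the statistic. A model-based version would permute activity labels and recompute AUC, which is far more expensive but follows the same counting rule. `permutation_p_value` is a hypothetical helper.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=999, seed=0):
    """Two-sided permutation test on the difference in means.

    Uses the (k + 1) / (n_perm + 1) estimator, so the result is never exactly 0.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    k = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            k += 1
    return (k + 1) / (n_perm + 1)


high = [float(x) for x in range(10, 20)]
low = [float(x) for x in range(10)]
print(permutation_p_value(high, low))  # at least 1/1000, never 0.00
```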

KILLED Model B alone beats standard cheminformatics

K/R/E/T only: 0.751. Morgan+RDKit: 0.816. Delta: −0.065, t=−8.48, p=0.001. Not close. The original MM12P audit predicted this kill. We just had not run the test yet.

WEAKENED Same math across domains

Still true — Fiedler works on any graph. But in this domain, the standard tools already capture what Fiedler provides. "Same math" does not mean "additional signal." The framework provides language, not magic.


WHAT SURVIVES

1. K_wiener_norm provides minor complementary signal.
  Appears at #5 in feature importance in the combined model.
  Not enough to move the needle, but not zero either.

2. K/R/E/T features provide interpretable alternatives to black-box fingerprints.
  You can explain what K_fiedler measures. You cannot explain what bit 1648 of a
  Morgan fingerprint means. Interpretability has value, even without AUC gain.

3. Failure archetype characterization (3 types, statistically validated).
  Morgan’s failures cluster in K-space (on ESOL, 18/26 K features differ significantly
  among high-error compounds). High-polarity, large/sprawling, and ring-strained molecules
  are where fingerprints go blind.
  K/R/E/T lost the prediction battle but found where the winner is unreliable.

4. Diagnostic use case: flag unreliable Morgan predictions.
  The coupling profile identifies compounds where Morgan should not be trusted.
  Flag them for more expensive methods. Do not replace Morgan. Audit it.

5. The correction process works.
  Original MM12P flagged this exact weakness. External critic confirmed it.
  Re-ran with RDKit, Morgan fingerprints, hyperparameter optimization, full dataset,
  both AUC-ROC and AUC-PR. Page updated same day. This is what honest research
  looks like.

WHAT WAS KILLED

1. "+0.055 AUC over standard" — KILLED by proper baseline.
  Artifact of weak comparison. Against Morgan+RDKit: −0.003 (p=0.50).

2. "K_fiedler is the breakout feature" — KILLED.
  Only breaks out when you leave out the features that already capture topology.
  Does not appear in top 20 against Morgan fingerprints.

3. "K/R/E/T alone beats standard cheminformatics" — KILLED.
  0.751 vs 0.816. Delta: −0.065, p=0.001.

4. "Combined model crosses the 0.80 threshold" — KILLED.
  Morgan baseline already at 0.816. K/R/E/T does not push it higher.

5. "p=0.00" overstates the permutation test.
  The features still beat noise. But that is beside the point now.

6. "Same math" overstates novelty.
  Fiedler (1973). Not ours. And in this domain, already captured by standard tools.

THE FULL CHASE (7 ATTEMPTS)

We didn’t stop at the first kill. We chased every angle.

# | Attempt | Delta | Status
1 | Static K/R/E/T vs weak baseline | +0.055 | KILLED (weak baseline)
2 | Static K/R/E/T vs Morgan (41K) | −0.003 | Dead
3 | Dynamic K (Kuramoto, spectral, full 41K) | −0.002 | Dead
4 | Cross-coupling (molecule × pocket, 2D) | −0.002 | Dead
5 | 3D shape + USR reference similarity | −0.018 | Dead
6 | Regression: lipophilicity (logD) | −0.030 R² | HURT (p=0.004)
7 | Regression: solubility (ESOL) | +0.006 R² | p=0.064 (suggestive)

Seven attempts. Six dead. One suggestive (ESOL, p=0.064 — five K/R/E/T features in the top 20 for solubility prediction, but doesn’t cross the significance threshold).

The boundary found: Morgan fingerprints encode molecular coupling more efficiently than our graph-based K/R/E/T features. Our graph features are a lower-resolution view of the same information Morgan captures in 2,048 bits. Not a different view. A subset. Classification or regression, the wall holds.

The framework’s contribution to drug discovery: interpretability, not performance. K_fiedler tells you WHY a molecule might bind. Morgan tells you THAT it binds. For a medicinal chemist who needs mechanism, coupling features explain what the fingerprint can’t. For pure prediction, Morgan wins.

ESOL is the open thread: solubility IS a coupling measurement (molecule coupling with solvent). Electronegativity differences and bond energy have direct physical meaning there. Five K/R/E/T features ranked in the top 20. p=0.064. Not proven. Not dead. Open.

THE NOVEL FINDING: WHERE MORGAN IS BLIND

K/R/E/T does not improve Morgan’s predictions. But it does characterize WHERE Morgan fails. We ran error analysis across four MoleculeNet datasets (ESOL, FreeSolv, Lipophilicity, HIV) and found that Morgan’s worst predictions cluster in K-space. Three failure archetypes emerged.

Archetype | K Signature | Why Morgan Fails
High-polarity | Elevated R_en_var, R_en_max_diff, en_autocorr | Internal electrical tension that substructure fingerprints do not encode
Large / sprawling | High Wiener index, high bond count, low density, low Fiedler value | Morgan’s radius-2 window loses spatial context across extended molecules
Ring-strained | High clustering coefficient, high ring strain energy | Morgan encodes ring topology but not the strain energy stored in it
K features significant among high-error compounds:
  ESOL: 18/26 K features significant (p < 0.05)
  FreeSolv: 17/26
  HIV: 9/26

What this means:
  Morgan fingerprints are not uniformly good. They are good on average and
  blind in specific corners of chemical space. K/R/E/T features can identify
  those corners. Not by replacing Morgan. By diagnosing where it should not
  be trusted.

The practical application: a drug discovery team using Morgan + XGBoost gets predictions. For molecules matching these three archetypes (high-polarity, large/sprawling, or ring-strained), those predictions are less reliable. The team should flag them for additional analysis: docking, MD simulation, or experimental validation. Not every compound. Just the ones the coupling profile identifies as edge cases. This could save compute and improve outcomes. Not by replacing Morgan. By knowing where Morgan is blind.

Honest caveat: This is retrospective characterization, not prospective prediction. The clustering is visible after the fact — we identified failure archetypes by looking at where Morgan already failed. Turning this into a real-time reliability flag requires validation on external datasets not used in the analysis. The pattern is statistically significant. The utility is not yet proven in production.
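The page gives no thresholds or decision rule for the flag. The sketch below shows one way it could be set up: fit percentile cut-offs on a reference set, then flag a compound when every rule of an archetype holds. The archetype rules, the 90th/10th percentile choice and the feature name `E_ring_strain` are hypothetical. Per the caveat above, the reference set would need to be external data not used to find the archetypes. `fit_thresholds` and `flag` are hypothetical helpers.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


# Hypothetical rules: ">" means above the upper cut-off, "<" below the lower cut-off.
ARCHETYPES = {
    "high_polarity": [("R_en_var", ">")],
    "large_sprawling": [("K_wiener_norm", ">"), ("K_fiedler", "<")],
    "ring_strained": [("E_ring_strain", ">")],
}


def fit_thresholds(reference, q=90):
    """(lower, upper) cut-offs per feature at the (100 - q)th and qth percentiles."""
    cuts = {}
    for name in {f for rules in ARCHETYPES.values() for f, _ in rules}:
        vals = sorted(row[name] for row in reference)
        cuts[name] = (percentile(vals, 100 - q), percentile(vals, q))
    return cuts


def flag(compound, cuts):
    """Archetypes a compound matches; every rule of an archetype must hold."""
    hits = []
    for archetype, rules in ARCHETYPES.items():
        if all(
            compound[f] > cuts[f][1] if direction == ">" else compound[f] < cuts[f][0]
            for f, direction in rules
        ):
            hits.append(archetype)
    return hits


reference = [{"R_en_var": i, "K_wiener_norm": i, "K_fiedler": i, "E_ring_strain": i} for i in range(100)]
cuts = fit_thresholds(reference)
print(flag({"R_en_var": 95, "K_wiener_norm": 50, "K_fiedler": 50, "E_ring_strain": 5}, cuts))
```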

HONEST LIMITS

The main finding did not survive a proper baseline.
  This is the most important limit. Everything below is secondary.

One dataset:
  HIV only. Given the null result against Morgan, testing on BBBP/BACE/Tox21
  is unlikely to change the conclusion.

Grid search subsample:
  Hyperparameter search used 8,000-compound random subsample for speed.
  Final evaluation used full 41,120-compound dataset.

"Same math" does not equal "same mechanism":
  Graph metrics work everywhere because graphs are everywhere.
  Fiedler on a molecule and Fiedler on a bank are the same computation
  applied to different things. The framework provides language, not magic.

REPRODUCIBLE

# The comparison script is at:
# /tmp/bioactivity/proper_comparison_v2.py

# Requires: rdkit, xgboost, sklearn, scipy, networkx, pandas, numpy
# pip install rdkit xgboost scikit-learn scipy networkx pandas numpy

# Data: MoleculeNet HIV.csv (41,127 compounds)
# Runtime: ~8 minutes on Mac Mini M4

# Results: /tmp/bioactivity/proper_results.json

RELATED

Mutation Scanner — Fiedler damage on protein contact graphs (0.82 AUC single feature)

Financial Crime — Fiedler vector on transaction networks (5/5 fraud detected)

Framework — K/R/E/T definitions across 20 domains

K-Lag Spectrum — K as a function of timescale, not a single number

Jim McCandless, beGump LLC. All computation on Mac Mini M4, 16GB, 35W. No cloud. MoleculeNet HIV dataset (41,120 valid compounds). Original: XGBoost 300 trees, depth 6, lr 0.1, custom SMILES parser. Correction: XGBoost with grid-searched hyperparameters, RDKit Morgan fingerprints (ECFP4 2048-bit) + 21 RDKit descriptors. Page corrected May 10, 2026. Thank you to the anonymous critic who did our MM12P for free.
