A drug works because it couples with its target. Everybody measures the drug. Nobody measures the coupling. We added coupling features to the standard molecular descriptors. XGBoost got better: AUC rose from 0.753 to 0.808. The top coupling feature, how easily the molecule fragments, is something none of our 16 standard descriptors captures. The same graph math that finds fraud in financial networks finds binding in drug targets.
K here is molecular coupling: how tightly connected the atom graph is. High K = robust connectivity = harder to fragment = different binding behavior. The Fiedler value measures this directly: it is the algebraic connectivity of the molecular graph, the second-smallest eigenvalue of its Laplacian.
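As a rough illustration of what the Fiedler value measures, here is a minimal sketch in pure Python. It assumes an unweighted heavy-atom graph given as a list of bonds and uses shifted power iteration so that nothing beyond the standard library is needed (`numpy.linalg.eigvalsh` on the Laplacian would be the usual route). The function name `fiedler_value` and the toy inputs are hypothetical, and the `K_fiedler` feature in the tables may be normalized differently.

```python
import math

def fiedler_value(n_atoms, bonds, iters=2000):
    """Approximate the second-smallest eigenvalue of the graph Laplacian L = D - A."""
    if n_atoms < 2 or not bonds:
        return 0.0  # no edges: the graph is already fully fragmented

    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    degree = [len(nb) for nb in neighbors]
    shift = 2.0 * max(degree)  # upper bound on the largest Laplacian eigenvalue

    def apply_laplacian(v):
        return [degree[i] * v[i] - sum(v[j] for j in neighbors[i])
                for i in range(n_atoms)]

    def center_and_normalize(v):
        # Remove the constant component (the eigenvector for eigenvalue 0).
        mean = sum(v) / len(v)
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    # Power iteration on (shift*I - L), restricted to vectors orthogonal to
    # the constant vector, converges toward the Fiedler vector.
    v = center_and_normalize([float(i) for i in range(n_atoms)])
    for _ in range(iters):
        lv = apply_laplacian(v)
        v = center_and_normalize([shift * x - y for x, y in zip(v, lv)])

    lv = apply_laplacian(v)
    return sum(x * y for x, y in zip(v, lv))  # Rayleigh quotient v^T L v


# Toy heavy-atom skeletons (hypothetical inputs, not parsed from SMILES).
chain = fiedler_value(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)])
ring = fiedler_value(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
print(f"chain: {chain:.3f}, ring: {ring:.3f}")
```

Under these assumptions the six-atom chain comes out near 0.27 and the six-atom ring at 1.0. Closing the ring removes the single-bond cut points, which is the "harder to fragment" behavior that K is meant to capture.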
Given a molecule, predict whether it is active against HIV. Standard approach: compute molecular descriptors (weight, polarity, hydrogen bonds), feed them to a classifier. We asked: what happens if you also describe the molecule as a coupled system — measuring its connectivity structure, energy distribution, and internal tension?
MoleculeNet HIV: 41,127 compounds from real screening data. 1,443 active (3.5%). We subsampled to 4,943 molecules (all 1,443 active + 3,500 inactive) to reduce class imbalance, which leaves 29% active. AUC-ROC depends only on how actives rank against inactives, so relative comparisons between models hold, but absolute AUC numbers may differ on the full dataset.
| Model | Features | AUC (mean ± std) |
|---|---|---|
| A | Standard only (16) | 0.753 ± 0.008 |
| B | K/R/E/T only (29) | 0.796 ± 0.013 |
| C | Combined (45) | 0.808 ± 0.014 |
K/R/E/T features captured 61.5% of total XGBoost feature importance, roughly in line with their 64% share of the feature count (29/45). The top features from Model C:
| # | Feature | Importance | Source |
|---|---|---|---|
| 1 | n_aromatic | 0.0482 | Standard |
| 2 | K_fiedler | 0.0476 | K/R/E/T |
| 3 | frac_s | 0.0434 | Standard |
| 4 | K_wiener_norm | 0.0362 | K/R/E/T |
| 5 | K_max_degree | 0.0332 | K/R/E/T |
| 6 | frac_halogen | 0.0267 | Standard |
| 7 | R_mass_cv | 0.0265 | K/R/E/T |
| 8 | mol_weight | 0.0257 | Standard |
| 9 | tpsa_est | 0.0253 | Standard |
| 10 | hbd | 0.0239 | Standard |
The K (coupling) features dominate the K/R/E/T set: three of the four K/R/E/T features in the top 10 are K features. The Fiedler value (how easily the molecular graph fragments) is the second most important feature overall, behind only the aromatic count.
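The importance share quoted above can be put on a per-feature basis with a few lines of arithmetic. This sketch uses only the figures stated in the text and assumes the XGBoost importances are normalized to sum to 1, so the standard descriptors hold the remaining 38.5%. The variable names are illustrative.

```python
# Figures quoted in the text for Model C (45 features total).
total_features = 45
kret_share, kret_count = 0.615, 29
std_share = 1.0 - kret_share            # assumption: importances sum to 1
std_count = total_features - kret_count

kret_per_feature = kret_share / kret_count
std_per_feature = std_share / std_count

print(f"K/R/E/T:  {kret_count / total_features:.1%} of features, "
      f"{kret_share:.1%} of importance, {kret_per_feature:.4f} per feature")
print(f"Standard: {std_count / total_features:.1%} of features, "
      f"{std_share:.1%} of importance, {std_per_feature:.4f} per feature")
```

On these numbers the average K/R/E/T feature carries about 0.021 of the importance and the average standard descriptor about 0.024, so the group-level share mostly reflects feature count. The individual K features in the top 10 are what stand out.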
| Fold | Model A | Model B | Model C | Delta (C−A) |
|---|---|---|---|---|
| 1 | 0.750 | 0.796 | 0.805 | +0.055 |
| 2 | 0.745 | 0.793 | 0.806 | +0.061 |
| 3 | 0.748 | 0.776 | 0.784 | +0.036 |
| 4 | 0.768 | 0.794 | 0.816 | +0.048 |
| 5 | 0.753 | 0.818 | 0.827 | +0.074 |
| Mean | 0.753 | 0.796 | 0.808 | +0.055 |
Five claims. Tried to kill all five. Two survived clean, two weakened, one killed.
Paired t-test on the per-fold deltas (Model C minus Model A): t = 8.60, dof = 4. 95% CI: [+0.037, +0.073]. The lower bound is nearly 6 standard errors above zero. All 5 folds positive. The smallest fold delta (+0.036) is still meaningful. With a baseline of 0.753, an improvement to 0.808 crosses the 0.80 threshold that separates "okay" from "useful" in screening. The signal is real.
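The paired test can be reproduced from the per-fold table with the standard library alone. This is a sketch using the rounded fold values from the table, so the statistic lands close to, but not exactly on, the quoted t = 8.60. The critical value 2.776 is the two-sided 95% point of Student's t with 4 degrees of freedom.

```python
import math
import statistics

# Per-fold AUCs copied from the fold table (rounded to three decimals).
model_a = [0.750, 0.745, 0.748, 0.768, 0.753]
model_c = [0.805, 0.806, 0.784, 0.816, 0.827]

deltas = [c - a for a, c in zip(model_a, model_c)]
n = len(deltas)

mean_delta = statistics.mean(deltas)
std_err = statistics.stdev(deltas) / math.sqrt(n)  # sample std uses n - 1
t_stat = mean_delta / std_err

T_CRIT_DOF4 = 2.776  # two-sided 95% critical value, Student's t, 4 dof
ci_low = mean_delta - T_CRIT_DOF4 * std_err
ci_high = mean_delta + T_CRIT_DOF4 * std_err

print(f"mean delta = {mean_delta:+.4f}, t = {t_stat:.2f}, dof = {n - 1}")
print(f"95% CI = [{ci_low:+.3f}, {ci_high:+.3f}]")
```

Because the test is paired, only the five deltas matter. Fold-to-fold variation shared by both models, such as fold 3 being harder for both, cancels out of the comparison.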
K_fiedler (0.0476) is the #1 K/R/E/T feature and #2 overall, nearly tied with n_aromatic (0.0482). It has low correlation with standard features — it is capturing something they miss. But the Fiedler value is from Fiedler (1973). It is a well-known graph metric used in VLSI placement, community detection, and spectral clustering. We did not invent it. We applied it. The claim should be: "a 50-year-old graph metric, when applied to molecular graphs, captures genuine structural information that standard drug descriptors miss." That is less exciting but more honest.
20 permutations is not enough. p = 0.00 from 20 trials only means p < 0.05, not p < 0.001: with 20 permutations the smallest p-value the test can support is about 1/21 ≈ 0.048. For a proper test, you need 1,000+ permutations. However, the gap between Model C (0.808) and the permutation mean (0.707) is 0.10 AUC, which is enormous. The signal would almost certainly survive 1,000 permutations. We just cannot claim that precision from the test as run. Corrected claim: "permutation test confirms signal exists (p < 0.05). The 0.10 AUC gap suggests p is much lower, but we ran too few permutations to say how much lower."
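The resolution limit of a permutation test is easy to make concrete. The sketch below uses the common "+1" convention, which counts the observed score as one of the permutations. That convention is an assumption here, not necessarily the procedure that produced the reported p = 0.00. The function names and the example permuted scores are hypothetical.

```python
def permutation_p_value(observed, permuted_scores):
    """One-sided p-value with the +1 correction: (b + 1) / (m + 1)."""
    at_least_as_good = sum(1 for s in permuted_scores if s >= observed)
    return (at_least_as_good + 1) / (len(permuted_scores) + 1)


def smallest_reportable_p(n_permutations):
    """Floor on the p-value when no permuted score beats the observed one."""
    return 1 / (n_permutations + 1)


# Hypothetical permuted AUCs, all below the observed 0.808.
example_p = permutation_p_value(0.808, [0.70] * 20)

for m in (20, 1000):
    print(f"{m} permutations -> p >= {smallest_reportable_p(m):.4f}")
```

With 20 permutations the floor sits just under 0.05, and it only drops below 0.001 once the run reaches 1,000 permutations, which is why the claim has to stop at p < 0.05.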
Model B (0.796) beats Model A (0.753). But Model A uses 16 simplified descriptors with a minimal SMILES parser: no stereochemistry, no charges, no implicit hydrogens. Real cheminformatics baselines use Morgan/ECFP circular fingerprints (often 2,048 bits) and MACCS keys. Published benchmarks on MoleculeNet HIV using Morgan fingerprints alone achieve 0.80–0.85 AUC. Against a real baseline, Model B likely loses. The claim as stated is dead. What survives: "K/R/E/T features beat simplified descriptors and add signal on top of them." That is weaker but honest.
The Fiedler value works on molecular graphs, transaction networks, and protein contact maps because it works on any graph. That is Fiedler's result (1973), not ours. "Same math" is true but trivially so — linear algebra works on matrices, and graphs produce matrices. What IS ours: the K/R/E/T interpretation layer, the autocorrelation features (K-lag), and the systematic application across domains. The claim should be: "graph topology features are useful across domains, and the K/R/E/T framework provides a consistent language for them." Not: "we discovered universal math."
Mutation Scanner — Fiedler damage on protein contact graphs (0.82 AUC single feature)
Financial Crime — Fiedler vector on transaction networks (5/5 fraud detected)
Framework — K/R/E/T definitions across 20 domains
K-Lag Spectrum — K as a function of timescale, not a single number
Jim McCandless, beGump LLC. All computation on Mac Mini M4, 16GB, 35W. No cloud. MoleculeNet HIV dataset. XGBoost 300 trees, depth 6, lr 0.1. Custom SMILES parser (no RDKit). Feature computation: ~490 molecules/sec.