The Indus Valley civilization was the most geographically extensive civilization of the early Bronze Age. An estimated 5 million people. Grid streets. Indoor plumbing. Standardized bricks. They traded from Mesopotamia to Central Asia. They left 1,916 distinct inscriptions on stone seals, averaging about 4 signs each. People have been trying to “read” them like a language for 100 years. They can’t, and the statistics suggest why. You don’t read a QR code. You scan it. These are trade stamps. Business cards carved in stone. [TITLE] [NAME] [CITY]. Who made this, where it’s from, what’s in the box. That’s it. Four signs. Done. On this reading, nobody can crack the “language” because there’s no language to crack. You can’t determine someone’s native language from their LinkedIn profile.
K = 0.30. Sequential coupling — how well the previous sign predicts the next. Higher than English text (0.05–0.15). Higher than the Voynich (0.17). Because labels are MORE formulaic than prose. [TITLE] [NAME] [CITY] uses the same structure every time. The coupling is in the FORMAT, not the content.
Average inscription: 4.4 signs. The longest known is 26; the longest in this dataset is 17. Most are 3–5.
For comparison: this sentence is 7 words. You couldn’t determine what language it was written in if you only saw 4 characters of it. That’s the Indus problem. There’s not enough grammar in 4 signs to reconstruct a language.
The Voynich Manuscript has 37,919 words — enough to crack the grammar, the morphology, the recipe structure. The Indus script has 4-sign stamps. That’s a name tag, not a novel.
Conditional entropy: 3.232 bits (shuffled baseline: 4.613). K = 0.30. Knowing the previous sign removes about 30% of the uncertainty about the next one. A shuffled corpus scores 0%. This has real structure.
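Here is a minimal sketch of how a number like K can be computed. It assumes each inscription is a list of sign IDs in reading order, and that the shuffled baseline pools every sign and re-cuts the pool into the original lengths; the post does not say which shuffle was actually used. The toy corpus and the function names are invented for illustration.

```python
import math
import random
from collections import Counter

def conditional_entropy(inscriptions):
    """H(next sign | previous sign) in bits, from adjacent pairs."""
    pair_counts = Counter()
    prev_counts = Counter()
    for signs in inscriptions:
        for prev, nxt in zip(signs, signs[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    total = sum(pair_counts.values())
    h = 0.0
    for (prev, _nxt), n in pair_counts.items():
        # p(prev, next) * log2 p(next | prev)
        h -= (n / total) * math.log2(n / prev_counts[prev])
    return h

def shuffled_corpus(inscriptions, rng):
    """Assumed null model: pool all signs, shuffle, re-cut to the same lengths."""
    pool = [s for signs in inscriptions for s in signs]
    rng.shuffle(pool)
    out, i = [], 0
    for signs in inscriptions:
        out.append(pool[i:i + len(signs)])
        i += len(signs)
    return out

def coupling(h_observed, h_shuffled):
    """K = 1 - H_observed / H_shuffled."""
    return 1.0 - h_observed / h_shuffled

# Invented toy corpus with a fixed [TITLE] [NAME] [CITY] shape.
toy = [["T1", "N1", "C1"], ["T1", "N2", "C1"],
       ["T2", "N1", "C2"], ["T2", "N3", "C1"]] * 25
rng = random.Random(0)
h_obs = conditional_entropy(toy)
h_null = sum(conditional_entropy(shuffled_corpus(toy, rng))
             for _ in range(100)) / 100
print(f"toy K = {coupling(h_obs, h_null):.2f}")

# The post's own figures plugged into the same formula:
print(round(coupling(3.232, 4.613), 4))  # 0.2994
```

A rigidly formatted toy corpus like this one scores much higher than 0.30, which is the point of the comparison: formula raises K, free-flowing prose lowers it.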
Zipf slope: −1.492 (R² = 0.956). A power-law rank-frequency curve, the same shape natural language shows, though steeper than the slope of about −1 typical of word frequencies.
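A Zipf slope is usually read off a log-log plot of frequency against rank. The sketch below assumes an ordinary least-squares fit of log10(frequency) on log10(rank) over all sign types; the post does not state the fitting method, so treat this as one plausible way to get a slope and R², not the method behind −1.492.

```python
import math
from collections import Counter

def zipf_fit(tokens):
    """Least-squares fit of log10(frequency) against log10(rank).

    Returns (slope, r_squared). Needs at least two sign types with
    different frequencies, otherwise the variances below are zero.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared

# Invented example: six sign types with frequencies exactly 1200 / rank,
# i.e. a textbook Zipf curve, so the fitted slope comes out close to -1.
tokens = []
for rank in range(1, 7):
    tokens += [f"sign{rank}"] * (1200 // rank)
slope, r2 = zipf_fit(tokens)
print(f"slope = {slope:.3f}, R^2 = {r2:.3f}")
```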
Mean length 4.4 signs. Way too short for sentences. 67 signs account for 80% of all usage — that’s a tiny working vocabulary. 33% of signs appear only once (lower than English’s 40–50%) — MORE repetitive than natural text.
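Both of those numbers come from simple counting. A sketch, assuming the corpus is flattened into one list of sign IDs (the sign names below are invented):

```python
from collections import Counter

def coverage_count(tokens, share=0.80):
    """Smallest number of most-frequent sign types that covers
    at least `share` of all sign occurrences."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    target = share * sum(counts)
    running = 0
    for k, c in enumerate(counts, start=1):
        running += c
        if running >= target:
            return k
    return len(counts)

def hapax_fraction(tokens):
    """Share of sign types that occur exactly once."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Invented toy data: one dominant sign, one common sign, one hapax.
toy = ["A"] * 7 + ["B"] * 2 + ["C"]
print(coverage_count(toy))   # 2  (A + B cover 9 of 10 occurrences)
print(hapax_fraction(toy))   # one of three types occurs once

# The post's figures as plain ratios:
print(round(67 / 584, 3))    # 0.115
print(round(194 / 584, 3))   # 0.332
```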
565 repeated 3-sign phrases across 1,916 inscriptions, a ratio of about 29%. Massively formulaic. Consistent with titles, trade marks, ritual phrases.
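Counting repeated phrases is a sliding-window job. A sketch, assuming a "repeated 3-sign phrase" means any run of three adjacent signs that turns up more than once in the corpus; the exact definition behind the 565 figure is not given, and the sign IDs here are made up.

```python
from collections import Counter

def repeated_ngrams(inscriptions, n=3):
    """Map each n-sign run that occurs more than once to its count."""
    counts = Counter()
    for signs in inscriptions:
        for i in range(len(signs) - n + 1):
            counts[tuple(signs[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c > 1}

# Invented example: the run A-B-C appears in two inscriptions.
toy = [["A", "B", "C", "D"], ["A", "B", "C"], ["X", "Y"]]
print(repeated_ngrams(toy))

# The post's ratio: repeated phrases per inscription.
print(round(565 / 1916, 2))  # 0.29
```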
7 terminal signs that strongly favor the end of an inscription. 3 initial signs that strongly favor the beginning. 215 content signs in between. Fixed format: [OPENER] [CONTENT] [CLOSER].
Same structure as a business card, a barcode, a URL. Format markers at the edges, data in the middle.
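One way to sort signs into openers, closers, and content is to ask how often each sign sits in first or last position. The sketch below assumes inscriptions are stored in reading order, and the `min_count` and `threshold` cutoffs are arbitrary illustration values, not the ones behind the 3 / 215 / 7 split.

```python
from collections import Counter

def positional_classes(inscriptions, min_count=5, threshold=0.8):
    """Label each sufficiently frequent sign as initial, terminal, or content."""
    first, last, total = Counter(), Counter(), Counter()
    for signs in inscriptions:
        if len(signs) < 2:
            continue  # a lone sign is both first and last; skip it
        first[signs[0]] += 1
        last[signs[-1]] += 1
        total.update(signs)
    classes = {}
    for sign, n in total.items():
        if n < min_count:
            continue  # too rare to say anything about position
        if first[sign] / n >= threshold:
            classes[sign] = "initial"
        elif last[sign] / n >= threshold:
            classes[sign] = "terminal"
        else:
            classes[sign] = "content"
    return classes

# Invented corpus: "O" always opens, "Z" always closes.
toy = [["O", "x", "Z"], ["O", "y", "Z"], ["O", "x", "y", "Z"],
       ["O", "y", "x", "Z"], ["O", "x", "Z"]]
print(positional_classes(toy, min_count=3))
```

The threshold matters: set it to 1.0 and you only get signs that never break the rule, set it lower and you get signs with a positional bias.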
A trading civilization of 5 million people across 1,500+ sites needs standardized identification. Who sent this shipment? What’s in it? Who guarantees quality? Where does it go if there’s a problem?
Each seal says:
[INITIAL SIGN] + [CONTENT SIGNS] + [TERMINAL SIGN]
= [TITLE/CLASS] + [NAME/GOODS] + [CLAN/CITY]
The 7 terminal signs might be city codes — Mohenjo-daro, Harappa, Lothal, Dholavira, etc. The 3 initial signs might be title prefixes — merchant, priest, official. The content signs in between = the specific identity or goods.
584 unique signs at 4.4 per inscription gives roughly 584^4.4 ≈ 10¹² possible combinations. That is an upper bound, since it ignores the positional constraints. Still enough to uniquely label every person in a city of millions. Which is exactly what you’d need.
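The arithmetic behind that figure, as a quick check. It assumes every sign can appear in any position, which is the most generous case; real positional constraints would shrink the count.

```python
import math

SIGN_TYPES = 584
MEAN_LENGTH = 4.4

# Headline estimate: 584 ** 4.4, expressed as a power of ten.
log10_estimate = MEAN_LENGTH * math.log10(SIGN_TYPES)
print(f"584^4.4 is about 10^{log10_estimate:.1f}")

# Inscriptions have whole-number lengths, so the exact counts
# for the most common lengths bracket the estimate.
exact = {n: SIGN_TYPES ** n for n in range(3, 6)}
for n, count in exact.items():
    print(f"{n} signs: {count:.2e} sequences")
```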
| System | Length | K | Type |
|---|---|---|---|
| Indus script | 4.4 signs | 0.30 | Labels/stamps |
| Voynich MS | 5.1 chars/word | 0.17 | Recipe notation |
| English text | 4.7 chars/word | 0.05–0.15 | Natural language |
| QR code | variable | ~0.40 | Structured data |
Indus K is between natural language and structured data encoding. It’s more formulaic than prose but less rigid than a pure code. A label system with some flexibility — exactly what a trade stamp would be.
× Simple substitution for a known language — 100 years of trying, no solution. Not enough grammar.
× Random/decorative symbols — K=0.30, Zipf-compliant. Real structure.
× Full literary language — 4.4 signs per inscription is too short for sentences.
✓ Structured labeling system (K=0.30, positional constraints, formulaic repetition)
✓ Fixed format: [INITIAL] + [CONTENT] + [TERMINAL] (3+215+7 sign classes)
✓ Trade/identity function (found on seals, consistent with Bronze Age commerce)
✓ Cross-site consistency (same signs at Mohenjo-daro, Harappa, Lothal, Dholavira)
• What language did the Indus people SPEAK? Labels don’t contain enough grammar to determine this.
• What do specific signs mean? Without a bilingual text (a Rosetta Stone), individual sign values remain unknown.
• Are the 7 terminal signs city codes? Testable: do they correlate with excavation site?
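The city-code question is a contingency-table test. A minimal sketch, assuming you have one (site, terminal sign) pair per inscription and measure association with Cramér's V, where 0 means no link and 1 means each sign maps to a single site. The site names come from this post; the sign IDs are placeholders.

```python
import math
from collections import Counter

def cramers_v(pairs):
    """Cramér's V for a list of (row_label, column_label) observations."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    chi2 = 0.0
    for r, row_n in rows.items():
        for c, col_n in cols.items():
            expected = row_n * col_n / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# If terminal signs were city codes, each site would use its own sign:
city_codes = [("Mohenjo-daro", "T1")] * 10 + [("Harappa", "T2")] * 10
print(cramers_v(city_codes))  # 1.0

# If they were not, every site would use them in the same proportions:
no_link = ([("Mohenjo-daro", "T1")] * 5 + [("Mohenjo-daro", "T2")] * 5 +
           [("Harappa", "T1")] * 5 + [("Harappa", "T2")] * 5)
print(cramers_v(no_link))     # 0.0
```

A real test would also need a significance check, since small sites like Dholavira and Lothal contribute few inscriptions.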
You don’t read a QR code. You scan it.
The seal stamps the clay. The buyer checks the seal.
It says who made this and where.
Four signs. Done.
The biggest ancient civilization left the shortest messages.
Not because they had nothing to say.
Because they said it in person and stamped the receipt.
Good will applied forward.
K = 1 - (H_observed / H_shuffled) = 1 - (3.232 / 4.613) = 0.2994. Conditional entropy from Rao et al. 2009 (Science) and 2026 synthetic baseline analysis. Measures sequential coupling between adjacent signs. Higher than natural language text, lower than rigid codes. Consistent with semi-structured labeling.
• 1,916 deduplicated inscriptions (2,511 raw, 595 exact duplicates removed)
• 584 unique sign types (ICIT G### coding)
• 11,110 total sign occurrences (11,110 / 2,511 raw inscriptions ≈ 4.4 signs each)
• 52 archaeological sites
• Mean inscription length: 4.4 signs (σ = 2.0, range 2–17)
• 67 signs account for 80% of all usage (11.5% of sign types)
• Hapax fraction: 33.2% (194 of 584 signs appear only once)
• Conditional entropy: 3.232 bits
• Shuffled null: 4.613 ± 0.015 bits
• Percentile vs shuffled: 0.000 (more constrained than all 1,000 permutations)
• Zipf slope: −1.492 (R² = 0.956)
• Block entropy scaling matches natural languages, not random or rigid sequences (Rao et al. 2009)
• Positional rigidity (Cramér’s V): 0.149 for the 10 most frequent signs
• 3 initial signs (beginning-position bias)
• 7 terminal signs (end-position bias)
• 215 content signs
• 12 bigram communities (Louvain clustering)
• 219 template families (minimum cluster size 2)
• 2,032 unique segmentation units
| Site | Inscriptions | Unique Signs | Mean Length |
|---|---|---|---|
| Mohenjo-daro | 1,188 | 464 | 4.9 |
| Harappa | 957 | 333 | 3.9 |
| Dholavira | 74 | 116 | 4.2 |
| Lothal | 78 | 100 | 4.7 |
Sign usage is consistent across sites separated by 1,000+ km. A standardized system, not local invention.
• Rao, Yadav, Vahia, Joglekar, Adhikari & Mahadevan (2009), “Entropic Evidence for Linguistic Structure in the Indus Script,” Science 324:1165
• 2026 synthetic baseline analysis, arXiv:2604.17828
• Mahadevan (1977), The Indus Script: Texts, Concordance and Tables
• ICIT: Interactive Corpus of Indus Texts (4,537 objects, 19,616 sign occurrences)
• GUMP tools: K/R/E/T framework applied to published entropy data