The Voynich Manuscript
The Voynich Manuscript (Yale Beinecke MS 408) is arguably the most mysterious book ever made. It is a 240-page illustrated codex written entirely in an unknown script and filled with drawings of unidentifiable plants, naked bathing women, and astronomical diagrams alongside elaborate text. It has resisted decipherment by professional cryptanalysts, linguists, and, more recently, AI systems through more than a century of sustained effort. No one knows what it says, who wrote it, why it was written, or whether it encodes meaningful information at all.
Radiocarbon dating of the vellum (parchment) places its creation between 1404 and 1438, in early 15th-century Europe, probably northern Italy. The ink, paint, and quire structure are consistent with this dating, so a post-medieval forgery is very unlikely.
Confidence level on dating: established
Confidence level on content/authorship: entirely open
Freshness date: as of April 2026, the manuscript remains undeciphered
Key Facts
- Location: Yale University’s Beinecke Rare Book and Manuscript Library (MS 408), donated by H.P. Kraus in 1969
- Dating: Vellum radiocarbon-dated to 1404–1438 (95% confidence interval; University of Arizona, 2009)
- Size: ~240 vellum pages, ~170,000 characters, ~35,000 words
- Script: “Voynichese” — a unique writing system with ~25–30 distinct glyphs, written left to right
- Scribes: At least five distinct scribes identified by paleographer Lisa Fagin Davis, writing two distinct text variants (Currier A and Currier B)
- Language of origin: Unknown — possibly encrypted Latin/Italian, an unknown natural language, a constructed language, or nonlinguistic content
- Cost in 1590s: Holy Roman Emperor Rudolf II reportedly paid 600 gold ducats for it
- Scholarly status: No decipherment has been accepted by mainstream cryptology or linguistics
The Illustrations — What the Book Shows
The manuscript is conventionally divided into six sections based on illustration type:
1. Herbal Section (~half the manuscript)
Hundreds of botanical illustrations of plants, herbs, and root systems. Crucially, none is unambiguously identifiable as a real plant species: the drawings appear to combine features of different known plants or to depict entirely fantastical species. This is one of the deepest mysteries, since a 15th-century herbal would normally depict real, recognizable plants.
2. Astronomical / Astrological Section
Circular diagrams with concentric rings, zodiac symbols, and radial structures. The zodiac sequence begins with Pisces rather than Aries — the “wrong” starting point for a standard astronomical calendar, but plausible for a medical or agricultural calendar keyed to a specific season. Thirty small feminine figures hold stars around the rings.
3. Biological / Balneological Section
The most visually striking section: dozens of small, mostly round-bodied naked female figures bathing in pools that are connected by an elaborate system of tubes, channels, and pipes. Some figures appear connected to each other through the tubing. The pools resemble baths or womb-like vessels. A 2024 study in Social History of Medicine argues this section depicts gynecology, conception, and female reproductive physiology (see concept-voynich-theories).
4. Cosmological Section
Full-page elaborate diagrams with radiating structures, sometimes called “rosettes.” The largest illustration (the “Rosette page”) is a nine-panel fold-out diagram. Researchers Keagan Brewer and Michelle L. Lewis (2024) proposed it represents coitus and conception, connecting it to the work of Bavarian physician Johannes Hartlieb (~1410–68), who wrote on plants, women’s medicine, astronomy, baths, and explicitly recommended using “secret letters” to hide recipes related to contraception and fertility.
5. Pharmaceutical Section
Pages showing containers, jars, vials, and plant parts — suggestive of a pharmacopoeia or recipe collection.
6. Recipe Section
Dense paragraphs of unbroken text with decorative paragraph markers, structurally similar to a list of recipes or instructions.
Carbon Dating and Provenance
Radiocarbon Dating (Established)
In 2009, four samples from different pages were radiocarbon-dated at the University of Arizona. Results: 1404–1438 AD (95% confidence). The consistency across pages indicates a single production event. Strictly, the test dates the vellum rather than the writing, so the parchment could in theory have been prepared in that window and written on later. The ink chemistry and binding style are nevertheless consistent with the early 15th century, which makes a post-Renaissance forgery very unlikely.
Known Ownership Chain
The manuscript has a surprisingly well-documented provenance for its age:
- Carl Widemann (physician/alchemist, Augsburg) — earliest probable owner, ~1590s
- Holy Roman Emperor Rudolf II — purchased for 600 gold ducats around 1599; believed it was the work of Roger Bacon
- Jacobus Horcicky de Tepenecz (head of Rudolf’s botanical gardens) — his near-invisible signature appears on folio 1r
- Georg Baresch (Prague alchemist) — held it mid-17th century, puzzled by it; corresponded with Jesuit scholar Athanasius Kircher
- Johannes Marcus Marci — sent it to Kircher in Rome in 1666 with a cover letter suggesting Roger Bacon as author (this letter survives and is the primary documentary provenance)
- Jesuit College at Frascati near Rome — held it until 1912
- Wilfrid Voynich — antique book dealer who purchased it from Jesuits in 1912, giving it his name; tried (and failed) to decode it
- H.P. Kraus — acquired from Voynich’s estate, donated to Yale in 1969
Roger Bacon (13th century) is ruled out as author by the radiocarbon dating: the vellum postdates his death (c. 1292) by more than a century.
Statistical and Linguistic Properties
This is where the mystery deepens. Voynichese has been subjected to rigorous computational analysis, and the results are deeply strange:
Evidence it encodes meaningful content:
- Word frequencies follow a Zipfian distribution (the hallmark of natural languages)
- Word-level statistics such as co-occurrence patterns and keyword clustering are compatible with natural language structure
- Vocabulary shows complex organizational patterns across sections — different sections use different words, as if covering different topics
- It is not random noise
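The Zipfian claim above can be checked with a simple rank-frequency fit. The following is a minimal sketch, not a published method: the function name `zipf_slope` and the toy corpus are assumptions made for illustration, and a real test would need a transliteration of the manuscript as input.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipfian signature. The value alone
    does not prove that a text is linguistic.
    """
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy corpus: the word at rank r occurs about 120 / r times.
toy_words = []
for rank in range(1, 21):
    toy_words.extend([f"w{rank}"] * (120 // rank))

slope = zipf_slope(toy_words)
print(f"fitted slope: {slope:.2f}")  # close to -1 for this constructed corpus
```

A fit like this only says the distribution has a Zipf-like shape. As the hoax discussion later in this note points out, such a shape may also arise in text that carries no meaning.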
Evidence something is wrong:
- Character-level second-order conditional entropy (~2.0 bits) is far lower than in known natural languages (typically 3–4 bits), meaning each character is unusually predictable from the one before it
- Character placement is highly constrained within words — some glyphs only appear at word beginnings, others only at ends, in patterns more rigid than any known script
- Word repetition rates are anomalously high
- The text lacks the rare-word “tail” typical of natural language distributions
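Two of the anomalies listed above, low conditional entropy and rigid glyph placement, are straightforward to measure. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be an ASCII transliteration in which one character stands for one glyph, and the sample strings are invented rather than transcribed from the manuscript.

```python
import math
from collections import Counter


def conditional_entropy(text):
    """Estimate H(next character | current character) in bits from adjacent pairs."""
    pairs = Counter(zip(text, text[1:]))
    total = sum(pairs.values())
    firsts = Counter()
    for (current, _next), n in pairs.items():
        firsts[current] += n
    entropy = 0.0
    for (current, _next), n in pairs.items():
        # p(pair) * log2 p(next | current)
        entropy -= (n / total) * math.log2(n / firsts[current])
    return entropy


def positional_profile(words):
    """Count, per glyph, how often it is word-initial, word-medial, or word-final.

    A one-glyph word is counted as initial.
    """
    profile = {}
    for word in words:
        for i, glyph in enumerate(word):
            if i == 0:
                slot = "initial"
            elif i == len(word) - 1:
                slot = "final"
            else:
                slot = "medial"
            profile.setdefault(glyph, Counter())[slot] += 1
    return profile


# Invented sample strings, used only to exercise the functions.
print(conditional_entropy("abababab"))   # fully predictable sequence
print(conditional_entropy("aabbaabba"))  # next character is a coin flip
print(positional_profile(["qab", "qob", "dab"])["q"])
```

In the first sample every character determines the next, so the estimate is zero. In the second, each character is followed by either of two characters equally often, so the estimate is one bit. A glyph whose profile is concentrated in a single slot would show the kind of positional rigidity described above.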
The two variants (Currier A and B): In the 1970s, military cryptanalyst Prescott Currier identified two statistically distinct text types in the manuscript, now called Currier A (found in the herbal and pharmaceutical sections) and Currier B (found in the balneological, astrological, and other sections). Lisa Fagin Davis subsequently identified five distinct scribes: Scribes 1 and 4 write A, and Scribes 2, 3, and 5 write B. This suggests two different underlying languages, two encryption methods, or two compositional phases.
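One way to put a number on "statistically distinct" is to compare the word-frequency distributions of two samples. The sketch below uses Jensen-Shannon divergence, which is a swapped-in general-purpose measure and not Currier's own procedure; the function name `js_divergence` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def js_divergence(words_a, words_b):
    """Jensen-Shannon divergence (in bits) between two word-frequency distributions.

    0.0 means identical distributions; 1.0 means no shared vocabulary.
    """
    count_a, count_b = Counter(words_a), Counter(words_b)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must be non-empty")
    divergence = 0.0
    for word in set(count_a) | set(count_b):
        p = count_a[word] / total_a
        q = count_b[word] / total_b
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


# Invented tokens standing in for two transliterated samples.
sample_a = ["ka", "ka", "lo", "mi"]
sample_b = ["ka", "lo", "lo", "su"]
print(js_divergence(sample_a, sample_a))  # identical samples
print(js_divergence(sample_a, sample_b))  # partially overlapping samples
```

A large divergence between pages attributed to A and pages attributed to B would be consistent with the distinction, but it cannot say which of the three explanations above is the right one.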
Leading Theories (Current as of 2026)
1. Encrypted Natural Language (Cipher Hypothesis) — Most Mainstream
The text encodes a real natural language via a cipher system. Candidate languages have included Latin, Italian, Hebrew, Old Turkish, proto-Romance, Arabic, and others. No proposed decipherment has been accepted — all fail to produce coherent, verifiable output for more than a few words.
The Naibbe cipher (2025, published in Cryptologia) is the most technically sophisticated recent support for this theory. It demonstrates that a historically plausible verbose homophonic substitution cipher using playing cards and dice can encrypt Latin or Italian into text that reliably reproduces the statistical properties of Voynichese: word length distributions, entropy levels, character positional constraints, and symbol frequencies. Named after a 14th-century Italian card game, it uses dice rolls to break plaintext into chunks and card draws to select from six encryption tables. Crucially, it is reversible: the ciphertext can be decrypted, not merely produced. It does not, however, decode the actual manuscript; it serves as an existence proof that the cipher hypothesis remains viable.
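The general idea of a verbose homophonic substitution can be shown with a toy example. This is a sketch of the cipher family only and does not reproduce the Naibbe cipher: the tables below are invented, a seeded random generator stands in for the card draws, and the dice-driven chunking step is not modelled.

```python
import random

# Hypothetical toy tables: every plaintext letter has two multi-glyph homophones.
# These strings are invented for illustration and are NOT the Naibbe tables.
TOY_TABLES = {
    "a": ["qo", "ched"],
    "e": ["ol", "shy"],
    "m": ["dai", "kar"],
    "r": ["ot", "chol"],
    "t": ["ar", "qok"],
}

# Reverse lookup: each homophone maps back to exactly one letter.
REVERSE = {code: letter for letter, codes in TOY_TABLES.items() for code in codes}
assert len(REVERSE) == sum(len(codes) for codes in TOY_TABLES.values())


def encrypt(plaintext, rng):
    """Replace each letter with a randomly chosen homophone, one token per letter."""
    return " ".join(rng.choice(TOY_TABLES[ch]) for ch in plaintext)


def decrypt(ciphertext):
    """Map each token back to its unique plaintext letter."""
    return "".join(REVERSE[token] for token in ciphertext.split())


rng = random.Random(408)  # fixed seed so the run is repeatable
ciphertext = encrypt("mater", rng)
recovered = decrypt(ciphertext)
print(ciphertext)
print(recovered)
```

"Verbose" means one plaintext letter becomes several ciphertext glyphs, and "homophonic" means the same letter can be written more than one way. Decryption stays unambiguous because no homophone is shared between letters, which is the property the reverse table checks.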
2. Constructed / Invented Language (Glossolalia / Conlang Hypothesis)
The text might be a deliberate invented language or “philosophical language” — an attempt to create a universal or idealized tongue. Such projects existed in the 15th–17th centuries. The lack of identifiable real-world words could be explained if the entire vocabulary was invented from scratch.
3. Deliberate Hoax (Gibberish Hypothesis)
The manuscript might be meaningless — deliberately constructed to look linguistic without encoding anything real. The most famous version involves a skilled Renaissance forger producing it to sell to a credulous buyer (Rudolf II being a prime target, given his known appetite for curiosities).
A 2025 experiment added troubling data: volunteers asked to write pages of convincing-looking gibberish produced texts with Zipfian-like distributions and similar statistical properties to Voynichese. This suggests that some properties previously considered “proof” of linguistic structure can emerge from intelligent gibberish production.
However, the hoax theory has difficulties: the production cost (240 high-quality vellum pages, extensive color illustration, five scribes) is enormous for a fraud. And the statistical properties are difficult — though not impossible — to fake consistently across 240 pages.
4. Unknown Natural Language (Unenciphered Rare Language)
Some researchers propose it’s written directly in an unknown or rare natural language — perhaps a regional dialect, creole, or extinct language — using a custom script. This would explain the linguistic-structure signals while accounting for non-decipherment. Northern Italian regional dialect, proto-Romani, or minority European languages have been proposed.
5. Steganographic Content
The text may be a carrier for information hidden in the illustrations, the arrangement of words, or subtle ink marks — rather than encoding meaning in the text itself. The illustrations then may not correspond to the text at all.
Recent Computational and AI Approaches
Multispectral Imaging (2024) — Key New Finding
The Lazarus Project had imaged 10 pages of the manuscript in 2014 but never published the results. In September 2024, Roger Easton (Rochester Institute of Technology) reprocessed the images and shared them with Lisa Fagin Davis, who discovered three previously invisible columns of letters on folio 1r. Two columns use the Roman alphabet; one uses Voynich script. The Roman alphabets are offset by one position — a classic single-shift substitution. Davis concluded this represents an early owner’s attempt to decode the text using two different substitution ciphers, or possibly their attempt to develop their own cipher using Voynich characters. The early decoder failed — but the existence of this attempt confirms the manuscript was taken seriously as an encoded text even in the 17th century.
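Two alphabets offset by one position amount to a shift-by-one substitution. The sketch below assumes the modern 26-letter alphabet for simplicity, which may not match the alphabet actually written on folio 1r; the function names and the sample word are illustrative.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: modern 26-letter alphabet


def shift_alphabet(offset):
    """Return the alphabet rotated left by `offset` positions."""
    offset %= len(ALPHABET)
    return ALPHABET[offset:] + ALPHABET[:offset]


def substitute(text, offset):
    """Replace each letter with the one `offset` places later; leave other characters alone."""
    table = str.maketrans(ALPHABET, shift_alphabet(offset))
    return text.lower().translate(table)


encoded = substitute("herba", 1)
decoded = substitute(encoded, -1)
print(encoded, decoded)  # ifscb herba
```

Writing the two alphabets side by side, as in the hidden columns, gives the same mapping as the translation table built here: each letter is read off against its neighbour one row down.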
AI and Machine Learning
Multiple computational approaches have been applied, with inconclusive results:
- Statistical language identification (University of Alberta; published 2016, widely reported in 2018): suggested ~80% of Voynich words appeared in Hebrew dictionaries, which was cited as evidence for encoded Hebrew. This claim was heavily criticized: the match rate was achieved only after assuming that words were written without vowels and with their letters rearranged, which makes the match criterion too permissive.
- Topic modeling (LDA, LSA, NMF): computational models identified vocabulary clustering across sections consistent with topic differentiation — the herbal section uses different vocabulary from the balneological section, suggesting genuine topical organization.
- Entropy analysis: consistently shows Voynichese has anomalously low entropy compared to any natural language — a result that cuts against simple substitution cipher theories.
- Large language models (2024–2026): GPT-class models have been applied but have not produced accepted decipherments. The fundamental problem remains: without a bilingual key (like the Rosetta Stone), statistical pattern-matching cannot distinguish the “correct” interpretation from millions of plausible wrong ones.
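The vocabulary clustering mentioned in the topic-modeling item can be approximated with a much simpler bag-of-words comparison. The sketch below is not LDA, LSA, or NMF; it computes cosine similarity between section word counts, which is the kind of input those methods start from. The section names follow this note, while the tokens and the function name are invented.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two bag-of-words count vectors (0.0 to 1.0)."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented tokens standing in for transliterated words from each section.
sections = {
    "herbal": ["ka", "ka", "lo", "mi"],
    "pharmaceutical": ["ka", "lo", "lo", "su"],
    "balneological": ["te", "te", "ro", "su"],
}

names = list(sections)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = cosine_similarity(sections[first], sections[second])
        print(f"{first} vs {second}: {score:.2f}")
```

In this constructed example the herbal and pharmaceutical samples score higher with each other than either does with the balneological sample. Whether the real sections behave this way is a question for the actual transliterations, not for this toy data.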
The core challenge for AI: any sufficiently flexible model can find patterns in any text. Without external validation (a known translation of even a few words), there is no way to confirm when a pattern found is real versus spurious.
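The risk of spurious matches is easy to demonstrate. The sketch below compares a strict dictionary lookup with a permissive one that ignores vowels and, as an additional assumption made here for illustration, letter order. It is not the algorithm of any study mentioned above, and the word lists are invented English examples.

```python
VOWELS = set("aeiou")


def strict_key(word):
    """Exact spelling must match."""
    return word.lower()


def permissive_key(word):
    """Drop vowels and sort the remaining letters, so spelling order no longer matters."""
    return "".join(sorted(ch for ch in word.lower() if ch not in VOWELS))


def match_rate(candidates, dictionary, key):
    """Fraction of candidate strings whose key matches some dictionary entry."""
    if not candidates:
        return 0.0
    dictionary_keys = {key(word) for word in dictionary}
    hits = sum(1 for candidate in candidates if key(candidate) in dictionary_keys)
    return hits / len(candidates)


TOY_DICTIONARY = ["stone", "train", "lemon", "cadet"]
TOY_CANDIDATES = ["tns", "nrt", "mln", "xqz"]  # arbitrary strings, not real words

print(match_rate(TOY_CANDIDATES, TOY_DICTIONARY, strict_key))      # 0.0
print(match_rate(TOY_CANDIDATES, TOY_DICTIONARY, permissive_key))  # 0.75
```

None of the candidate strings is a word, yet three of the four "match" once the criterion is loosened. A high match rate is therefore only informative when the rule is strict enough that arbitrary strings would fail it.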
Connections to Cryptography and the History of Science
The Voynich Manuscript sits at a remarkable crossroads:
Cryptography history: It was studied by William Friedman (who led the team that broke the Japanese PURPLE cipher) and Elizebeth Friedman, neither of whom cracked it. Prescott Currier, a Navy cryptanalyst, applied WWII-era techniques. The NSA reportedly had an internal Voynich study group. The manuscript is a canonical test case in cryptanalysis: if the best professional codebreakers in history could not crack it, either it uses an extremely clever cipher or it is not a cipher at all.
Renaissance natural philosophy: The illustrations fit within a tradition of illustrated herbals, alchemical manuscripts, and astrological compendia from 15th-century northern Italy. A small castle drawn on the Rosette fold-out shows swallowtail merlons (a distinctive Ghibelline battlement style found around Verona and Milan), suggesting a northern Italian origin for the illustrator.
History of women’s medicine: The 2024 Brewer/Lewis study connects the manuscript to a documented tradition of enciphering gynecological and reproductive medicine — Johannes Hartlieb’s known recommendation to use “secret letters” for sensitive medical topics is a striking parallel. The bathing figures may not be erotic — they may be anatomical or obstetric.
Philosophy of language: The manuscript is a touchstone in debates about the distinguishability of language from non-language, the limits of computational linguistics, and whether pattern and structure imply meaning. It is cited in discussions of what counts as a “language.”
Cross-Realm Connections
- event-gobekli-tepe: Both are artifacts where human intentionality is undeniable but content is opaque: we can see the sophistication without reading the message. The Voynich Manuscript is the Göbekli Tepe of the library.
- event-bronze-age-collapse: The Indus Valley script (also on the curiosity seeds list) remains undeciphered — both are testaments to how fragile cultural transmission is, and how civilizations can produce sophisticated written records that become permanently opaque.
- concept-turbulence: Both problems share a structure: technically tractable-looking systems that resist all tools applied. Turbulence resists mathematical solution; Voynichese resists linguistic decipherment. In both cases the problem may be a category error: we may be asking the wrong question.
- concept-distributed-cognition: The five-scribe analysis points to collaborative production of the manuscript — a distributed intellectual project, not a lone eccentric’s work.
- tech-jacquard-loom / concept-fabric-as-data: The manuscript’s script has been compared to textile pattern notation; some researchers have proposed that Voynich characters derive from shorthand systems, alchemical symbols, or even weaving notation conventions common in northern Italy.
See Also
- concept-voynich-theories — detailed breakdown of each decipherment theory with evidence and objections
- event-gobekli-tepe
- event-bronze-age-collapse
- tech-jacquard-loom
- concept-fabric-as-data
- concept-turbulence