The Voynich manuscript, the most enigmatic book known, is thought to date from the early 15th century and is written in an unknown alphabet. It appears not to be a random jumble of meaningless symbols, as some scholars claim. At least this is the conclusion reached by a group of Brazilian physicists after using advanced statistical techniques to analyze the document, which has long frustrated the greatest experts in deciphering encrypted codes.
Little is known about the manuscript and its history, only that it is written in an invented alphabet never seen in another document. It was acquired in 1912, near Rome, Italy, by a Polish bookseller named Wilfrid Voynich, who married the daughter of George Boole, a famous British mathematician. The manuscript is richly illustrated with images of plants and celestial bodies, suggesting that it is a text on herbs and astrology. But its contents remain an enigma.
The physicists have not deciphered it either, but based on their analysis, published in July in the journal PLoS One, they believe they have identified its keywords, i.e. the set of words that most closely identify the topics covered in the text. A future translation of these words could finally reveal something about the book’s message, if indeed there is a message. The Brazilian team, composed of researchers working in Germany and São Carlos, in the state of São Paulo, also concluded that the text of the manuscript has all the statistical properties expected of a text with meaning. If they are right, then the manuscript is not a sequence of meaningless symbols.
In any event, the method developed by the researchers to study the Voynich manuscript has other applications. “It allows us to identify the keywords in a long text without having to know their organization, or compare it with other texts, as search engines like Google do,” says one of the study’s authors, the physicist Eduardo Altmann, of the Max Planck Institute for Physics of Complex Systems in Dresden, Germany. Together with Dresden library staff, Altmann has been working on the implementation of a system for automatic classification of documents that would find potentially important words that were overlooked during the classification of the books by librarians. “This system could help find connections between different scientific disciplines,” he says.
Since 2009, Altmann has been specializing in using techniques from statistical physics to analyze the frequency with which words appear throughout a text (see Pesquisa FAPESP No. 185). At a conference some time ago, he met another Brazilian based in Germany, the physicist Diego Rybski, of the Potsdam Institute for Climate Impact Research, who had heard of the Voynich manuscript.
Is it a fraud or not?
Voynich found the manuscript among the books in a collection owned by Italian Jesuit priests and bought it. It is currently archived at the Beinecke Library at Yale University. Some speculated that the manuscript was a fraud created by Voynich, who profited from its sale, but historians and biographers have dismissed this hypothesis.
The manuscript was accompanied by a letter dated 1666, signed by an academic in Prague, today in the Czech Republic, asking a Jesuit in Rome to try to decipher it. The correspondence suggests that the manuscript belonged to Rudolf II (1552-1612), emperor of the Holy Roman Empire, known for his fascination with the occult, and that the author of the book was perhaps the English philosopher and Franciscan friar Roger Bacon, who lived from 1214 to 1294. However, a physical-chemical analysis of the parchment and inks, performed in 2010, concluded that the manuscript must have been produced between 1404 and 1438.
With the dimensions of a paperback book, and bound in vellum, the 240 manuscript pages are richly illustrated, and some pages unfold to several times the size of the book. The subjects of the drawings are the only clue as to what the text contains. Half of the volume portrays whole plants, mostly unidentifiable (except for three, but those species occur in various parts of the world, which does not help to determine their origin). This is followed by an astrological section, with drawings of the sun, the moon, the stars, the zodiac, circles in the sky and many nude women. The following section contains some strange tubes, which might represent blood vessels, microscopes or telescopes, and more nude women in pools. Then comes the pharmaceutical section, containing a list that appears to be of leaves and roots. The book ends with pages filled with a text consisting of a series of short paragraphs, illustrated only by stars on the margins.
The approximately 40 symbols in the text vaguely resemble Arabic numerals and Latin alphabet letters, as well as some symbols used by medieval alchemists. They are organized as in any Western text, grouped into words separated by spaces. Simultaneously familiar and unique, the language of the manuscript challenged all the experts who examined it. The book was an obsession for the American cryptographic analyst William Friedman, famous for cracking German and Japanese secret codes during World War II. After 20 years of trying, Friedman came to the conclusion that its message was written in an invented language.
In the 1990s, a community of about a hundred researchers from various disciplines interested in the Voynich manuscript began communicating over the Internet. To facilitate discussion of parts of the text via email, they associated each written letter in the manuscript with a Latin character. This transcription facilitated the statistical analysis of the text by computer and its comparison with other texts.
One of these analyses, suggesting that the Voynich manuscript was a fraud, garnered attention in 2004. The psychologist and mathematician Gordon Rugg of Keele University, England, showed how to create a sequence of symbols similar to Voynich writing using cryptographic techniques available in the Renaissance era. Rugg believes that the book is the work of a 16th-century charlatan out to obtain the gold offered by Rudolf II for mystical relics. In 2007, the physicist Andreas Schinner of the Johannes Kepler University in Austria suggested, in an analysis published in the journal Cryptologia, that the text of the Voynich manuscript could have been created by a random process. These results, however, neither discouraged the majority of Voynich scholars nor shook the faith of mystics who believe that the manuscript contains some divine or alien prophecy.
“The quantity of literature on the Voynich manuscript is scary and made me wonder to what extent its goal is scientific,” says Altmann. “That is why we tried to formulate more general questions in our work, hoping that the study will have other applications.”
Agglomeration and dispersion
Altmann and Rybski collaborated with the physicists Osvaldo Oliveira Jr. and Luciano da Fontoura Costa at the University of São Paulo, São Carlos Institute of Physics, who treated the texts as if they were complex networks of words (see figure on the left side). “Two words are connected in the network if they appear next to each other in the text,” explains Diego Raphael Amancio, Costa’s PhD student and first author of the PLoS One article.
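The word network Amancio describes can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `build_adjacency_network` and `degree` are hypothetical, the text is split on whitespace only, and the links are treated as undirected, which is a simplifying assumption.

```python
from collections import defaultdict


def build_adjacency_network(text):
    """Return an undirected word network as a dict: word -> set of neighbours.

    Two words are linked if they appear next to each other in the text.
    Tokenization (lower-casing and whitespace splitting) is an assumption
    made for this sketch.
    """
    words = text.lower().split()
    network = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # ignore a word immediately repeated
            network[left].add(right)
            network[right].add(left)
    return dict(network)


def degree(network):
    """Number of distinct neighbours of each word."""
    return {word: len(neighbours) for word, neighbours in network.items()}


if __name__ == "__main__":
    net = build_adjacency_network("the cat saw the dog")
    print(degree(net))
```

In this toy sentence, "the" ends up linked to three different words, while "dog" is linked to only one. Statistics computed over such links across a whole book are the kind of network measure the article refers to.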
Before addressing the Voynich manuscript, researchers assessed 29 types of statistical measures that can be obtained from the analysis of any text. These quantities measure how the words are clustered or dispersed throughout the text or measure the distribution of the various possible arrangements of connections between words when text is represented as a complex network. “Performing a thought experiment, if we could analyze all texts ever written in all existing languages we would have all possible values for these statistical measures,” explains Altmann. In this ideal situation, the researchers would then have a sort of statistical signature for each possible text.
Altmann sought out long, digitized, uncopyrighted texts translated into over 10 languages for analysis. One of the texts he found was the New Testament. The team analyzed the biblical text in 15 languages, from Arabic to Xhosa, a language spoken in South Africa, and compared it with versions of the text with the words shuffled. The physicists thus got an idea of which statistical measures are more sensitive to variations in the language and which of them can distinguish a text with meaning from a text with a random series of words.
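The comparison with shuffled versions can be sketched in a few lines of Python. The measure used here, the number of distinct adjacent word pairs, is a simple stand-in chosen for illustration and is not one of the measures named in the article. The function names, the number of trials and the random seed are likewise assumptions.

```python
import random


def distinct_pairs(words):
    """Count the distinct ordered pairs of adjacent words."""
    return len(set(zip(words, words[1:])))


def shuffled_baseline(words, trials=100, seed=0):
    """Average the same count over randomly shuffled copies of the text."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(trials):
        shuffled = list(words)
        rng.shuffle(shuffled)
        total += distinct_pairs(shuffled)
    return total / trials


if __name__ == "__main__":
    words = ("the cat sat on the mat " * 20).split()
    print("original:", distinct_pairs(words))
    print("shuffled:", shuffled_baseline(words))
```

An ordered text reuses the same word pairs again and again, so its count stays well below the shuffled average. A measure that shows such a gap is a candidate for telling structured text from a random series of words.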
Cthy, qokeedy and shedy
The researchers also compared various texts written in the same language, analyzing 15 classic novels from Portuguese literature and 15 from English literature, as well as their shuffled versions, determining which measures depend more on the particular message of the text than on the language in which it is written.
Unlike Schinner’s analysis, the measures analyzed by the Brazilians indicate that the Voynich manuscript has syntactic structure and conveys a message. “In my experience, Schinner’s results are not necessarily an indication that the text is not written in a natural language,” says Altmann. He further explains that, if they had compared the manuscript to more books, it would be possible to say which language is the closest to the one used in the manuscript.
It was by combining some of these measures that the researchers developed a method for distinguishing which words convey the meaning of the manuscript rather than just fulfilling a merely syntactic role, like articles and prepositions. Applied to the Portuguese New Testament, for example, the method results in a list of words including “Maria,” “born,” “boy,” “sepulcher” and “blessed.” The Voynich list includes cthy, qokeedy and shedy.
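To give a feel for how content words can be separated from function words without knowing the language, the sketch below ranks words by how unevenly they are spread through a text. It uses the ratio between the standard deviation and the mean of the gaps between successive occurrences of a word. This is a simplified stand-in and not the combination of measures the researchers used. The function names and the `min_count` threshold are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def clustering_scores(words, min_count=5):
    """Score each frequent word by how irregular its recurrence gaps are.

    Evenly spread words (typical of articles and prepositions) score near 0;
    words concentrated in a few passages score higher.
    """
    positions = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)

    scores = {}
    for word, where in positions.items():
        if len(where) < max(min_count, 2):  # need at least one gap
            continue
        gaps = [b - a for a, b in zip(where, where[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores


def top_keywords(words, k=5, min_count=5):
    """Return the k words with the most clustered occurrences."""
    scores = clustering_scores(words, min_count)
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy text: "of" appears at regular intervals, "herb" only in two passages.
    text = ["of", "herb"] * 5
    for i in range(40):
        text += ["of", f"filler{i}"]
    text += ["of", "herb"] * 5
    print(top_keywords(text, k=1))
```

Because the score depends only on where a word occurs, it can be computed for a transcription of an undeciphered text just as easily as for the New Testament.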
A study published a week earlier, also in PLoS One, reached similar conclusions. The Argentine physicists Marcelo Montemurro of the University of Manchester, England, and Damián Zanette of the Balseiro Institute, Argentina, used different statistical techniques and arrived at a very similar list of important words.
“Each new study of the manuscript shows details that are characteristic of natural languages and unlikely in random texts,” says Jorge Stolfi of the University of Campinas (Unicamp) Institute of Computing, who analyzed the manuscript from 1997 to 2004. “I would guess it is a phonetic transcription of some East Asian language, made by a European, probably dictated by a native, using an alphabet invented by the author for this purpose.”
“Unfortunately, I do not know how to follow up on this possibility,” says Stolfi, whose theory caused quite a stir when it was introduced last year at a conference celebrating the centennial of the discovery of the manuscript. “Even if my hypothesis is correct, I do not venture to predict when it will be deciphered.”
1. Use of complex networks for natural language processing (No. 2010/00927-9); Grant mechanism Doctoral; Recipient Diego Raphael Amancio; Investment R$109,708.56 (FAPESP).
2. Models and methods of e-Science for life and agricultural sciences (No. 2011/50761-2); Grant mechanism Thematic Project; Coord. Roberto Marcondes Cesar Junior/IME-USP; Investment R$1,033,785.69 (FAPESP/CNPq).
AMANCIO, D. R. et al. Probing the statistical properties of unknown texts: Application to the Voynich manuscript. PLoS One. v 8(7). Jul. 2013.
MONTEMURRO, M. A. and ZANETTE, D. H. Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis. PLoS One. June 21, 2013.