Information Storage and Recovery Research Paper

Humans survived through millennia by passing on knowledge through stories, ideographs, and pictures. The Sumerians (circa 3000 BCE) and the Chinese (circa 1500 BCE) were among the first to develop rudimentary writing methods. Information was later compiled in encyclopedias, stored in libraries, and, after the fifteenth century, disseminated in mass-produced books. Fast-forward to the twenty-first century: computers and the Internet have redefined what it means to “look something up.”

Knowledge is lost if it cannot be stored and retrieved. Throughout most of the five hundred thousand years of human history, the storage methods employed were primitive at best. The only means ancient tribal peoples had for recording information were ideographs and pictures drawn on cliff faces and cave walls, or meaningful notches cut into strips of wood. Most knowledge was passed from one generation to the next through the repetition of stories and the use of rhyming memory aids. Such stories, much like the tales told among tribal peoples today, transmitted social codes, related tribal history, and taught the young what the older generation knew of the dangerous and benign animals and plants of the natural world. The number of such stories told over the entire span of human development must be enormous. The U.S. anthropologist Melville Herskovits estimated that the African tribes of his day had between 200,000 and 250,000 stories. The number of recorded tales among the North American Indians also runs into the thousands, and there are many more among the tribes of Polynesia, Australia, and other parts of the world. Most of these stories are rhythmic and repetitive, filled with emotion and visual details that help ensure the content is transmitted accurately.

Ancient Record Keeping

We are so accustomed today to seeking information in libraries, dictionaries, and encyclopedias, and on the Internet, that it is somewhat difficult to imagine societies in which children grew up learning only what adults passed on to them by word of mouth. When people finally did develop rudimentary methods of writing, they first used this skill only for recording rather mundane matters, such as marks denoting ownership, lists of barter transactions, and magical incantations. It wasn’t until the Egyptians developed papyrus, around 3000 BCE, that information could be written and preserved on scrolls, and later in books, that could easily be moved from place to place and stored in libraries. Around 3000 BCE the Sumerians, who lived in Mesopotamia, first reached the level of primitive writing with cuneiform, a script of wedge-shaped signs impressed into clay. The Chinese independently developed their own form of writing about 1500 BCE, employing a separate symbol for each word. Over the centuries the Chinese kept adding symbols until the largest dictionaries of their language came to list about 50,000 separate characters, all based on those first introduced during the Shang dynasty (1766–1045 BCE).

Handwritten papyrus scrolls and, later, printed books were often collected in private and public libraries where authorized persons could consult them. The famed library at Alexandria is estimated to have held several hundred thousand papyrus scrolls, most containing a wealth of information acquired by the Egyptians and Greeks. All of these were lost when the library was destroyed during a military siege. It is said that the victors burned the scrolls to heat the public bath water. This is not the only recorded disaster to thwart the accumulation of knowledge. In the mid-1500s, Diego de Landa, a Franciscan friar who later became bishop of Yucatán, ordered the Spanish conquistadors to burn all the Mayan bark-cloth books they could find, because these “contained nothing but superstitions and falsehoods of the Devil.” The great collection of Mayan astronomical knowledge was thus destroyed, as were all the Mayan officials and intellectuals who knew how to read the texts. Archaeologists are still trying to decipher the few manuscripts that survived.

Encyclopedias: Circles of Knowledge

To the ancient Greeks, knowledge was static rather than dynamic. They held that the world never changed, even as new things were discovered about it, and that once it was understood it could be controlled. The Greeks set about gathering all the known facts and recording them on scrolls, works that later became known as encyclopedias (from enkyklios, meaning circular or complete, and paideia, meaning education). Pliny the Elder (23–79 CE) used the two words in this sense in the preface to his Natural History, where he says that he treats all the subjects of the circle of learning. Pliny’s work consisted of 37 books and 2,493 articles. History, however, awards the title “compiler of the first encyclopedia” to Speusippos, the nephew of Plato who took over leadership of the Athens Academy after Plato died in 347 BCE. Only fragments of Speusippos’s work survive, but we do know that he intended his encyclopedia as a teaching aid for lectures at the Academy.

The first Roman to produce an encyclopedia was Marcus Porcius Cato (234–149 BCE), a general and prominent citizen of Rome. His work, Praecepta ad Filium, covered agriculture, law, medicine, oratory, and war. The entire text was eventually lost, but remnants survive of other notable Roman encyclopedias, such as those produced by Marcus Terentius Varro (116–27 BCE) and Aulus Cornelius Celsus, whose work, written during the reign of Tiberius (14–37 CE), ran to twenty-six volumes.

Like the Greeks, the ancient Chinese established academies of scholars to gather all their knowledge so that it could be passed on to later generations. Their first encyclopedia was the Huang Lan (Emperor’s Mirror), prepared at the order of an emperor about 220 CE. No part of this work survives, but later Chinese encyclopedias describe it as a carefully structured work written in fine literary style. The Chinese also produced the world’s most extensive encyclopedia, the Yongle dadian (Great Compendium of Yongle), which encompassed 11,095 volumes and over 500,000 pages. Some 2,000 scholars, commissioned in 1403 by Zhu Di (known as the Yongle Emperor), finished the work in 1408. A total of 370 volumes of this massive encyclopedia survive, now scattered in libraries all over the world.

Ibn Qutaiba (b. 828 CE), a teacher and philologist, produced the first Arabic encyclopedia, titled Kitab Uyun al-Akhbar (The Best Traditions). The work was divided into ten books covering asceticism, character, food, friendship, learning and eloquence, nobility, power, prayers, war, and women. Other Arabic-language encyclopedias followed, including the Kitab al-Shifa (The Book of Healing) by the philosopher-physician Avicenna (980–1037), a vast compendium of logic, natural science, mathematics, and metaphysics. Throughout much of the medieval period in Europe, Islamic scholarship flourished and preserved many Greek and Roman texts that had been forgotten or destroyed in the Christian world. Not until the later Middle Ages did Christian scholars, through contacts with the Arab world, begin to rediscover and collect this wisdom.

Medieval Europe did, however, produce a number of encyclopedias reflecting the somewhat limited interests of the Christian communities. One of these was the twenty-book Originum seu Etymologiarum Libri XX (Twenty Books of Origins, or Etymologies) by St. Isidore of Seville (c. 560–636). It is unique for its time because Isidore’s starting point was the liberal arts rather than theology. Most medieval encyclopedias were compiled, without acknowledgment, from bits and pieces of other people’s work; plagiarism was apparently a natural outgrowth of the medieval copying process. One such work deserves mention, both for its fine craftsmanship and because it was produced by a woman, Abbess Herrad of the Hohenburg convent near Strasbourg, in Alsace. Her encyclopedia, the Hortus Deliciarum (Garden of Delights), was completed shortly before 1195 and is one of the finest examples of illuminated manuscripts ever produced.

With the return of wide-ranging scholarship during the Renaissance, and with the printing press making it possible to produce many copies of a book, encyclopedias by many authors became commonplace. Some of the most notable among these were one by the English clergyman John Harris, who was the first to arrange entries alphabetically (1704); one by the German Johann Zedler that was the first to include biographies of living persons (1732); and the Encyclopédie by the French scholars Denis Diderot and Jean le Rond d’Alembert (first part published in 1751), which was not only superb in its scholarship but was also banned by the French government before the French Revolution because of its “subversive content.” A few words should also be said about the Encyclopaedia Britannica, which first came out as a series of pamphlets (a hundred installments) between 1768 and 1771. These pamphlets could later be stitched together and bound into three leather-covered books. The Britannica, which has since been published in many editions, established a form that many encyclopedias have followed: extensive articles, some more than a hundred pages long, on broad general topics. Modern encyclopedias are found in almost every language and are published in every advanced nation.

Accessing Information

The problem of finding an exact item of desired information among the mass of stored data is certainly not new, although today’s magnitude of data may be. More than 5,000 years ago the Assyrians tried to deal with this problem by inscribing notations on their clay information cylinders in characters so small they are almost unreadable. The Sumerians devised a kind of index to mark their clay tablets. But the concept of organizing information into usable categories and subcategories became popular only during the sixteenth and seventeenth centuries. One of the prime movers in this was the English philosopher and statesman Francis Bacon, who attempted to arrange his comprehensive view of knowledge into logical sections and subsections that would be useful to any literate person. Bacon’s method, published in a section of his De Augmentis Scientiarum (1623), was consulted and partly adopted by a number of later compilers of encyclopedias, and it was highly praised by Diderot in his Encyclopédie. The poet and essayist Samuel Taylor Coleridge (1772–1834) also tried to encompass all human knowledge in a carefully structured plan for an encyclopedia, a plan that was never carried out as he intended. Coleridge deplored the alphabetical arrangement of some of the more popular encyclopedias of his day because such an arrangement was only “determined by the accident of initial letters.”

Disorganized knowledge is of almost as little value as no knowledge at all, and therefore every library has needed some system by which to arrange its books. President Thomas Jefferson adopted the system advocated by Francis Bacon for his own library. A method still in use in many libraries around the world is the one developed by the American librarian Melvil Dewey (1851–1931). This Dewey decimal system divides books into ten main classes, each spanning a block of one hundred three-digit numbers (000–099, 100–199, and so on up to 900–999). Each class is divided into ten divisions and each division into ten sections, and further digits following a decimal point mark progressively more specialized fields.
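The hierarchy of the Dewey decimal system can be illustrated with a short Python sketch. It assumes a simplified call number of exactly three digits, optionally followed by a point and further digits; the function name `parse_dewey` and its returned fields are hypothetical, and the class labels paraphrase the summary headings of recent editions of the classification rather than Dewey's original 1876 wording.

```python
# Ten main classes, keyed by the hundreds digit of a Dewey number.
# Labels paraphrase recent summary headings (an assumption, not an official schedule).
MAIN_CLASSES = {
    0: "Computer science, information & general works",
    1: "Philosophy & psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts & recreation",
    8: "Literature",
    9: "History & geography",
}


def parse_dewey(call_number: str) -> dict:
    """Split a simplified Dewey number such as '513.2' into its hierarchical parts."""
    whole, _, decimals = call_number.partition(".")
    # Hypothetical validation: exactly three digits, then optional digits after the point.
    if len(whole) != 3 or not whole.isdigit() or (decimals and not decimals.isdigit()):
        raise ValueError(f"not a Dewey number: {call_number!r}")
    main = int(whole[0])
    return {
        "main_class": f"{main}00",          # first digit: one of ten main classes
        "main_label": MAIN_CLASSES[main],
        "division": f"{whole[:2]}0",         # second digit: one of ten divisions
        "section": whole,                    # third digit: one of ten sections
        "subdivisions": decimals,            # digits after the point narrow further
    }
```

For example, `parse_dewey("513.2")` places the number in the 500 class, the 510 division, and the 513 section, with "2" as a further subdivision. Each added digit narrows the field by a factor of ten, which is what lets the scheme grow more specific without renumbering the broader classes.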

All the storage and recovery systems of the past have now been dwarfed by digital computers and hypertext. One of the earliest digital computers, the Harvard Mark I, completed in 1944, was a massive machine controlled by a series of mechanical and electrical devices. It bore little resemblance to the desktop and laptop machines of today, and its primary use was for rapid mathematical computations. Over the following decades the storage capacity of computers grew at an astounding rate, so that today virtually anything that can be digitized (represented as a discrete set of numerical values) can be stored on a computer: words, images, sounds, and more. Large computer systems now have the capacity, in principle, to store the recorded knowledge of all the libraries in the world in a single location. By March 2007, according to The New York Times, the search-engine company Google had digitized one million out-of-print books, and by 2009 it had digitized about six million. Anyone with an Internet connection can search these books from home and read in full those no longer protected by copyright.

Such an overload of information would be impossible to manage were it not for hypertext and the rapid indexing ability of computers. In hypertext, certain words or phrases in text displayed on a computer are marked as links (hyperlinks) so that a reader can, with a keypress or mouse click, immediately jump to related material elsewhere in the same document or in other documents. The computer’s indexing ability allows a user to locate specific words or phrases almost instantly, for example by typing what is wanted into a “find box.”
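Locating words quickly across large collections of text is commonly done with an inverted index, a table that maps each word to the places where it occurs. The Python sketch below illustrates the idea under simplifying assumptions (lower-cased alphanumeric tokens, phrases matched by consecutive word positions); the function names and sample texts are hypothetical and are not taken from any particular search engine or browser.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[tuple[str, int]]]:
    """Inverted index: each word maps to the (document, word position) pairs where it occurs."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].add((name, pos))
    return dict(index)


def find(index: dict[str, set[tuple[str, int]]], phrase: str) -> list[tuple[str, int]]:
    """Return sorted (document, position) pairs where the whole phrase begins.

    Only the index is consulted; the documents themselves are never rescanned.
    """
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    for name, start in index.get(words[0], set()):
        # The phrase matches if word i occurs in the same document at position start + i.
        if all((name, start + i) in index.get(w, set()) for i, w in enumerate(words)):
            hits.append((name, start))
    return sorted(hits)


# Hypothetical sample texts for illustration.
docs = {
    "bacon": "Bacon arranged knowledge into sections and sub-sections.",
    "dewey": "Dewey divided knowledge into ten main classes.",
}
index = build_index(docs)
print(find(index, "knowledge into"))  # prints [('bacon', 2), ('dewey', 2)]
```

Building the index costs one pass over every text, after which each lookup consults only the index rather than rereading the documents; that trade-off is what makes near-instant search over very large collections practical.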

Thus the ability of human beings to store and recover the knowledge they acquired has grown from simple memorization methods through written documents and books to complex computer storage systems that no one could have imagined fifty years ago.


