Document Status: Written May 21-26, 2019. Presented at ISBL in Rome July 3, 2019. Formatted and published online August 2, 2019.
First of all I’d like to thank the conveners for assembling such a great session. I will continue some of the ideas we have already heard, such as the importance of interoperability and open standards. My own contribution will focus on manuscripts. Before digital technology most of us had access to manuscripts by way of photographic plates and critical editions. The few who had the privilege of handling the manuscript itself faced their own challenges. Digital technology opens new possibilities for extending and improving the role of manuscripts in our research and teaching. The impact of digitizing manuscripts is unmistakable, but digitizing is not, or at least should not be, an end in itself. We should think about why we digitize and what we expect a digital surrogate of a manuscript to be able to do. In turn, this leads to consideration of how to digitize and what to do with the digital surrogate.
For some background to the three priorities I plan to address, let me show you an example of the same page of the Jubilees Palimpsest digitized six ways. My training is in early biblical interpretation and early Jewish literature, especially the Book of Jubilees. I started off with an interest in reading more text of the only Latin copy of Jubilees, which dates from the fifth century and has since been abused to the point of illegibility. Along the way I encountered new questions, and the potential for digitization of manuscripts to do much more than add to critical editions. So we see here six digital versions of the same page of the Jubilees Palimpsest. The first is a one-bit image of Ceriani’s edition from 1861. It may be the most legible of all, but gives the least sense of the manuscript as anything other than a text container. Even as a transcription, it is incomplete and questionable. It is highly accessible, thanks to Google Books. My first look at the manuscript was on microfilm, shown here as an eight-bit digital image. I had access to it only by special request to my dissertation director, James VanderKam. It captures a moment in the conservation history of the manuscript, but is useless for reading the erased under text. I saw the manuscript in real life for the first time in 2011, again thanks to inside connections. The Biblioteca Ambrosiana was willing to digitize one bifolio for me to use in grant applications. That is the third image on screen. It would look good if not juxtaposed to the fourth image, which was captured in 2017 with high-precision spectral imaging equipment calibrated for color accuracy. Just because something is digitized in color doesn’t mean the color is accurate. The fifth image is captured with raking light to show the texture of the folio. Texture sometimes helps us recover the writing, but more importantly comes closest (among still images) to representing what it’s like to handle the folio.
The final image departs from efforts to accurately represent the artifact in favor of enhancing the text for legibility. Drawing on data captured with a spectral range and resolution beyond human capability, the processed image shows more of the erased text than any of the others. All six images represent the same page, and all are digital. But if we are going to talk about digitizing manuscripts and working with digital manuscripts, we need to be aware that “digital” encompasses a range of possibilities with different advantages and limitations. Any one of them may be better than nothing, and no one of them conveys all that a scholar might want to investigate.
The range of possibilities can be briefly illustrated with two other manuscripts. The most familiar distinction is spatial resolution, which determines whether an image looks blurry and pixelated. Color resolution starts at binary black and white, which may be adequate for printed material. A single channel of color, monochrome, is a big improvement. A second channel of color is a further improvement, but still what we would call “color blind” in humans. A third channel of color defines “normal” for most of us, especially if it is calibrated for accuracy. But normal human vision is actually quite limited compared to what can be seen by other species and by spectral imaging.
If our goal is to read letters of a page, in this case erased letters on a palimpsest, our metric is fairly straightforward. Here the top images are inadequate, even up to full accurate color. The enhancements possible with spectral imaging clearly help. We also see clear benefits from the addition of raking light, which shows the texture. This is because ink slightly corrodes the surface of parchment and leaves its impact even after the ink itself is removed.
In these introductory examples I’ve already hinted at the three priorities I would call most essential when digitizing manuscripts for the benefit of research and teaching in biblical literature. The first priority is access. One of the core advantages of digital information is that it can be copied and transmitted with no loss and little cost over the Internet or other digital media. But we don’t reap the benefit without thinking about standards for interoperability, permissions, discoverability, permanence, and ease of visualization interface. The second priority is comparability to first-hand experience. What can we say to someone who says there is no substitute for first-hand experience? What exactly do we do with a manuscript when we handle it in a reading room? Can digital technology provide something like that experience, or at least help us answer the same scholarly questions? The third consideration is the value of improving on first-hand experience. Even direct access does not guarantee answers to all our questions. Spectral imaging gives us superpowers—the ability to see what the natural human eye cannot. These priorities become more complex when one imagines that a scholar might want to study something about a manuscript other than read its text beginning to end.
The first priority is access. We want scholars and students to be able to investigate manuscripts even without special connections and privilege. Digital technology makes that possible, but not automatic. We can also learn from the mistakes of past digital projects. Many a website, once cool, will die alone in its own isolated silo, its data and formats never to be carried over into next-generation systems. There are a couple of points here, and they all revolve around the theme that we can achieve sustainable access through standards for interoperability. We’re all familiar with the fact that information is lost in human communication if language, genres, and conventions are not held in common. What’s new here is the importance of machine readability. If we follow conventions that allow machines to aggregate and index the information that will be important to scholars, most of the other considerations of access will fall into place.
One issue of access is permission. I imagine we are all familiar with debates surrounding open access. I’m more concerned with projects that consider themselves open, but undermine their own intentions with non-standard licenses. Humans can get lost reading custom legalese, and computers have no hope. When we use a standard license, such as any of the Creative Commons licenses, we’re not only making it easy for humans to know what they can do with an image. We’re making it possible for machine aggregators to know that they can incorporate metadata into their index.
If that happens, then discoverability falls into place. Discoverability comes down to a fundamental question: does anyone know it exists? Can scholars find it? Some searches and filters are easy, but what if you are looking for fifth-century manuscripts written in two columns? What if your goal is not to read the manuscript beginning to end but to trace development of scribal practices across centuries and continents? Open standards and linked data will allow information to be used outside of the originating silo to answer questions different from those of the original designers.
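To make the point about machine readability concrete, here is a minimal sketch in Python of the kind of query an aggregator could answer once descriptions and licenses follow shared conventions. The record fields, the sample shelfmarks, and the prefix test for Creative Commons license URIs are all assumptions made for illustration, not the schema of any actual catalog.

```python
from dataclasses import dataclass

# Prefix shared by Creative Commons license URIs; treating this prefix as
# "a standard license" is a simplification made for this sketch.
CC_LICENSE_PREFIX = "https://creativecommons.org/licenses/"


@dataclass(frozen=True)
class ManuscriptRecord:
    """Hypothetical machine-readable description of one manuscript."""
    shelfmark: str
    century: int       # century CE in which the manuscript was written
    columns: int       # number of text columns per page
    license_uri: str   # where the reuse terms are stated


# Invented sample records, for illustration only.
RECORDS = [
    ManuscriptRecord("Example A", 5, 2, "https://creativecommons.org/licenses/by/4.0/"),
    ManuscriptRecord("Example B", 5, 1, "https://creativecommons.org/licenses/by/4.0/"),
    ManuscriptRecord("Example C", 5, 2, "https://example.org/custom-terms"),
    ManuscriptRecord("Example D", 9, 2, "https://creativecommons.org/licenses/by/4.0/"),
]


def has_standard_license(record):
    """True if a machine can recognize the license without reading legalese."""
    return record.license_uri.startswith(CC_LICENSE_PREFIX)


def find(records, century, columns):
    """Return indexable records matching a century and a column layout."""
    return [
        r for r in records
        if has_standard_license(r) and r.century == century and r.columns == columns
    ]


matches = find(RECORDS, century=5, columns=2)
print([r.shelfmark for r in matches])
```

In this sketch "Example C" fits the scholarly query but drops out of the index, because its custom terms give a machine no way to know that it may be included.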
Another important part of access is permanence. Will it be there when you go back to look for it again? If you copy the address bar from the browser, will that take you back to what you were looking at? Will any of it make sense to a human who might want to make a modification? In the case of images of manuscripts, the most important standard is the International Image Interoperability Framework (IIIF) Image API. This lets us store the image once on a public repository, and refer to and access any portion and scale we might want. If I want to collect examples of a particular scribal practice, such as ligatures or nomina sacra, I can store the IIIF Image URL rather than downloading, modifying, and reposting the image. The coordinates are understandable by a human with familiarity with the standard and can be easily modified to get a broader or higher-resolution image. More importantly, we always know where the image came from. Similarly the IIIF Presentation API describes codices and other large collections of images in meaningful and standard ways.
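As a minimal sketch of how such an address is put together, the Python below assembles a IIIF Image API URL from its parts: identifier, region, size, rotation, quality, and format, in that order. The server address and the image identifier are hypothetical placeholders, and the helper names are my own.

```python
from urllib.parse import quote


def region_xywh(x, y, width, height):
    """Format a pixel region as the IIIF 'x,y,w,h' string."""
    return f"{x},{y},{width},{height}"


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}.

    'max' is the size keyword in Image API 2.1 and later; servers limited
    to older versions expect 'full' there instead.
    """
    # The identifier must be percent-encoded, including any slashes.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
IMAGE_ID = "palimpsest-f001r"

# A detail, such as a single ligature, at the largest available size.
detail = iiif_image_url(BASE, IMAGE_ID, region=region_xywh(1200, 800, 400, 300))
print(detail)
# https://example.org/iiif/palimpsest-f001r/1200,800,400,300/max/0/default.jpg

# The whole page scaled to 600 pixels wide; only the parameters change.
overview = iiif_image_url(BASE, IMAGE_ID, size="600,")
print(overview)
# https://example.org/iiif/palimpsest-f001r/full/600,/0/default.jpg
```

Storing the first URL in a collection of examples records both the detail and its source, and anyone can widen the region or change the size by editing the numbers.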
Another concern is whether the information will be there not just when we go back for it, but when our great-grandchildren go back for it. In the interest of time, suffice it to say that a standard does not need to last forever as long as it can be understood and converted by whatever replaces it.
The final consideration for access is the ease of use of the visualization interface. Users would rank this highly. However, if the standards for interoperability are done right this is the least of the concerns. No one viewer needs to be a permanent solution, and no one viewer needs to have the perfect balance of power and ease that is right for everyone. Once images and collections are defined with IIIF standards, users can choose between several viewers according to personal preference.
Mirador stands out as being most actively developed, so there is a greater chance that users will already be familiar with it, and a greater chance that power and ease of use will increase in the future. But if a better viewer comes along later, we won’t need to change anything about how the information is stored.
The second priority for digitally-enabled research and teaching is that digital surrogates of manuscripts should be able to help us answer the questions we could answer with first-hand experience. When people say there is no substitute for first-hand experience, they don’t mean that the text is more legible, certainly not more than a critical edition. My point is not to argue that a digital experience will ever be the same as an unmediated experience. It may be better in some ways, and worse in others, but always different. We can examine the differences and consider how many of them can be addressed. An ineffable value of unmediated experience will keep museums and libraries in business. To imagine the optimal digital surrogate, we can focus on the scholarly questions that can be answered with first-hand experience and work from there.
One way to think about this is to picture what we do in a reading room. There is a lot of movement involved. We step back to grasp the overall structure, but then we move closer to examine detail. This means the spatial resolution has to be high enough to allow zoom without pixelation. We also move the object, or our heads, or the light. This is partly our way of correcting for abnormalities in the light that create artificial hues or other false artifacts. By holding a page up to the light we can see holes and scoring lines made to guide the scribe, as well as thin spots and even holes where the ink ate through the parchment. From movement we perceive specularity, how shiny the surface is. Most of all we perceive texture and depth. We track the distorted shape of a letter over distortions in the parchment. We note the hair and flesh sides of the parchment, especially when reconstruction is necessary. We look for traces where now-erased ink once corroded the surface of the parchment. We try to read dry-point notation, the hidden notes scribes and readers made to themselves that could only ever be seen from texture.
To these examples from parchment manuscripts, many more could be added when studying objects for which texture was the primary conveyor of meaning. Inscriptions, coins, and cuneiform are major examples from writing, and brush strokes and other techniques can be important in arts and crafts. A diffuse light photograph of an inscription is completely illegible. A good photographer can find the right angle of illumination to make the intended feature visible. If no one angle of light makes all the features visible, or if the photographer cannot anticipate every feature a scholar would want to examine, it becomes important to photograph multiple angles of illumination. The next step is to capture a complete set of angles of illumination. With all this information the data can be processed to record the texture and specularity of each pixel as a function of the light position. This method, called Reflectance Transformation Imaging, can extrapolate light positions and enhance the specularity of the surface.
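For those who want to see what "a function of the light position" can look like, here is a minimal sketch in Python for a single pixel. It uses the biquadratic polynomial texture map, one common RTI formulation, which I am assuming for illustration; it is not necessarily the model used in any particular project. The light positions and the "true" coefficients are invented test data, and the small linear solver is there only to keep the sketch free of external libraries.

```python
def basis(lu, lv):
    """Biquadratic terms for a light direction projected onto the image plane."""
    return [lu * lu, lv * lv, lu * lv, lu, lv, 1.0]


def solve(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_pixel(lights, intensities):
    """Least-squares fit of six coefficients from many captures of one pixel."""
    rows = [basis(lu, lv) for lu, lv in lights]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atb = [sum(r[i] * y for r, y in zip(rows, intensities)) for i in range(6)]
    return solve(ata, atb)


def relight(coeffs, lu, lv):
    """Predict the pixel's brightness under a light position of our choosing."""
    return sum(c * b for c, b in zip(coeffs, basis(lu, lv)))


# Invented data: nine captures on a 3 x 3 grid of light positions.
TRUE_COEFFS = [-0.3, -0.2, 0.1, 0.4, -0.1, 0.6]
LIGHTS = [(lu, lv) for lu in (-0.5, 0.0, 0.5) for lv in (-0.5, 0.0, 0.5)]
OBSERVED = [relight(TRUE_COEFFS, lu, lv) for lu, lv in LIGHTS]

fitted = fit_pixel(LIGHTS, OBSERVED)
# Brightness under a virtual light that was never physically captured.
virtual = relight(fitted, 0.25, -0.25)
print(virtual)
```

A real capture repeats this fit for every pixel, which is what allows a viewer to move a virtual light to positions the photographer never used.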
More importantly, the image becomes interactive, as the user, even through a web browser, can move a virtual light around the object. This brings us back to our image of a scholar examining a manuscript in a reading room and the importance of movement. No one still image is adequate, and even a sum of still images falls short of the experience of motion and interactivity in real time. Now one can get a feel for the condition of the folio and answer questions as basic as distinguishing a trace of ink level with the surface from a hole or an accretion above the surface. For a student who might not have had the privilege of first-hand experience, it can be moving as well as informative.
The third priority picks up where the second priority leaves off. Sometimes first-hand experience is not that great, or doesn’t tell us everything we might like to know. Our memories are notoriously unreliable when we try to play back in our minds what we saw. Memory is also insufficient for persuading others. But even in the moment our ability to see what we’re looking at is remarkably limited. Our low color resolution means that browns can be hard to distinguish. This is particularly a problem when we look at brownish parchment with brownish erased ink and brownish secondary ink and brownish reagent. If the eye proves intelligent design, it also proves that the octopus is the greatest of all creatures. Even shrimp can see color better than humans. Every color we see is a combination of three colors. When we look at a rainbow we see seven bands. These bands are to color resolution what pixelation is to spatial resolution. With more receptors we would see a smooth gradient from violet to red. We call someone color blind if they have only two kinds of color receptors. Shrimp could call us color blind. More usefully, a multispectral imaging system could call a shrimp color blind, resolving fourteen discrete wavelengths.
Besides color resolution, our vision is also limited in color range. We can’t see wavelengths shorter than violet, called ultraviolet, or longer than red, called infrared. Infrared is especially good at distinguishing organic and inorganic pigments. Ultraviolet is also useful because, in addition to reflecting off parchment, it also causes materials to fluoresce. This is what makes your shoelaces glow in what appears to be a dark room with a blacklight. We’ve known about the usefulness of infrared and ultraviolet for studying manuscripts since long before digital imaging. A film camera can be modified to capture a monochrome photo showing infrared as white. A blacklight lamp in a reading room can allow the eye to see the fluorescence, though not the reflected ultraviolet.
What is new with digital multispectral imaging is that for each pixel in each capture the light coming from that spot is given a numerical value. As long as the object and camera remain motionless, many captures can be acquired and the pixel will represent the same spot on the object. Multispectral imaging today might capture fifty images under different conditions. We measure how much light passes through the object at four different wavelengths. We measure how much visible light reflects at ten wavelengths. We measure how much invisible light (ultraviolet and infrared) reflects at six wavelengths. We measure how much light fluoresces in each of six color ranges, stimulated by five wavelengths of ultraviolet. You won’t be tested on this. The point is that multispectral imaging doesn’t just take a nice picture, it captures a huge amount of digital data.
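The arithmetic behind "fifty images" can be tallied in a few lines of Python. The sketch assumes that every one of the six fluorescence color ranges is captured under each of the five ultraviolet wavelengths, which is my reading of the description above; the labels are placeholders, not the names of actual filters or lights.

```python
from itertools import product

# Counts taken from the description of the capture sequence.
TRANSMISSIVE = 4            # light passing through the object
VISIBLE_REFLECTANCE = 10    # visible wavelengths reflected
INVISIBLE_REFLECTANCE = 6   # ultraviolet and infrared wavelengths reflected

# Placeholder labels for the fluorescence captures (assumed full grid).
COLOR_RANGES = [f"color-range-{i}" for i in range(1, 7)]
UV_SOURCES = [f"uv-source-{i}" for i in range(1, 6)]
fluorescence_pairs = list(product(UV_SOURCES, COLOR_RANGES))

total = (TRANSMISSIVE + VISIBLE_REFLECTANCE + INVISIBLE_REFLECTANCE
         + len(fluorescence_pairs))
print(len(fluorescence_pairs), total)  # 30 50
```

Every one of those captures assigns a number to every pixel, so each spot on the folio ends up described by fifty measurements rather than three.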
Sometimes one of the captures is useful, the way an old film infrared photo could be useful. More often it takes additional processing to find meaningful contrasts in that huge collection of data. Fortunately, this is what computers are good at. For each pixel they can look at all the numbers captured under fifty or more conditions and clearly distinguish the brown with one spectral fingerprint from the brown with another spectral fingerprint. The images produced are not intended to represent true color, but to render contrasts invisible to the human eye in contrasts that are visible to the human eye. For a complex folio there may still be more contrasts to show than can be shown at once. We can’t expect multispectral imaging to spit out one definitive image that will show everything one might wish to see. This is especially true if we think of manuscripts as more than just text containers. For this reason it is important for viewers to be able to flicker or fade between many color renderings.
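One simple way to compare spectral fingerprints is the angle between two vectors of measurements, sketched below in Python. This is a common technique that I am using for illustration, not necessarily the processing applied to any image shown here, and the five-band reference spectra and pixel values are invented.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra; small means similar shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def classify(pixel, references):
    """Name of the reference spectrum closest in angle to the pixel."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


# Invented reflectance values in five bands; both would look brown to the eye.
REFERENCES = {
    "parchment": [0.30, 0.35, 0.40, 0.55, 0.70],
    "erased ink": [0.28, 0.33, 0.36, 0.40, 0.45],
}

# A darker pixel with the same spectral shape as parchment (e.g. in shadow).
shadowed = [0.15, 0.175, 0.20, 0.275, 0.35]
# A pixel whose spectrum stays flat in the longer wavelengths.
faint_trace = [0.29, 0.33, 0.37, 0.41, 0.44]

print(classify(shadowed, REFERENCES))     # parchment
print(classify(faint_trace, REFERENCES))  # erased ink
```

Because the angle ignores overall brightness, the shadowed pixel is still matched to parchment. Mapping each class to its own display color is one way to turn contrasts the eye cannot see into contrasts it can.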
These three priorities are all important to consider when thinking about what it means to digitize manuscripts in a way that will satisfy the needs of scholars today and well into the future. Multispectral data, beyond simple pictures of what the human eye can see, are necessary for erased or damaged manuscripts, or to study non-textual features of scribal culture. Texture and interactivity are necessary to provide conservators, researchers, and students with a sense of the physicality of the artifact as more than a text container. The images will not have the appropriate impact on scholarship if they cannot be accessed, discovered, and studied in a helpful viewer, both today and well into the future.
If we were to consider objects other than manuscripts, we would also have to include structure. A folio can be reasonably reduced to recto and verso with texture but not deep dimensionality on each side. For objects with many sides, or not simply reducible to sides, technologies such as laser-scanning and photogrammetry can capture truly three-dimensional structure. Full 3D imaging includes both boundary structure of an object and the texture and specularity of each surface on the structure. Thus RTI and laser-scanning are not competing technologies, but complementary aspects of optimal three-dimensionality. While work remains to be done on integrating texture imaging with structure modeling, the three priorities discussed above are all fully compatible and integrated today. With support from the National Endowment for the Humanities, my own project developed and made public tools for combining reflectance transformation imaging and multispectral imaging. The software also outputs formats conducive to publication using IIIF standards for interoperability and WebRTI for interactivity in any web browser.
One last consideration I should mention is cost. To date, mostly known high-value and mysterious objects have been imaged with the full set of multispectral and texture imaging. The only real solution to the problem of cost, in my opinion, is to increase efficiency and economy of scale. Once spectral imaging moves from a niche to the standard for conservation-quality imaging, the cost will decrease dramatically. You might think that many objects do not need texture imaging or do not need multispectral imaging, but that list shrinks if you imagine that other researchers may bring different questions to the artifact, particularly questions other than reading the main text. In many cases we don’t know what we can’t see until we try advanced imaging. Dry-point notation and erased text have appeared where not expected. One might say that anything is better than nothing, but it could do harm if scholarship is done using a digital surrogate that fails to provide the necessary information. Work is also being done on inexpensive options that may not match the quality of a high-end system, but could be used to test many objects to determine candidates for further imaging.
Manuscripts and the literature they preserve are only one aspect of a complete study of the ancient world. I do not believe, however, that the literary record and study of scribal cultures have been exhausted by the critical editions and digitized collections available today. The study of the most primary of primary sources in biblical literature, the manuscripts themselves, is very different today than it was twenty years ago. I expect, or at least hope, that it will be even more different twenty years from now. Thank you.