Grant Number HK-250616-16
The Jubilees Palimpsest Project:
Spectral RTI Technology for the Recovery of Erased Manuscripts from Antiquity
Todd R. Hanneken, Ph.D.
St. Mary’s University
November 30, 2019
The Spectral RTI Toolkit works with ImageJ to process Spectral (narrowband) and RTI (hemisphere) captures into raking, RTI, and WebRTI files.
Project activities cover five areas.
First, the alpha version of the Toolkit was published on GitHub (https://github.com/thanneken/SpectralRTI_Toolkit).
Second, the Saint Louis University Center for Digital Humanities
(https://www.slu.edu/arts-and-sciences/ong-center/)
was contracted to develop the toolkit from its earliest state as an ImageJ macro into a Java plugin for ImageJ2.
Third, the Toolkit is being maintained and improved in light of the needs of users within and beyond the project.
Fourth, documentation has been published on the project website
(https://jubilees.stmarytx.edu/spectralrtiguide/),
along with GitHub to facilitate derivatives and contributions from others,
(https://github.com
We exceeded the proposed activities by designing and building an arc to automate Spectral RTI captures. Capturing data for RTI images requires a real or virtual dome of discrete lights around the object. This can be done with a handheld flash, which requires manual positioning for each of fifty or so captures, and requires the light positions to be calculated from a reflective hemisphere for each page. It can also be done with a dome with lights fixed to known positions, but the diameter of the dome must be several times the diameter of the page, which comes to about eight feet for manuscripts. Such domes are unwieldy and interfere with safe object handling.
Through a series of conference calls and models (physical and Computer Aided Design, see appendix), our team designed an arc that has the major advantages of a dome but takes less space and can move out of the way. The arc pivots on the light stands already used by spectral imaging. The arc slots into seven positions that do not change from one sequence to the next. The arc holds sixteen lights, of which the odd and even numbered lights fire on alternating arc positions. The result is that fifty-six images capture the reflectance of the object when illuminated by evenly-distributed positions around a virtual hemisphere. The time required for hemisphere captures for RTI decreased from almost twenty minutes in the startup phase to less than five, while increasing the number of captures from thirty-five to fifty-six. The greater number of hemisphere captures increases texture resolution and decreases the impact of shots corrupted by shadows from the camera stand.
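The alternating firing pattern described above can be sketched as a capture schedule. The counts (seven arc positions, sixteen lights, fifty-six captures) come from this report. Which group of lights fires at which position is not stated here, so the parity rule below is an assumption made only for illustration.

```python
ARC_POSITIONS = 7   # slots the arc pivots through (from the report)
LIGHTS_ON_ARC = 16  # lights mounted on the arc (from the report)


def capture_schedule(positions=ARC_POSITIONS, lights=LIGHTS_ON_ARC):
    """Return (arc_position, light_number) pairs, one per hemisphere capture.

    Assumption: odd-numbered lights fire at odd-numbered arc positions and
    even-numbered lights at even-numbered positions. The report only says
    that the two groups alternate; which group starts is a guess.
    """
    schedule = []
    for position in range(1, positions + 1):
        parity = position % 2  # 1 selects odd lights, 0 selects even lights
        for light in range(1, lights + 1):
            if light % 2 == parity:
                schedule.append((position, light))
    return schedule


if __name__ == "__main__":
    schedule = capture_schedule()
    print(len(schedule), "hemisphere captures per page")
```

Eight of the sixteen lights fire at each of the seven positions, which accounts for the fifty-six captures. Staggering the odd and even lights presumably spreads the light positions more evenly over the virtual hemisphere than firing the same eight at every position would.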
The team traveled to Milan and captured complete Spectral RTI data for each of the 144 pages of the Jubilees Palimpsest, plus early modern notes archived with the Jubilees Palimpsest and samples from five additional palimpsests in the Ambrosiana. The team consisted of the seven proposed participants and benefitted from additional volunteer effort from team members extending their time commitment and additional partners assisting at their own expense. The proposed team members were Todd Hanneken (project director), Anthony Selvanathan (graduate researcher from St. Mary’s University), Michael Phelps, Damianos Kasotakis, Roger Easton, Keith Knox, and Ken Boydston. Additional volunteers were Dale Stewart and Giulia Rossetto.
The travel to Milan originally scheduled for March 2017 was moved ahead to January 2017. This saved money and increased time availability of team members on site. We were able to rent three apartments in the same building, which worked very well. The favorable exchange rate helped us stay well within budget.
The narrowband spectral captures were increased to fifty-two captures per page in response to particular properties of the chemical reagent that was used early in the nineteenth century. The narrowband captures included fourteen bands of narrowband reflectance from ultraviolet to infrared, four bands of transmissive illumination, and a total of thirty-four fluorescence captures. The fluorescence captures included four different wavelengths of illumination and seven different filters plus additional variants at different exposure settings when the chemical reagent caused regions to differ radically in reflectance. The hemisphere captures for RTI amounted to fifty-six images per page. A total of 108 images per page were captured for all 144 pages of the Jubilees Palimpsest in just less than three of the four weeks in Milan.
With the remaining time we imaged the forty-six pages of non-palimpsest front matter and early modern notes archived with the Jubilees Palimpsest. Because these pages pose no challenges to legibility, we used a reduced set of sixteen images per page, which still exceeds archival quality. We also sampled pages from other palimpsests in the Ambrosiana collection to aid demonstration of the utility of Spectral RTI and to probe the potential for future advanced imaging projects at the Ambrosiana. The objects selected were: an illumination from Petrarch’s Vergil that includes a crypto-script signature illegible to the human eye (A79 inf), an unidentified Greek commentary on the Gospel of Luke (F130sup), a palimpsest with several unidentified undertexts (H190inf), Origen of Alexandria’s edition of versions of the book of Psalms (Hexapla, O39sup), and Wulfila’s fourth-century translation of the Epistles of Paul into Gothic, including a liturgical calendar (S36sup). The objects were selected to appeal to a broad range of scholarly, popular, and political constituencies.
In total we captured 239 pages, mostly at a rate of 108 captures per page, 50 megapixels per capture, 16 bits per pixel. Capture and on-site processing generated seven terabytes of data in Milan.
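As a rough check on the scale of these figures, the uncompressed size of the captures can be estimated from the numbers given above (50 megapixels, 16 bits per pixel, 52 narrowband plus 56 hemisphere captures per page, 144 palimpsest pages). This is a back-of-the-envelope sketch only: it assumes uncompressed pixel data with no metadata overhead, and it does not try to reproduce the seven-terabyte total, which also covers derived formats, calibration captures, and the other objects.

```python
MEGAPIXELS_PER_CAPTURE = 50   # from the report
BITS_PER_PIXEL = 16           # from the report
CAPTURES_PER_PAGE = 52 + 56   # narrowband + hemisphere, from the report
PALIMPSEST_PAGES = 144        # from the report


def bytes_per_capture(megapixels=MEGAPIXELS_PER_CAPTURE, bits=BITS_PER_PIXEL):
    """Uncompressed size of one capture (assumption: no compression, no headers)."""
    return megapixels * 1_000_000 * bits // 8


def bytes_per_page(captures=CAPTURES_PER_PAGE):
    """Uncompressed size of one full Spectral RTI page sequence."""
    return captures * bytes_per_capture()


def terabytes(n_bytes):
    """Convert bytes to decimal terabytes."""
    return n_bytes / 1e12


if __name__ == "__main__":
    print(bytes_per_capture() / 1e6, "MB per capture")
    print(bytes_per_page() / 1e9, "GB per page")
    print(terabytes(bytes_per_page() * PALIMPSEST_PAGES), "TB for the palimpsest, one format")
```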
See below for additional data captured as part of the training workshops.
The data generated was archived for accessibility, functionality, and clarity for the immediate team and for posterity. For each capture, three formats are archived. First, the raw data from the camera in digital negative (dng) format was immediately set to read-only and archived for posterity should any of our subsequent processing decisions be questioned. Second, the data was “flattened” (corrected for aberrations in lighting based on a plain white calibration target). This data is most useful to the scientists for processing. Third, the flattened data was gamma-corrected to match the perception bias of the human eye. These gamma-corrected images are necessary for processing designed for human consumption. This data is somewhat redundant in that the later formats could be rederived from the earlier ones. We are considering ways to reduce this redundancy without sacrificing accessibility. The question is how easily, consistently, and reliably posterity will be able to rederive the derived data. In the meantime, all three are considered archival, along with the calibration captures.
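The relationship between the three archived formats can be sketched in a few lines. This is a minimal illustration, not the project's processing code: it assumes flattening is a per-pixel division by the white-target capture (the usual flat-field approach) and uses a plain power-law curve with an exponent of 2.2 as a placeholder, since the actual curve is not specified here. Images are plain nested lists so the sketch has no dependencies.

```python
def flatten(raw, white, dark=0.0):
    """Flat-field correct an image against a capture of a plain white target.

    Assumption: correction is a per-pixel ratio to the white target, with an
    optional constant dark level. Pixels with no usable white signal map to 0.
    """
    return [
        [(r - dark) / (w - dark) if w > dark else 0.0
         for r, w in zip(raw_row, white_row)]
        for raw_row, white_row in zip(raw, white)
    ]


def gamma_correct(linear, gamma=2.2):
    """Encode linear values in [0, 1] with a simple power-law curve.

    The exponent 2.2 is a placeholder, not the project's documented value.
    """
    return [
        [min(max(value, 0.0), 1.0) ** (1.0 / gamma) for value in row]
        for row in linear
    ]


if __name__ == "__main__":
    raw = [[50.0, 100.0], [30.0, 90.0]]
    white = [[100.0, 200.0], [60.0, 180.0]]   # uneven illumination
    flat = flatten(raw, white)                # even 0.5 reflectance everywhere
    print(flat)
    print(gamma_correct(flat))
```

Because each step is a deterministic function of the previous format plus the calibration captures, the derived formats are in principle reproducible, which is the redundancy discussed above.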
Extensive capture metadata is encoded into the EXIF headers of the captured images. We supplement this metadata with an XML file for each page that includes all the EXIF metadata for each image in the sequence, while grouping together data that is constant for all shots in the session or image sequence. Additionally, illuminator sequence codes meaningful to the team may not be meaningful to posterity so they are elaborated in companion tags using a namespace specific to spectral imaging.
Data integrity was protected at several levels. First (and most often overlooked) we countered the threat of “bit rot” by using checksums on the file system level and redundant file system metadata by using the B-Tree File System (BTRFS). Checksums are also used in verifications and duplications using rsync. Second, we countered the threat of drive failure by using RAID 1 or 10 redundancy in the definitive archives and backups. Third, we countered the threat of losing an entire computer or piece of luggage by distributing backups across locations.
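The project relied on BTRFS and rsync for checksumming. For readers without those tools, the same idea can be illustrated at the application level: record a checksum for every archived file, then recompute later and compare. The sketch below is an assumption-laden illustration, not the project's tooling; the choice of SHA-256 and the function names are ours.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files whose current checksum differs from the manifest, or that are missing."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

A manifest built when the raw files are set to read-only can be stored alongside each backup, so that any copy can be verified independently of the file system it sits on.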
The archival data is publicly available for specialists apart from the IIIF image repository described below, which serves a much wider audience. The data archive is available at https://palimpsest.stmarytx.edu/AmbrosianaArchive. Like all grant products, the data is accessible without any kind of encumbrance (e.g., account creation, cookie stalking) under a Creative Commons license (CC BY-SA for everything created solely by the Jubilees Palimpsest Project and CC BY-NC-SA for objects owned by the Biblioteca Ambrosiana).
Additional data collected as part of the 2019 training workshops was managed and published at https://palimpsest.stmarytx.edu/WorkshopsArchive/.
Data processing can be grouped into two end goals. The first is to create a digital facsimile that captures the present state of the artifact as accurately as possible. This kind of accuracy is useful to students and scholars who do not have first-hand access to the artifact, and to future conservators and scholars who will not otherwise have precise information on the state of the artifact in 2017. Accurate digitization of first-hand experience is done with high-resolution color using ten wavelengths within the visible spectrum. From this data accurate color images were created in the LAB (preferable for archival quality) and sRGB (preferable for compatibility and accessibility) color spaces. These derivative files have 24-bit color depth. Accurate spatial resolution is achieved by avoiding Bayer or other filters, and by using an apochromatic lens. Accuracy in texture is achieved by using transmissive light (which can simulate holding the page up to a light) and capturing reflectance of light originating from different angles (raking light images and eventually RTI, which can simulate moving a light around the object).
The second major end goal is to surpass first-hand experience for reading illegible text, marginalia, and other features.
Some of these follow standard recipes and some involve case-by-case labor.
The standard recipe included with the Spectral RTI Toolkit is Extended Spectrum, which essentially squeezes ultraviolet and infrared into the visible spectrum and optimizes contrast.
Another standard recipe was created by imaging scientist Keith Knox to deal with the particular problems of the reagent-saturated palimpsest.
This method, called RuBY, takes its name from the formula of taking Royal blue fluorescence divided BY transmissive.
It has proven effective at reading illegible text in the palimpsest.
Two additional recipes developed by Knox, Sharpie and Pseudocolor, were applied to the palimpsest samples other than C73inf.
All of the processes described thus far (Accurate Color, Extended Spectrum, RuBY, Sharpie, and Pseudocolor with raking and transmissive light variants and WebRTI) have been completed and published for all pages captured.
Additional supervised processing has been completed for the supplemental palimpsests and representative samples from C73inf.
So far no one procedure has proven sufficiently efficient or effective to apply to all pages of C73inf.
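The core of the RuBY recipe named above, royal blue fluorescence divided by transmissive, can be sketched as a per-pixel ratio of two registered images. Only the ratio itself comes from this report. The guard against division by zero and the contrast stretch for display are our assumptions, and the actual recipe may include further steps.

```python
def ruby(royal_blue_fluorescence, transmissive, epsilon=1e-6):
    """Divide the fluorescence image by the transmissive image, pixel by pixel.

    Both images must be registered and the same size. epsilon is an assumed
    guard against division by zero in fully opaque regions.
    """
    return [
        [f / max(t, epsilon) for f, t in zip(f_row, t_row)]
        for f_row, t_row in zip(royal_blue_fluorescence, transmissive)
    ]


def stretch(image):
    """Rescale values linearly to the range 0..1 for display (assumed step)."""
    values = [value for row in image for value in row]
    low, high = min(values), max(values)
    if high == low:
        return [[0.0 for _ in row] for row in image]
    return [[(value - low) / (high - low) for value in row] for row in image]


if __name__ == "__main__":
    fluorescence = [[0.20, 0.40], [0.10, 0.30]]
    transmitted = [[0.40, 0.40], [0.50, 0.30]]
    print(stretch(ruby(fluorescence, transmitted)))
```

A ratio of this kind depends on the perfect registration that spectral imaging provides, since each output pixel compares the same point on the page under two kinds of illumination.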
Innovations in advanced processing techniques require a feedback-loop between scholars and scientists.
The chief scholar Todd Hanneken and the scientists Keith Knox and Roger Easton conducted weekly conference calls for more than a year following the capture session,
and occasionally thereafter.
The processing guides created through this collaboration are archived, publicly accessible, and discoverable through search engines:
https://palimpsest.stmarytx.edu
Additional efforts at advanced processing have explored non-linear transformations, including Kernel Principal Component Analysis (KPCA) and Laplacian Eigenmaps (LE). As of the end of the award period, no clear and efficient solution has been found. Efforts will continue.
Together with the Department of Network Services at St. Mary’s University, the project director created a IIIF image repository on an Amazon Web Services EC2 instance with elasticity, Amazon S3 backup storage, Amazon CloudFront international caching, and Domain Name Service for https://jubilees.stmarytx.edu. As described in the proposal, this arrangement is ideal for the predominantly off-campus traffic of the project and the potential need for elasticity if usage spikes with media coverage.
The project director tested open source alternatives for the JPEG 2000 backend of the IIP image server.
Unfortunately, quality, performance and reliability were acceptable only with the commercial alternative (Kakadu), which is the one thorn in the side of an otherwise entirely open-source project.
Once the IIP image server was compiled with the Kakadu JPEG 2000 libraries and the Apache configuration adjusted, the IIIF Image API compliance was ready.
The IIIF Image API allows project images to be stored once and served in portions at various resolutions.
This is essential, for example, for the paleography chart of Latin Moses
(https://jubilees.stmarytx.edu
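Serving one stored image "in portions at various resolutions" works by encoding the request in the URL, following the IIIF Image API pattern of identifier, region, size, rotation, quality, and format. The sketch below builds such URLs. The base URL and identifier in the usage example are hypothetical placeholders, not addresses on the project server.

```python
from urllib.parse import quote


def crop(x, y, width, height):
    """Region parameter for a rectangle in full-resolution pixel coordinates."""
    return f"{x},{y},{width},{height}"


def fit_width(width):
    """Size parameter that scales the region to a given width, keeping aspect ratio."""
    return f"{width},"


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Note: "max" is the size keyword in Image API 2.1 and 3.0; servers that
    implement only 2.0 expect "full" for the size parameter instead.
    """
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only.
    print(iiif_image_url("https://example.org/iiif", "page-001.jp2",
                         region=crop(1200, 800, 300, 150), size=fit_width(600)))
```

A chart of letter forms can then be assembled entirely from URLs of this kind, each cropping one character from the stored page image, without creating or storing any additional image files.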
IIIF Presentation API manifests were written with placeholder data in advance of the capture session and filled in as data was created. This allowed many images to go live before the capture session was complete. One challenge encountered with the image repository was finding, or building, a viewer that supports the Image Choice (layers) specification of the IIIF Presentation API. This feature is essential for spectral imaging, which creates many perfectly registered images for each page. Even though Mirador is specifically designed for IIIF, version 2.0 did not support Image Choice when the repository was otherwise ready. The long-term solution was to wait for Mirador 2.6.1. The short-term solution was to build a custom viewer using jQuery and Leaflet. This viewer is called IIIF Navigator (https://jubilees.stmarytx.edu/iiifp/). It is less powerful than Mirador, especially for creating annotations, but has some advantages in its simplicity. It shows all the information available in a IIIF Presentation manifest in a linear view. That is, the user can see what images and other resources are available just by scrolling down without hovering, clicking, and exploring. The additional resources include transcriptions, translations, and WebRTI images. It also provides direct links for cropping tools and other information essential for utilizing the features of the IIIF Image API for more than just serving tiles.
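For readers unfamiliar with Image Choice, the sketch below builds the kind of annotation that offers several registered images of one page as selectable layers, using the `oa:Choice` structure of the IIIF Presentation API 2.x. The layer labels echo the processing recipes named in this report. The URLs are hypothetical, and the project's actual manifests may be organized differently.

```python
import json


def image_choice_annotation(canvas_id, layers):
    """Build a painting annotation that offers a choice of images for a canvas.

    layers is a list of (label, image_url) pairs. The first pair becomes the
    default layer and the rest are alternatives.
    """
    if not layers:
        raise ValueError("at least one layer is required")

    def image(label, url):
        return {"@id": url, "@type": "dctypes:Image", "label": label}

    default, *alternatives = [image(label, url) for label, url in layers]
    return {
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "on": canvas_id,
        "resource": {
            "@type": "oa:Choice",
            "default": default,
            "item": alternatives,
        },
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    annotation = image_choice_annotation(
        "https://example.org/iiif/canvas/p1",
        [
            ("Accurate Color", "https://example.org/iiif/p1-color/full/full/0/default.jpg"),
            ("Extended Spectrum", "https://example.org/iiif/p1-extended/full/full/0/default.jpg"),
            ("RuBY", "https://example.org/iiif/p1-ruby/full/full/0/default.jpg"),
        ],
    )
    print(json.dumps(annotation, indent=2))
```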
Ranges of chapters and verses witnessed on each page were added to the IIIF Presentation manifest for Latin Moses, which facilitates browsing in the Index tab in Mirador.
A public annotation server was connected to Mirador, which allows users to contribute annotations to be seen and reviewed by others. These annotations are most often transcriptions, but can also note areas or points of interest, such as marginalia or other scribal practices. See below for the project activity “Annotate the manuscript with line-by-line transcriptions.”
Mirador was customized to show manifest (manuscript) and canvas (page) coordinates in the address bar. This allows scholars to copy and paste from the address bar in their browsers into any medium (such as an article or annotation) and direct others to the exact page in the viewer.
A fully tagged, machine- and human-readable version of the 1861 edition of Latin Moses facilitates study of the manuscript and will serve as the foundation for a new critical edition of the manuscript (https://jubilees.stmarytx.edu/annotations/Ceriani_1861.xml). EpiDoc standard tags were used to code unclear characters, fully illegible characters, line and column breaks, chapter numbers, as well as verse numbers and emendations offered by subsequent generations of scholars. The XML edition preserves all available information and can be viewed in customized ways, such as showing the best available scholarly improved text, or the most faithful transcription of the manuscript, or both. Eventually, the past readings and emendations will be combined with new ones to create a complete critical edition.
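How one tagged source can yield either a faithful transcription or an improved text can be sketched with a few standard TEI elements used in EpiDoc (`lb`, `unclear`, `gap`, and `choice` with `sic` and `corr`). The sample line below is invented for illustration and is not taken from the edition. Real EpiDoc files also use the TEI namespace, which this simplified sketch omits, and the project's own encoding choices may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample, not a quotation from the 1861 edition.
SAMPLE = (
    "<ab>"
    '<lb n="1"/>et <unclear>dix</unclear>it '
    "<choice><sic>moyses</sic><corr>Moyses</corr></choice>"
    '<lb n="2"/>ad <gap reason="illegible" quantity="3" unit="character"/>'
    "</ab>"
)


def render(element, prefer="corr"):
    """Flatten tagged text into a plain-text view.

    prefer="corr" shows editorial corrections; prefer="sic" shows the text as
    written. Unclear letters are bracketed and illegible letters become dots.
    """
    skipped = "sic" if prefer == "corr" else "corr"
    if element.tag == "lb":
        body = "\n"
    elif element.tag == "gap":
        body = "." * int(element.get("quantity", "1"))
    elif element.tag == skipped:
        body = ""
    else:
        body = (element.text or "") + "".join(render(child, prefer) for child in element)
        if element.tag == "unclear":
            body = "[" + body + "]"
    return body + (element.tail or "")


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(render(root, prefer="corr"))
    print(render(root, prefer="sic"))
```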
Student researchers annotated each line of Latin Moses with the transcription proposed in 1861, their own transcriptions, the certainty or readability of the line, and their own initials. Scribal features such as Nomina Sacra are also tagged. The annotations can be viewed in Mirador or a searchable index of all annotations (https://jubilees.stmarytx.edu/annotations/summarizeannotations.html). All ninety-six pages of Latin Moses (Jubilees and the Testament of Moses) were annotated for two columns and twenty-four lines per column, for a total of roughly 4608 annotations of transcription, not counting other features.
The two major categories of training and mentoring were the student researchers at St. Mary’s University and other imaging students and professionals. In academic year 2016-2017 Anthony Selvanathan was trained by the project director in various aspects of the project, and joined the team for five weeks in Milan. There he learned and was actively involved in all aspects of the project, especially manuscript handling and mounting for imaging and operating the image capture equipment.
In academic year 2017-2018 the student research opportunities were opened up to all students on the campus of St. Mary’s University.
Approximately thirty students were exposed to the project at least at the level of an information session, and twelve continued on to paid work.
The training portion consisted of regular two-hour workshops with the project director every other week.
This training supported independent work for the project on the students’ own schedules.
The training included surveys of the general context of the project (the book of Jubilees, manuscript studies, spectral imaging).
Specific skills trained and utilized were coding manuscripts in EpiDoc TEI XML, creating paleography charts using the IIIF Image API, and transcribing manuscripts using the annotation features of the IIIF repository and Mirador.
The instructions created for the student researchers are included in the public archive of the project
(https://palimpsest.stmarytx.edu
In academic year 2018–2019 three new student researchers were trained to complete the line-by-line annotation of the erased text of Latin Moses.
Following capture, processing, and publication in the first project year, the general theme of the second project year turned to building awareness and fostering engagement in the scholarly community. The website was enhanced with more scholarly resources, such as a codicological reconstruction of Latin Moses with links to the viewers. The highlight was a scholarly conference at the University of Notre Dame, May 15–18, 2018. As proposed, the speakers were Todd Hanneken (the project director), James C. VanderKam, and Annette Yoshiko Reed. The conference was titled “Recent Developments and the Future of Scholarship and Teaching Ancient Scribal Heritage.” The major papers by the speakers listed above were, respectively, “The Next Generation of Digital Tools for the Study of Manuscripts,” “The Transmission and Reception of the Book of Jubilees,” and “Palimpsests, Remembering, and Forgetting.”
The flyer for the conference is included in the appendix.
The webpage created to promote the conference is
https://jubilees.stmarytx.edu
Following informal individual consultations (Kathryn Piquette, Sarah Baribeau, Damianos Kasotakis),
the first formal training workshop took place at the lab of the Lazarus Project at the University of Rochester.
It was a pre-conference addition to the Rochester Cultural Heritage Imaging, Visualization, and Education (R-CHIVE) Conference.
Twitter, email, and the page on the project website
(https://jubilees.stmarytx.edu
The second major training event took place at the University of Texas, Austin in conjunction with the Harry Ransom Center
(https://jubilees.stmarytx.edu
The third major training event took place at the British Library.
The same format described above was followed, with a new page on the website for this and the following workshops,
https://jubilees.stmarytx.edu
The fourth major training event took place at University College London. Objects imaged included wax seals, erased paper, overwritten paper, oil on canvas painting, and a woodblock. The woodblock was especially interesting because it benefitted from RTI as expected but also yielded surprising information from spectral data processing. Although the block appeared uniformly black, it must have been used with inks of different spectral signatures that penetrated the block to different degrees. The object exemplifies how spectral and RTI can show complementary features.
The fifth major training event took place at the Royal Library of Denmark.
Objects imaged included sheets of paper glued together with writing in between, an embossed leather book cover, and cutouts by Hans Christian Andersen.
The cutouts exemplified the ability of Spectral RTI to show texture, the conservation issues with glued paper, and highly accurate color.
They became the subject of a six-minute video walkthrough of the features in Mirador,
https://jubilees.stmarytx.edu
The sixth major training event took place at the University of Graz, Austria. Objects imaged included damaged early modern musical notation. The primary showcase of palimpsests from the Biblioteca Ambrosiana fulfilled the project title, “erased manuscripts from antiquity,” but the training workshops illustrated the benefit for many other objects.
Activities to support media coverage surged after the NEH announcement in August 2016. Coverage is listed below under Accomplishments. This coverage and web searches led to scholars contacting the project director with various requests, all of which were addressed. The scholars’ workshop, conference presentations, and public lectures are listed below under Accomplishments.
During the “European Tour” of training workshops in the summer of 2019, the project director (Todd Hanneken) presented at two conferences.
The Society of Biblical Literature Annual Meeting in Boston, November 2017, included a session titled, “Multi-spectral Imaging and the Recovery of ‘Lost’ Texts from Palimpsests.” It was a joint session of the Pseudepigrapha section and Digital Humanities section. Four team members presented (along with two others):
The Rochester Cultural Heritage Imaging, Visualization, and Education (R-CHIVE) consortium held a conference June 19-20, 2017 (with additional meetings before and after the conference). The following team members presented:
The project director (Todd Hanneken) gave public presentations on the project.
Student researchers presented on their work for the project.
A scholars’ workshop took place May 15-17, 2018 at the University of Notre Dame.
The audiences served can be grouped into three categories: 1) scholars of the ancient literature being recovered; 2) digital humanists, conservators, and imaging professionals interested in the capture and processing technology, or similarly the publication technology; and 3) general interest and popular media.
The 2018 Conference at Notre Dame with the project director, James C. VanderKam, and Annette Y. Reed was the centerpiece of scholarly activity. Because the project director’s primary scholarly community consists of scholars of ancient literature, word spread quickly among this audience. The project was discussed in the 2018 Hermeneia Commentary on the Book of Jubilees. The entry in the Brill Textual History of the Bible on Latin Jubilees written by the project director depends upon and highlights the project. In addition to the text of Latin Jubilees, scholarly interest in the manuscript as more than a text container has been significant. For example, Cypert’s presentation at the International Meeting of the Catholic Biblical Association focused on the theology of the scribe copying the text, not the intent of the original author of the text being copied.
Other texts sampled at the Ambrosiana also attracted interest. One of our images was featured on the cover of Alexey Eliyahu Yuditsky, A Grammar of the Hebrew of Origen’s Transcriptions. Israel: The Academy of the Hebrew Language (2017). Benjamin Kantor, a scholar at the University of Cambridge, found the pages sampled highly valuable and is working to image more pages. The community of scholars working on early Gothic is smaller, but David Landau and Hugo Mendez both showed interest in doing research based on the images we published.
Similarly, word about our technologies for capture, processing, and publication spread through professional networks and the Internet. Gregory Heyworth of the Lazarus Project at the University of Rochester was the first to purchase a MegaVision arc for Spectral RTI. Giulia Rossetto is an imaging specialist who worked on the Sinai Palimpsests Project and donated some of her time to assist our project in Milan. Kathryn Piquette is an RTI specialist expanding into spectral using the PhaseOne system. The project director demonstrated the project to the monthly community video conference of the Manuscript Group of the IIIF community. He also presented on texture imaging at the first annual conference of Rochester Cultural Heritage Imaging, Visualization, and Education (R-CHIVE) and as a special guest of the Carlson Center for Imaging Science at the Rochester Institute of Technology (see above, Presentations). The training workshops were all filled to capacity and very well received. The project is discussed by Bill Endres in Digitizing Medieval Manuscripts: The St. Chad Gospels, Materiality, Recoveries, and Representation in 2D & 3D. ARC Medieval Media Cultures. Amsterdam: Amsterdam University Press (2019) p. 45. The presentation at Digital Humanities 2019 Utrecht was very well attended, well received, and widely tweeted.
There has been significant popular media interest in the project. See the publications noted above, especially National Geographic and Biblical Archaeology Review. The project director consulted on the narrator’s transcript for the episode of PBS NOVA “Dead Sea Detectives” that first aired November 6, 2019.
We measure our success by our ability to answer the following questions.
Yes. Beyond the limited tests from the startup phase we demonstrated that the technique is feasible in the capture phase and effective in the processing and publication phase. The equipment problems we did encounter had nothing to do with the addition of RTI. We were able to conduct the capture at a steady pace consistent with other spectral imaging projects. The technology works efficiently and consistently combines the advantages of spectral imaging with the advantages of RTI. The limitations are only those found in the component technologies.
Yes. By using the MegaVision RTI arc we can conduct Spectral RTI in less time than it takes to do RTI alone using the hand-held flash method. The RTI sequence of 56 captures takes 4-5 minutes. The sustained rate of capture (including object mounting, 52 spectral captures, 56 RTI captures, breaks, occasional troubleshooting, visitor interruptions) averages 20 minutes per page. The arc also saves about five to ten minutes of processing time compared to determining light positions of a handheld flash from a reflective hemisphere.
Yes. Participants at the training workshops were able to install and run the software themselves. The software and documentation were adjusted to fix the most frequently encountered difficulties. We fully expect that only moderate patience would be required of a professional attempting to perform the capture and processing without having been trained in person. Technicians using non-MegaVision imaging systems were able to complete the capture and processing. We know the technology has been adopted by Gregory Heyworth of the Lazarus Project at the University of Rochester.
IIIF provides the core open standards for making the images accessible in the most useful way. One challenge encountered is that the “image choice” feature of the IIIF Presentation API is enough of a niche feature that it was not supported in Mirador versions 2.0-2.5, or in the beta version of Mirador 3. The project served as a demonstration of the value of the “image choice” (or layers) feature, which we expect to be available before Mirador 3.0 is released. We also created video tutorials for new users of Mirador to find all the features available for study of the Jubilees Palimpsest. We are not using cookies or other methods of tracking users of Mirador and the IIIF image repository. Google Analytics shows that the project main page receives unique visitors in the range of low triple digits (>100) per month.
This is the most interesting question to the image processing scientists. Progress on this test case would have broad application. We made decent progress with a new technique called RuBY (Royal blUe fluorescence divided BY transmissive). We also have an understanding of why it is so difficult (because the undertext, overtext, and reagent are all made of the same iron gall ingredients). We did not find a slam-dunk universal solution. Efforts to use non-linear transformations show some promise, although at a very high level of difficulty and processing power. See further, “Continuation of the Project.”
This is the most interesting question to the scholars of Jubilees and the Testament of Moses. The most sensational achievement would be to add new readings where Ceriani (the 1861 editor) did not venture a guess. This seems to be rarely the case, and no more than a few letters at a time. More often we can correct, or at least call into question, the reading Ceriani claimed to see. As we accumulate evidence that Ceriani was a “loose” editor, every reading he proposed comes into question and will be subject to additional scrutiny. Very often we can identify editorial “corrections” into more standard Latin or expansion of abbreviations. This information will not tell us about the original composition of which the palimpsest is a copy, but it can tell us about the context of the translator or copyist. See further, “Continuation of the Project.”
The project was deeply interdisciplinary and collaborative, and as such opened many doors for future growth. Among those are:
Even if none of the items under “Continuation of the Project” move forward, the resources created thus far will have a long-term impact. The data collected in Milan and elsewhere has been managed and archived so that future imaging teams will be able to study our data. St. Mary’s University has committed to maintaining the IIIF image repository in perpetuity, such that the images, manifests, and annotations created thus far will remain as a resource to scholars. Even without further development, the SpectralRTI_Toolkit Java plugin for ImageJ2 will remain usable for as long as ImageJ maintains compatibility. The technologies that eventually replace the technologies we have used and built will presumably benefit from the challenges we faced and progress we pioneered.
See above.
The design with two low pivots was built:
The “carousel” design could also be useful in permanent facilities with adequate space and overhead structure.
https://jubilees.stmarytx.edu/spectralrtiguide/
https://jubilees.stmarytx.edu/thanneken/2019/DH2019_Utrecht-paper.html
https://jubilees.stmarytx.edu/thanneken/2019/ISBL_Rome-paper.html
https://jubilees.stmarytx.edu/2018/201805Workshop.html
https://github.com/thanneken/SpectralRTI_Toolkit