Todd R. Hanneken, St. Mary’s University
Table of Contents
1. Schedule
2. Text
2.1. Introduction
2.2. Why Integrate TEI and IIIF?
2.3. How Integrate TEI and IIIF?
2.4. Why Streamline?
2.5. Conclusions
3. Abstract
4. Bibliography
1. Schedule
Conference website: dh2024.adho.org
Short paper (10 minutes + 8 minutes discussion) in session 4:00-5:30 pm, 8/8/2024, Van Metre Hall 308
Project website: palimpsest.stmarytx.edu
Personal webpage: palimpsest.stmarytx.edu/thanneken
Email: thanneken@stmarytx.edu
Mastodon: @thanneken@hcommons.social
TEI 2 Edition software on GitHub
2. Text
2.1. Introduction
Thank you to the organizers and to all of you for being here. I look forward to the discussion and questions. Let me begin by introducing myself, my resources, my project, and my goals as they relate to the ideas I describe below. I teach theology at a teaching-focused small liberal arts college in San Antonio, Texas. My formal training is in early Jewish literature, especially the books preserved among the Dead Sea Scrolls. I could be described as curious about technology, but I have no formal training in computer languages. My university does not have a Center for Digital Humanities or any advanced technical support. My research schedule is irregular around teaching and service, so my collaborations tend to be informal, with colleagues of similar technical ability. I can see why large, well-funded teams would make different decisions, but that is not my situation.
My project is the Jubilees Palimpsest Project. The Book of Jubilees is attributed to Moses and is partially preserved among the Dead Sea Scrolls. The next-earliest witness is a Latin translation copied in the fifth century and erased in the eighth century to make room for a new text. The erased text is only partially legible to the naked eye today, so I have relied on multispectral imaging to improve legibility. Legibility remains challenging and often ambiguous. The current stage of the project is to produce scholarly editions tailored to print and digital media.
The print edition will foreground the edited text and relegate scribal oddities to the apparatus. It will favor readability, applying modern judgments on matters such as paragraph divisions. While I recognize the place for such an edition, my own heart lies in steering the reader to facsimile images and to a hypertext edition that foregrounds the manuscript as an object of scribal culture, not just a text container. On a practical level, I would also like readers to be able to check the certainty of our readings, given the state of preservation of the manuscript.
The complexity of the facsimile images demands a powerful viewer such as Mirador. It is essential to be able to zoom in and out of high-resolution images and to switch between layers showing various color and texture visualizations.
It is also important for the reader to be able to navigate by chapter and verse, by the fifth-century page sequence, or by the modern page numbers.
Because the text is often difficult to read, it helps the reader to see an annotation for each line with our proposed reading.
We also want those annotations to link to a hypertext edition that presents readable digital text in the same column and line layout as the manuscript.¹ The hypertext edition uses hover text, clickable elements, and an apparatus to help the reader while foregrounding the witness of the manuscript.
1 Abbreviations and nomina sacra are displayed as they appear in the manuscript, with hover text giving the reader the expansion.
Unsurprisingly, my use case led me to the Text Encoding Initiative (TEI). The advantages of TEI for the production of critical editions have been thoroughly studied. The modules on manuscript description and critical apparatus prompt us to think about our categories and about the information we should be including. We can describe the text on an abstract, semantic level and defer questions of visualization that may differ from publisher to publisher. LaTeX packages produce beautiful pre-print renditions with apparatus and facing-page text and translation. All the desiderata for the hypertext edition can be encoded in the same source, and each visualization can render or ignore them as desired.
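To make the idea of one source that different visualizations "render or ignore" concrete, here is a minimal Python sketch. The project itself uses XSLT; Python is only a stand-in here. It assumes abbreviations are encoded with the TEI choice, abbr, and expan elements; the made-up Latin fragment and the function name render are hypothetical. The same markup yields a reading text for print and, for the hypertext edition, the abbreviation as written with its expansion as hover text.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical fragment (not from the project's files): one abbreviation
# encoded with <choice>, <abbr>, and <expan>.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">et dixit '
    "<choice><abbr>dns</abbr><expan>dominus</expan></choice>"
    " ad moysen</p>"
)


def render(elem, mode):
    """Flatten a TEI element to text, handling <choice> per output mode.

    mode == "print":     emit only the expansion (reading text).
    mode == "hypertext": emit the abbreviation as seen in the manuscript,
                         wrapped in an HTML <abbr> whose title attribute
                         (hover text) carries the expansion.
    """
    def out(s):
        # Escape only for HTML output; plain text stays untouched.
        return escape(s) if mode == "hypertext" else s

    parts = [out(elem.text or "")]
    for child in elem:
        if child.tag == TEI_NS + "choice":
            abbr = child.findtext(TEI_NS + "abbr", default="")
            expan = child.findtext(TEI_NS + "expan", default="")
            if mode == "print":
                parts.append(expan)
            else:
                parts.append(f'<abbr title="{escape(expan)}">{escape(abbr)}</abbr>')
        else:
            # Any other element: keep its text, recursively.
            parts.append(render(child, mode))
        parts.append(out(child.tail or ""))
    return "".join(parts)


root = ET.fromstring(SAMPLE)
print(render(root, "print"))      # et dixit dominus ad moysen
print(render(root, "hypertext"))  # et dixit <abbr title="dominus">dns</abbr> ad moysen
```

Nothing in the source says how the text should look; each output mode decides which child of choice to show.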
Similarly, my goals for viewing facsimile images and annotations led me to Mirador and the IIIF Presentation API. At this point I could have moved forward with two separate technologies serving different goals of the project. Today I would like to say more about why I integrated TEI and IIIF, how I did it and what challenges I faced, and why I favored a streamlined approach.
2.2. Why Integrate TEI and IIIF?
When I say I integrated TEI and IIIF, I mean that I created a workflow that puts TEI at the head and derives everything else from that one source. Each visualization is rendered by its own eXtensible Stylesheet Language Transformation (XSLT), although there are other ways of doing it. For us, one of the main advantages of this single-authority model is that a change made in one file automatically cascades to the print edition, the hypertext edition, and the IIIF annotations. That matters because our edition is a work in progress.
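The single-authority workflow can be sketched in a few lines. This is not the project's XSLT code but a hypothetical Python stand-in in which every derived output is a function of the one TEI source, so a correction made once reaches every output. The use of ab (anonymous block) elements, the file names, and the function names are illustrative assumptions, and the "print" output is plain text standing in for the LaTeX route.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def segments(tei_xml):
    """Parse the single TEI source and return (xml:id, text) for each <ab>."""
    root = ET.fromstring(tei_xml)
    return [(ab.get(XML_ID, ""), " ".join("".join(ab.itertext()).split()))
            for ab in root.iterfind(".//tei:ab", NS)]


# Each derivation reads only the TEI source; none is edited by hand.
def to_print_text(tei_xml):
    return "\n\n".join(text for _, text in segments(tei_xml))


def to_hypertext(tei_xml):
    return "\n".join(f'<p id="{escape(i)}">{escape(t)}</p>'
                     for i, t in segments(tei_xml))


def to_annotations(tei_xml):
    return json.dumps([{"target": i, "value": t} for i, t in segments(tei_xml)])


DERIVATIONS = {
    "edition.txt": to_print_text,  # stand-in for the LaTeX/PDF route
    "edition.html": to_hypertext,
    "annotations.json": to_annotations,
}


def build_all(tei_xml):
    """Regenerate every output from the one authoritative file."""
    return {name: derive(tei_xml) for name, derive in DERIVATIONS.items()}


source = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
          '<ab xml:id="ab1">a first reading</ab></body></text></TEI>')
revised = source.replace("a first reading", "a corrected reading")

# One edit to the source shows up in every derived output.
for name, output in build_all(revised).items():
    print(name, "->", output)
```

The design choice being illustrated is that no derived file is ever a source: deleting and regenerating all of them loses nothing.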
It is also a benefit to do the editing work in XML rather than having to work in JSON on a regular basis. I often think about what it is reasonable for my students and colleagues to learn. It is easier to learn one language than two, and of the two I find XML much more human-readable. In any case, it is possible to encode all the information in XML and derive JSON from it, but not vice versa. It seems to me that whether one loves or hates TEI correlates with whether one is able to embrace thinking semantically about texts. Personally, I have found this to be a fun and thought-provoking challenge. I can say more about my experience with this challenge in the discussion if there is interest.²
2 A deeper challenge lies in how we think about texts. TEI asks us to describe a text according to its semantic structure; the physical description of any particular manifestation of that text is separate and secondary. Coming from biblical studies, I thought of biblical books as divided into a hierarchy of book, chapter, and verse. TEI taught me to think of those secondary markers as milestones, which were added centuries later by people who did not necessarily understand the literary structure. Similarly, the division of a text in a codex into pages, columns, and lines is not the essential nature of the text. The XML necessarily follows a single hierarchy, which in literary semantics might be division-paragraph-sentence or division-stanza-line. It is true that the semantic hierarchy creates challenges when one does wish to visualize the physical manifestation of a text in pages, columns, and lines. Those features can be encoded with non-hierarchical (milestone) elements and rendered as one wishes by way of XSLT.
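As an illustration of the point in note 2, the following Python sketch (a hypothetical stand-in for an XSLT template, not the project's code) regroups text encoded in a semantic hierarchy of div and p into physical page, column, and line units, using only the empty pb, cb, and lb milestone elements. The sample text is placeholder, and the sketch ignores refinements such as break="no" on words split across lines.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
MILESTONES = {TEI + "pb", TEI + "cb", TEI + "lb"}

# Hypothetical placeholder text: the paragraph boundary (semantic) falls
# in the middle of physical line 2, which a milestone encoding allows.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="chapter" n="1">
  <p><pb n="1r"/><cb n="a"/><lb n="1"/>alpha beta <lb n="2"/>gamma delta</p>
  <p>epsilon <cb n="b"/><lb n="1"/>zeta eta</p>
</div>"""


def events(elem):
    """Yield text chunks and milestone elements in document order."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag in MILESTONES:
            yield child
        else:
            yield from events(child)  # descend into p, hi, and so on
        if child.tail:
            yield child.tail


def physical_lines(root):
    """Map (page, column, line) -> text, ignoring the semantic hierarchy."""
    page = column = line = None
    lines = {}
    for event in events(root):
        if isinstance(event, str):
            if line is not None:  # text before the first lb (here only indentation) is skipped
                key = (page, column, line)
                lines[key] = lines.get(key, "") + event
        elif event.tag == TEI + "pb":
            page, column, line = event.get("n"), None, None
        elif event.tag == TEI + "cb":
            column, line = event.get("n"), None
        else:  # lb
            line = event.get("n")
    # Collapse whitespace left over from XML indentation.
    return {key: " ".join(text.split()) for key, text in lines.items()}


for (page, column, line), text in physical_lines(ET.fromstring(SAMPLE)).items():
    print(f"{page} col. {column} line {line}: {text}")
```

Run on the sample, line 2 of column a comes out as "gamma delta epsilon": the paragraph break matters to the semantic hierarchy but not to the physical line.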
2.3. How Integrate TEI and IIIF?
If we can agree that TEI XML is the most capable format for a single-authority workflow and that a humanist can be reasonably expected to use it, the next question becomes how to derive IIIF from TEI. Some parts of this are surprisingly easy, while others might be surprisingly challenging.
The part I found surprisingly easy is that the TEI Guidelines already include fundamentally equivalent categories under different names. Almost all one would want for a IIIF Presentation Manifest falls under teiHeader or facsimile. Under facsimile, TEI calls a surface what IIIF calls a canvas. Each surface can be described by many graphic elements, which is essential for the layers functionality.
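The correspondences just described can be sketched as follows. This Python sketch is not the project's XSLT; it assumes IIIF Presentation API 3.0 (a 2.x manifest with Open Annotation uses different keys, for example sequences and images), a hypothetical base URI, surface coordinates expressed in image pixels, JPEG images whose dimensions equal the surface's, and a made-up sample file. It takes the manifest label from the teiHeader title, turns each surface into a canvas, and turns each graphic into its own painting annotation on that canvas, which is one way image layers can be offered to a viewer such as Mirador.

```python
import json
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "https://example.org/iiif/demo"  # hypothetical base URI

# Pared-down, hypothetical TEI (a valid file needs more of the teiHeader).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample palimpsest</title></titleStmt></fileDesc></teiHeader>
  <facsimile>
    <surface xml:id="f1r" n="1r" ulx="0" uly="0" lrx="4000" lry="6000">
      <graphic url="https://example.org/images/f1r-natural.jpg"/>
      <graphic url="https://example.org/images/f1r-pseudocolor.jpg"/>
    </surface>
  </facsimile>
</TEI>"""


def canvas_from_surface(surface):
    """TEI surface -> IIIF Canvas; every graphic becomes one image layer."""
    sid = surface.get(XML_ID)
    width = int(surface.get("lrx")) - int(surface.get("ulx", "0"))
    height = int(surface.get("lry")) - int(surface.get("uly", "0"))
    canvas_id = f"{BASE}/canvas/{sid}"
    layers = [
        {
            "id": f"{canvas_id}/painting/{i}",
            "type": "Annotation",
            "motivation": "painting",
            "body": {"id": g.get("url"), "type": "Image", "format": "image/jpeg",
                     "width": width, "height": height},
            "target": canvas_id,
        }
        for i, g in enumerate(surface.findall("tei:graphic", NS), start=1)
    ]
    return {
        "id": canvas_id,
        "type": "Canvas",
        "label": {"none": [surface.get("n", sid)]},
        "width": width,
        "height": height,
        "items": [{"id": f"{canvas_id}/page", "type": "AnnotationPage", "items": layers}],
    }


def manifest_from_tei(tei_xml):
    """teiHeader supplies the label; facsimile supplies the canvases."""
    root = ET.fromstring(tei_xml)
    title = root.findtext("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
                          default="", namespaces=NS)
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{BASE}/manifest.json",
        "type": "Manifest",
        "label": {"en": [title]},
        "items": [canvas_from_surface(s)
                  for s in root.iterfind("tei:facsimile/tei:surface", NS)],
    }


print(json.dumps(manifest_from_tei(SAMPLE), indent=2))
```

Because a page is a surface rather than a single graphic, adding a new imaging layer means adding one graphic element to the TEI and regenerating the manifest.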
All the annotations I would want to link to the Presentation manifest can be expressed with the elements line and media. Because I want to draw on the text transcription for the annotations, I use the xml:id, facs, and corresp attributes to link text and facsimile.
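The linking of text and facsimile can also be sketched in Python. The sketch assumes each manuscript line begins with an lb whose facs attribute points to the xml:id of a zone on a surface; it emits IIIF Presentation 3.0 annotations whose targets are canvas regions in the x,y,width,height (xywh) form IIIF uses, computed from the zone's two corner coordinates. The base URI, the "commenting" motivation, and the sample data are assumptions, and corresp links are left out. In a full manifest these annotations would be gathered into an AnnotationPage listed in the canvas's annotations property.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "https://example.org/iiif/demo"  # hypothetical base URI

# Hypothetical sample: two line zones on one surface, and a transcription
# whose lb milestones point to those zones through facs.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface xml:id="f1r" ulx="0" uly="0" lrx="4000" lry="6000">
      <zone xml:id="f1r-a-1" ulx="300" uly="450" lrx="1800" lry="560"/>
      <zone xml:id="f1r-a-2" ulx="300" uly="560" lrx="1800" lry="670"/>
    </surface>
  </facsimile>
  <text><body><p><lb facs="#f1r-a-1"/>alpha beta <lb facs="#f1r-a-2"/>gamma delta</p></body></text>
</TEI>"""


def zone_targets(root):
    """Map each zone's xml:id to a canvas URI with an xywh fragment."""
    targets = {}
    for surface in root.iterfind("tei:facsimile/tei:surface", NS):
        canvas = f"{BASE}/canvas/{surface.get(XML_ID)}"
        for zone in surface.iter(TEI + "zone"):
            ulx, uly, lrx, lry = (int(zone.get(k)) for k in ("ulx", "uly", "lrx", "lry"))
            # TEI gives two corners; IIIF wants origin plus width and height.
            targets[zone.get(XML_ID)] = f"{canvas}#xywh={ulx},{uly},{lrx - ulx},{lry - uly}"
    return targets


def walk(elem):
    """Yield text chunks and lb elements in document order."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag == TEI + "lb":
            yield child
        else:
            yield from walk(child)
        if child.tail:
            yield child.tail


def line_annotations(tei_xml):
    """One annotation per lb whose facs points to a known zone."""
    root = ET.fromstring(tei_xml)
    targets = zone_targets(root)
    texts, zone_id = {}, None
    for event in walk(root.find("tei:text", NS)):
        if isinstance(event, str):
            if zone_id in targets:
                texts[zone_id] = texts.get(zone_id, "") + event
        else:
            zone_id = event.get("facs", "").lstrip("#") or None
    return [
        {
            "id": f"{BASE}/annotation/{zid}",
            "type": "Annotation",
            "motivation": "commenting",
            "body": {"type": "TextualBody", "value": " ".join(text.split()),
                     "format": "text/plain"},
            "target": targets[zid],
        }
        for zid, text in texts.items()
    ]


for anno in line_annotations(SAMPLE):
    print(anno["target"], "->", anno["body"]["value"])
```

For the first sample line, the zone from (300, 450) to (1800, 560) becomes the fragment xywh=300,450,1500,110.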
Conceptually, then, we are off to a good start: we can draw on the TEI Guidelines to populate a IIIF Presentation manifest without any loose interpretation of the Guidelines.
So the next question is how to translate the TEI concepts into IIIF concepts. The main options fall into two categories: “homecooked” code, and the platforms and virtual research environments I will address below. By “homecooked” I mean relatively short bits of code that a humanist or collaborator might adapt for a specific project, without trying to anticipate every possible use case. Homecooked code could be written in Python or in XSLT, among other languages; I could say more about Python in the discussion if there is interest. Overall, I would say that XSLT is more approachable for my students and the average humanist. It is particularly elegant for rendering HTML, and it can also produce plain text and JSON output. XSLT is sufficiently capable of translating a single TEI XML authority into quite a few visualizations. My code, which is available on GitHub, has one transformation for PDF by way of LaTeX, another for the hypertext edition, another for the Presentation manifest, and another for the Open Annotations in Mirador.
3 An example of a powerful but intimidating implementation of XSLT for TEI is the TEI Consortium’s Stylesheets.
2.4. Why Streamline?
The alternative to the homecooked approach is building a platform or research environment. The homecooked approach modifies simple recipes as needed for a specific project. The platform approach tries to anticipate the needs of many projects. The homecooked approach uses the simplest tool possible, in the hope that a humanist can tinker directly with simple code. The platform approach requires a clear separation between a complex backend and a graphical user interface. I very much appreciate that different approaches may suit different people and different projects. My preference for the homecooked approach, which I think of as streamlining, is connected to my broader interest in minimal computing. As I said last year at the meeting of the Association for Computers and the Humanities, as a teacher I am less concerned with what is hard or easy to learn than with what is most worthwhile to learn. The most worthwhile tools and skills are those that are reusable. Learning how to use a text editor, TEI XML, and XSLT has been very useful to me on many projects. Learning a new platform strikes me as much less reusable, even if it does accomplish the immediate task. In my experience, the platforms that try to anticipate every use case do not succeed in doing so.⁴ Customization is either impossible or more complicated than modifying a simple recipe would have been. I praise open-source software at every scale, but not all open-source software is open to me in the sense that I can adapt it with the resources available to me.⁵ Taking a minimal computing, streamlined approach has benefits in control, sustainability, and accessibility.
4 For example, one major platform that boasts support for TEI and IIIF assumes that the facsimile of a page is a single graphic. That does not work for me, because representing a page requires many graphic elements within one surface element.
5 Marjorie Burghart’s TEI Critical Apparatus Toolbox exemplifies how a project can both aim to serve users without exposing them to code and make that code available for customization. While the web interface she developed does not suffice for my needs, I borrowed greatly from her XSLT code for transforming TEI to LaTeX.
2.5. Conclusions
I will close by reiterating three conclusions I have drawn from my journey. First, TEI is my friend. I believe its complexity is justified, and most humanists should be able to learn the portion of it that applies to their work. However, it is important to embrace the fundamental concept of a semantic hierarchy.
Second, in order to fully embrace the distinction between semantic description and visualization, it is helpful to learn XSLT at least enough to modify the visualizations of others.
Third, I believe my use case illustrates the value of minimal computing. A habit of seeking the most reusable tool for the job gave me the control to apply a concept to my own use case, one that I would not have expected others to anticipate. I completely understand that other projects may have different resources and needs. I hope others will be able to learn from my journey.
Thank you.
3. Abstract
The utility of the Text Encoding Initiative Guidelines for encoding texts suitable for the production of critical editions is well established (Burghart 2017). In the past, those editions have been substantially static, whether in print or online (Driscoll and Pierazzo 2016). Meanwhile, the dominant set of standards for presenting images of manuscripts is the International Image Interoperability Framework (IIIF). The most popular viewer based on IIIF is Mirador. With the essential information encoded according to IIIF standards, Mirador can display images of manuscripts with multiple image layers per page and annotations of transcriptions and descriptions. There are practical and theoretical challenges in producing an edition with all the advantages of each. For one thing, they use different languages: TEI uses eXtensible Markup Language (XML), while IIIF uses JavaScript Object Notation (JSON). The deeper difference is in how a text is conceptualized and structured. TEI, properly used, asks us to think of texts as abstractions with divisions, paragraphs, sentences, and words. IIIF asks us to think of a manuscript as a sequence of canvases, one for each page, with annotations mapped to x and y coordinates on those canvases. These challenges can, however, be surmounted. In fact, TEI can encode all the information necessary to derive a IIIF Presentation manifest and annotations for each line of a manuscript. There are several benefits to centralizing work on a manuscript in TEI and automatically deriving the IIIF information. First, a scholar is asked to master only one language and one set of guidelines. TEI is not necessarily easy to master, but the guidelines are consistent and all in one place. Second, information pertinent both to critical editions and to an interactive manuscript viewer is not duplicated. For example, the transcription of the text is edited only once, which avoids the conflicts that arise when one file is updated and another is not.
Third, just as the complexity of human-edited source files is reduced, so too the complexity of the project team can be reduced. It is not at all implausible that a single scholar could learn the necessary TEI standards and have little or no need for technical collaborators. Collaboration in large teams certainly has advantages for projects funded by government grants and large institutions. A simpler model, however, is accessible to individual scholars with little or no funding (Pierazzo 2019). In that way, the streamlined minimal computing approach described here speaks to the conference theme of “addressing global inequities in access to technology.”
Specifically, the streamlining of TEI, IIIF, and Mirador will be discussed with reference to the Jubilees Palimpsest Project and the NEH-funded production of a scholarly edition. The text being encoded and presented is a palimpsest recovered with multispectral imaging technology. For that reason, there is no single, all-sufficient facsimile image of a page that could be printed as a plate in a print edition. IIIF can include many image layers on a single canvas, and Mirador has tools for showing those layers. Additional layers captured with raking illumination show the texture of the surface, including evidence of the production and condition of the folios. Even with multispectral imaging technology, the erased text is not legible enough for a reader to do without editorial guidance on which region to read and on a proposed transcription. In the past, a manuscript too difficult for a facsimile edition would be presented only as an edited text. The reader would be asked to accept the editor’s authority for crucial readings. The reader would have no experience of the manuscript as anything other than a text container. The physical reality of the scribal cultures that produced and used a manuscript would be lost in transmission. At the same time, a digital experience of the manuscript without the expertise of editors would not suffice. All the information traditionally encoded in TEI for the production of a static critical edition can stand beside the additional information necessary to visualize the manuscript facsimile in folios, columns, and lines of text. The paper will demonstrate the proper TEI elements for encoding the information necessary for IIIF, including facsimile, graphic, line, zone, page beginning, line beginning, and various milestones. Examples from the Jubilees Palimpsest Project use XSLT for transformation from TEI XML to IIIF JSON.
The same concepts can be implemented in other languages, such as Python, or on platforms such as TextGrid and the University of Pisa’s Edition Visualization Technology.
4. Bibliography
Marjorie Burghart, “TEI Critical Apparatus Toolbox.” Huma-Num, 2016. http://teicat.huma-num.fr/.
Marjorie Burghart, “The TEI Critical Apparatus Toolbox: Empowering Textual Scholars through Display, Control, and Comparison Features,” Journal of the Text Encoding Initiative (2016). http://journals.openedition.org/jtei/1520.
Marjorie Burghart, ed., Creating a Digital Scholarly Edition with the Text Encoding Initiative (DEMM, 2017). https://ec.europa.eu/programmes/erasmus-plus/project-result-content/cfb03c8e-6765-477e-aa4d-e2b162f57068/IO2_Creating%20a%20digital%20edition%20with%20the%20TEI.pdf.
Matthew James Driscoll and Elena Pierazzo, eds., Digital Scholarly Editing: Theories and Practices (Open Book Publishers, 2016). http://www.openbookpublishers.com/product/483.
Bastien Dumont, “Review of ‘TEI Critical Apparatus Toolbox: Web-based Tools for Ongoing XML-TEI Editions’,” RIDE – A Review Journal for Digital Editions and Resources 15 (2022). https://ride.i-d-e.de/issues/issue-15/teicat.
Anne Ferger and Sebastian Rahtz, “TEIGarage.” GitHub, 2024. https://github.com/TEIC/TEIGarage/.
Neil Fraistat et al., “The Shelley-Godwin Archive,” n.d. http://shelleygodwinarchive.org/.
Todd R. Hanneken, “The Jubilees Palimpsest Project.” St. Mary’s University, 2024. https://palimpsest.stmarytx.edu.
Todd R. Hanneken, “TEI 2 Edition.” GitHub, 2024. https://github.com/thanneken/tei2edition/.
Almut Hintze, “From TEI to JSON within IIIF.” SOAS University of London, 2024. https://www.soas.ac.uk/research/tei-json-within-iiif.
Nicholas Laiacona et al., “Connecting TEI and IIIF.” 2018 IIIF Conference. Washington, DC, 2018. https://iiif.io/event/2018/washington/program/paper-61/.
Pietro Maria Liuzzo, “Serving IIIF and DTS APIs Specifications from TEI Data via XQuery with Support from a SPARQL Endpoint.” Balisage: The Markup Conference 2021. Washington, DC, 2021. http://www.balisage.net/Proceedings/vol26/html/Liuzzo01/BalisageVol26-Liuzzo01.html.
Wolfgang Meier, “TEI Publisher 9.0.0.” E-Editiones, 2024. https://www.e-editiones.org/posts/tei-publisher-9/.
Paolo Monella, “Linking Text and Image - TEI XML and IIIF.” Venice Center for Digital and Public Humanities, 2020. https://vedph.github.io/summercamp/materials/monella/.
Joshua Phillips, “TEI-IIIF Converter.” GitHub, 2023. https://github.com/JoshuaAPhillips/tei-iiif.
Elena Pierazzo, “What Future for Digital Scholarly Editions? From Haute Couture to Prêt-à-Porter,” International Journal of Digital Humanities 1 (2019): 209-220. http://link.springer.com/10.1007/s42803-019-00019-3.
Project Mirador Contributors, “Project Mirador.” GitHub, 2024. https://github.com/projectmirador/mirador.
Sebastian Rahtz and Contributors, “TEI Consortium XSL Stylesheets.” GitHub, 2024. https://github.com/TEIC/Stylesheets.
Roberto Rosselli Del Turco, “Edition Visualization Technology.” Università di Pisa, 2013. http://evt.labcd.unipi.it/.
TEI Consortium, ed., TEI P5: Guidelines for Electronic Text Encoding and Interchange (TEI Consortium, 2024). https://www.tei-c.org/P5/.
TextGrid, “IIIF Implementation.” TextGrid Lab, 2024. https://textgridlab.org/doc/services/submodules/tg-iiif-metadata/docs_tgrep/.
Raffaele Viglianti, “Keep 'em Separated: Integrating TEI and IIIF Without Loss.” TEI 2017. Victoria, British Columbia, 2017. https://hcmc.uvic.ca/tei2017/abstracts/t_125_viglianti_teiandiiif.html.
Jeffrey C. Witt, “Annotations, AnnotationLists, and TEI Encoded Texts.” IIIF Workshop, n.d. https://jeffreycwitt.com/IIIFWorkshop/docs/doc4.
Jeffrey C. Witt, “IIIF and Digital Editions.” JeffreyCWitt.com, 2022. https://jeffreycwitt.com/IIIFWorkshop/docs/doc4.
Todd R. Hanneken, “Streamlining Text Encoding Initiative (TEI), International Image Interoperability Framework (IIIF), and Mirador: Theoretical and Practical Considerations for Interactive Critical Editions.” Jubilees Palimpsest Project. San Antonio, Texas: St. Mary’s University, 2024.
All text licensed CC BY-NC-SA. Linked images are not licensed unless specified otherwise.