“Early Judaism and Modern Technology”

Todd R. Hanneken, St. Mary’s University

for Early Judaism and Its Modern Interpreters, Matthias Henze and Rodney A. Werline, eds. Atlanta: Society of Biblical Literature, forthcoming.

Version History:

  1. October 22, 2018: first draft
  2. December 7, 2018: second draft, this file
  3. December 31, 2018: deadline for substantial changes by author

The most dramatic development in the work of Early Judaism research over recent decades has been the expansion of digital technology. Computer-aided discovery went from a small niche, using punch cards in the 1960s, to nearly universal. Tasks that were possible with paper, pen, and typewriter became increasingly quick and easy. Tasks that required processing of large data sets beyond human comprehension became possible. By “digital” we mean information is stored, transmitted, and processed as a series of numbers, ultimately ones and zeros in binary code. Some of the advantages of digital technology mirror the changes in scholarship with the advent of the printing press and affordable paper. Like the printing press (and more so), digital technology can create exact duplicates of information. Unlike analog duplicates, each digital copy is identical to the original, no matter how many copies are made. Like paper, digital information can be stored and transmitted at relatively low cost. Optical media, such as CD-ROM and DVD-ROM, rose above magnetic media for their low cost and were in turn replaced by magnetic and electronic media with higher capacity. More importantly, the transmission of digital information became quick, easy, and relatively affordable with the spread of standards known collectively as the Internet.

Rudimentary uses of digital technology in Early Judaism research can be thought of as quicker, easier, and cheaper versions of pre-digital technologies, such as paper. One trend in recent decades has been increased utilization of the nature of digital information not only for storage and transmission, but processing. Once information is “machine readable” it becomes more than a conduit of “human readable” information. The machine can find and transform information in ways that would be impossible or extremely time consuming otherwise. Digitization, or making information machine readable, occurs at many levels of abstraction. A page of a book can be digitized at the basic level of an image of the page, with black and white dots representing ink and paper. That information can be stored, transmitted, and presented to another human that may understand it, but the machines themselves have no greater understanding of the content than did the paper. The next level of abstraction is to digitize the text on the page, not just as black and white dots, but encoded as characters in an alphabet. This encoding can be done by human data entry, or through a form of machine learning called Optical Character Recognition (OCR). (The encoding of non-Latin alphabetic characters is another development discussed below.) At this level of machine understanding the text can be searched for text strings, although inexact matches or matches that span lines of text require an additional level of machine understanding. Higher levels of abstraction, easy for an informed human reader, require additional human encoding or machine learning. Humans easily distinguish whether italics indicate a title of a book or journal, a word in a foreign language, or emphasis. We distinguish a series of capital letters as an acronym or a roman numeral, and easily equate different standards for citation. 
Other levels of data about the data on the page (metadata) might include language and catalog information of the work in which the page is found. Recent decades have seen significant advances in digital technology moving from a “dumb” to “smart” medium through metadata standards, human encoding, and machine learning. Nevertheless, awareness of the challenges and levels of abstraction of machine learning can help the researcher troubleshoot problems. For example, a search for “Is 40:5” may not find a reference to “Isa XL.5.” A search for a word with an “m” may fail if the optical character recognition read “rn” (and failed to detect the language from context, and that the word with “m” is a dictionary word in that language). Machine understanding of information in context is a trend in artificial intelligence applied to Early Judaism research, but cannot yet be taken for granted.
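The citation example above can be made concrete with a minimal sketch in Python. It assumes a tiny, hypothetical abbreviation table (`BOOK_ALIASES`) and a single citation pattern; a real tool would need a far fuller table and many more citation styles. It shows only the kind of normalization a machine needs before “Is 40:5” and “Isa XL.5” can be recognized as the same reference.

```python
import re

# Hypothetical abbreviation table for illustration only.
BOOK_ALIASES = {"is": "Isa", "isa": "Isa", "isaiah": "Isa"}

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Book name, optional period, chapter (arabic or roman), separator, verse.
CITATION = re.compile(
    r"^\s*([A-Za-z]+)\.?\s+([0-9]+|[IVXLCDMivxlcdm]+)\s*[:.,]\s*([0-9]+)\s*$"
)


def roman_to_int(numeral: str) -> int:
    """Convert a roman numeral such as 'XL' to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV, XL).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def normalize_citation(text: str) -> str:
    """Reduce a chapter-and-verse citation to one canonical form."""
    match = CITATION.match(text)
    if match is None:
        raise ValueError(f"unrecognized citation: {text!r}")
    book, chapter, verse = match.groups()
    book = BOOK_ALIASES.get(book.lower(), book)
    chapter_number = int(chapter) if chapter.isdigit() else roman_to_int(chapter)
    return f"{book} {chapter_number}:{int(verse)}"
```

Once both the query and the indexed text pass through `normalize_citation`, a plain string comparison succeeds where a literal search would fail.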

Another general trend in digital technology in Early Judaism research has been progress from proprietary and closed tools to open and interoperable standards. The term “silo” is applied to a software application or website that may be very powerful within itself, but unable to share or receive information from outside sources. In decades past even the simple ability to copy and paste text from a Bible program to a word processor could not be taken for granted. In general this kind of problem occurs when there is no standard for encoding and transmitting information, or the standard is not followed. Many application developers find it easier to reach short-term goals by inventing their own system, rather than adopting a system understood by other applications. The advantages of interoperable standards apply to many levels, including image repositories, textual analysis, and bibliographic data. A simple example can be seen in the development of encoding Hebrew, ultimately leading to Unicode. Hebrew posed challenges mainly in that the alphabet is non-Latin and the direction is right-to-left, with more problems arising with Masoretic pointing. Early systems relied on some degree of transliteration, but were neither standardized nor machine readable. The system most designed for machine processing was Beta Code, which would render אחר as “)XR”. Systems designed to look like Aramaic script in word processing programs were not standardized and relied on tricks with fonts. A font could be designed such that a character “)” or “a” could look like א, but the computer system had no understanding that the language and script were other than English. The user had to type backwards, manually manage line breaks, and tell the spell checker to ignore rHa for אחר. A better solution, though rarely used for Hebrew outside of Israel, was to use an alternative character set. An 8-bit character set can encode 256 distinct characters. 
Some of those could be assigned to Hebrew letters, but support for additional character sets was limited. The ultimate solution was the development of the Unicode standard, which was originally designed around sixteen bits per character (65,536 possible characters) and has since been extended to more than a million code points, so that no “a” has to be tricked into looking like an aleph or alpha. Researchers today are unlikely to encounter problems with character sets unless working with digital materials from before the turn of the century (in which case further reading about ASCII, ANSI, Unicode, UTF-8, ISO-8859, and Windows-1252 might be helpful). Unicode also allows signals for text direction, i.e., switching between right-to-left (RTL) and left-to-right (LTR). In this case the existence of a standard and general compliance with it do not guarantee freedom from problems across different implementations. Problems with multi-line right-to-left text in otherwise left-to-right paragraphs in Microsoft Word for Macintosh persisted long after standards existed to solve that problem. Other standards deal with much more complicated problems. When successful, standards for interoperability make it possible to aggregate, search, process, and visualize data from many sources. Again, progress over recent decades is remarkable, but when troubleshooting or identifying limitations in research methods it is often helpful to understand the underlying standards for interoperability.
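The difference between a font trick and a true encoding can be seen in a short Python sketch. The mapping `HEBREW_TO_BETA` is a partial, illustrative table that covers only the three letters of the example word above; it is not a complete Beta Code implementation. The second function simply asks Python's standard `unicodedata` module what the machine knows about each character.

```python
import unicodedata

# The word from the example above, in logical (reading) order, written with
# escapes so the code does not depend on an editor's right-to-left support.
WORD = "\u05d0\u05d7\u05e8"

# Partial mapping for illustration: only the three letters of the example.
HEBREW_TO_BETA = {"\u05d0": ")", "\u05d7": "X", "\u05e8": "R"}


def to_beta(text: str) -> str:
    """Transliterate Hebrew letters using the illustrative table above."""
    return "".join(HEBREW_TO_BETA[ch] for ch in text)


def describe(text: str) -> list:
    """Return (code point, Unicode name, bidirectional class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in text
    ]


if __name__ == "__main__":
    print(to_beta(WORD))
    for row in describe(WORD):
        print(row)
    # UTF-8 stores each of these letters in two bytes.
    print(WORD.encode("utf-8"))
```

With Unicode the letter itself carries its name and its right-to-left class (`R`), so the software, not the typist, is responsible for display order and line breaks.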

Specific tools for Early Judaism research are discussed below in the categories of (1) primary sources search and access, (2) secondary sources search and access, (3) images of manuscripts and artifacts, (4) data visualization, and (5) publication and dissemination.

Primary sources search and access

Digital collections of primary sources are widely available and typically divided by language and corpora. Resources are further divisible into those that are freely available and those that require purchase or subscription. With some notable exceptions of projects funded by universities and grants, resources freely available on the Internet often use editions and translations that are in the public domain and out of date (e.g., EarlyJewishWritings.com). Software packages and subscription services can be expensive for individuals, especially those working in multiple corpora. Research universities typically provide access to visitors physically on campus.

Digital resources are most bountiful for the biblical canon, particularly the Protestant canon. These platforms have been expanded to include additional corpora, including Pseudepigrapha, Philo, Josephus, and the ability to create “custom” versions. Web-based resources such as BibleGateway.com (free, ad supported) offer many translations and simple searching. Locally-installed software such as Logos and Accordance (and BibleWorks until it closed in 2018) offers substantially more power, including search by morphology and instant access to parsing and lexicons. Additional resources are often included or available as upgrade packages (e.g., maps, commentaries, and dictionaries).

For Greco-Roman materials, the Perseus Digital Library at Tufts University is an early star of digital humanities projects, having originated in 1985. Texts in Greek and Latin are linked to morphological information, and forms can be entered to show possible and likely parsings and lexicon entries. A related project, Perseids, uses open standards to build editions of ancient documents. Alpheios provides tools for philological analysis. Pelagios extends the principles of Linked Open Data with a focus on geography in the ancient world. These projects originated with a focus on Greek and Latin, and expanded to the classical Mediterranean world. Because they utilize open standards, inclusion of Hebrew and Aramaic materials is easily imaginable. Another free, web-based resource is the Online Critical Pseudepigrapha. Among resources that require a subscription for full access, the Thesaurus Linguae Graecae (TLG) at the University of California Irvine is oldest (1972) and most comprehensive. An abridged collection and lexica are available with free registration. The Loeb Classical Library at Harvard University is also available with subscription in a searchable digital format. Other databases specialize in specific media, such as papyri and inscriptions from the ancient world, not necessarily related to Early Judaism. Papyri.info at Duke University exemplifies use of open standards in aggregating information from and about papyri. The Packard Humanities Institute’s database of ancient Greek inscriptions covers direct written evidence, as opposed to literary texts copied in manuscripts.

Electronic resources for the Dead Sea Scrolls are available as optional additions to some Bible software packages described above. The most powerful dedicated tool is the Dead Sea Scrolls Electronic Library (DSSEL) published by Brill and Brigham Young University. The transcription and English translations are fully searchable and linked to Palestine Archaeological Museum (PAM) images, though not necessarily the best available images (for which see Images of Manuscripts below). The DSSEL was published as a specialized application on CD-ROM in 1999 (biblical) and 2006 (non-biblical), and converted to BrillOnline Reference Works in 2015 and 2016, respectively. This resource is available only with subscription, and is not interoperable with open standards.

The oldest and most comprehensive digital collection of Rabbinic Literature is the Responsa Project at Bar-Ilan University. The project traces its origins to the 1960s, and released its first version in 1992. After versions on CD-ROM and USB drive, the project is now available by subscription in a web browser. The project supports browse and search, but lacks interoperability and other advanced features. The Soncino Classics CD-ROM includes Hebrew/Aramaic and English translations of the Babylonian Talmud, Midrash Rabbah, and Zohar. The translation of the Talmuds edited by Jacob Neusner is available as a stand-alone ebook and as an addition to Logos Bible Software.

The Comprehensive Aramaic Lexicon at the Hebrew Union College Jewish Institute of Religion includes three million words from the history of the Aramaic language, with morphological parsing and lexical entries. In addition to search and browse, the interface supports “key word in context,” which shows a word with a few words before and after from every instance in the database. The Digital Syriac Corpus provides a massive repository of literature compliant with interoperable standards for accessible linked data. The Corpus together with Syriaca.org at Vanderbilt University, and compatible tools such as Pelagios and Pleiades, place Syriac studies ahead of the pack of fields supported by digital humanities resources. Similarly, Papyri.info (above) and Coptic Scriptorium deserve mention as exemplars of the potential of open standards and digital tools.
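The “key word in context” display is simple enough to sketch in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any of the projects named above: it splits on whitespace, ignores morphology, and uses an English sample sentence purely for readability.

```python
def key_word_in_context(text: str, keyword: str, width: int = 3) -> list:
    """List every occurrence of keyword with `width` words on either side."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Compare case-insensitively and ignore surrounding punctuation.
        if token.strip(".,;:!?\"'()").lower() == keyword.lower():
            before = " ".join(tokens[max(0, i - width):i])
            after = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{before} [{token}] {after}".strip())
    return lines


if __name__ == "__main__":
    sample = "In the beginning God created the heavens and the earth"
    for line in key_word_in_context(sample, "the", width=2):
        print(line)
```

A production tool would match on lexical entries rather than surface strings, which is exactly where the morphological parsing described above becomes necessary.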

Secondary sources search and access

Secondary literature has several characteristics that make it easier to aggregate and discover than ancient sources. Publications in recent decades are typically “born digital,” meaning they were created on computers in the first place so do not require digitization such as scanning and character recognition. (Errors still occur when a digital source is printed to paper and redigitized.) Modern publications have objective characteristics such as “author” and “date,” unlike ancient sources which may require several paragraphs to describe the likely range of possibilities. Data about data, or metadata, can be entered, aggregated, indexed, and searched far more easily when the metadata is simple and machine readable. Standards for recording bibliographic data certainly exist, yet different interpretations can still cause a search to fail, or the same work to appear twice in a search. This is especially the case for translations, multi-volume works, and works in a series within a series. For example, the series Discoveries in the Judaean Desert follows a sequence for all volumes in the series, but additional internal numbering adds confusion. The volume scholars call “DJD 13” also includes a cave number (4), the volume number for that cave (8), and a part number (1), in addition to the overall series volume (13), with roman numerals to add to the fun (Qumran Cave 4.VIII: Parabiblical Texts, Part 1 [DJD XIII ; Oxford: Clarendon, 1994]). The combination is confusing enough for beginning scholars in Dead Sea Scrolls research. Machine learning and librarians attempting to fit the reference to an interoperable standard are likely to arrive at different interpretations of the standard or simply make mistakes. To the extent to which modern scholarship falls neatly into the categories anticipated by metadata standards, which is a large extent overall, it is easy for aggregators to collect bibliographic information and make it easily searchable. 
The largest aggregator of catalog metadata is WorldCat, which ingests catalog information from libraries all over the world. Errors made by any one of those libraries will be perpetuated in WorldCat, but it remains an excellent resource for discovery. A work is more likely to be duplicated than missing in WorldCat.

Searching for secondary literature becomes more complicated when searching for information not included in the standard library catalog metadata. Unlike catalog data, the contents of a work are typically restricted by copyright. Google Books addresses this problem by indexing all of the content of a book even if it cannot show that content. Thus searching Google Books might indicate if the content of a work matches search terms. Large scale, free resources rely on simple machine learning, which may work well for specific terms but fail to distinguish a search about the Book of Job from a search for a job (employment). Many researchers prefer more focused and/or subscription-based databases that rely more on informed human interpretation. Among free bibliographic search tools related to Early Judaism, the most complete is Rambi, The Index of Articles on Jewish Studies from the National Library of Israel. More focused (but not too narrowly) on Dead Sea Scrolls research is the bibliography maintained by The Orion Center for the Study of the Dead Sea Scrolls and Associated Literature. For the proper amount of money, more often paid by libraries than individuals, subscription services maintain a more curated index, and sometimes the complete work as PDF or eBook. EBSCO Research Databases categorize scholarship into many categories, including the EBSCO Jewish Studies Source. The American Theological Library Association also maintains a Religion Database. Many libraries subscribe to several databases and make efforts to unify search and results, such that users may not need to know the databases involved behind the user interface. One can expect to see further progress in aggregation of search and access, especially for works in the public domain or openly licensed. An example of the concept of an aggregator discovery tool, though more relevant to American history than Early Judaism, is the Digital Public Library of America.

Many researchers would like to search for secondary scholarship that deals with a particular primary source. This is sometimes easy if the citation appears in the title, keywords, or abstract in an expected form. An index of ancient works cited in a monograph may be searchable in Google Books, but only if the search string matches exactly with no dependence on contextual “common sense.” This situation will improve with better artificial intelligence and better tagging of metadata into machine-readable formats. If the primary source is specifically Talmudic, the Lieberman Index (subscription required) claims to index ancient and modern treatments of any given passage. Researchers may also wish to search for more recent discussion of a subject treated by an older secondary source. It is easy to find bibliography going back in time, but harder going forward. The best resource for searching newer works that cite an older source is Google Scholar. Links labeled “cited by” and “related articles” may aid discovery, though one may not assume that there are no more citations.

Researchers may also wish to know about works that have not yet, or just recently, appeared in print. Often years go by between the first presentable version of research and the final publication. As discussed below, authors have many options for making their work public other than established print publishers. Google and Google Scholar index major repositories such as Humanities Commons and Academia.edu. Researchers can also search these repositories directly or join them for notifications. Researchers may find relevant news by following the right accounts on Twitter (such as Annette Y. Reed @annetteyreed) or blogs (such as Jim Davila’s PaleoJudaica). Researchers may find that resources published on the Internet may disappear (dead links) for a variety of reasons. Google sometimes displays a recently cached version of a webpage that is currently unavailable. For older dead links, one’s best hope is the Internet Archive’s Wayback Machine. This tool allows users to go to a web address or browse the web as it appeared in the past.
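For readers who wish to check dead links systematically, the following Python sketch builds Wayback Machine addresses without contacting the network. It assumes the address patterns the Internet Archive publicly uses at the time of writing (a date stamp followed by the original address, and an availability query endpoint); both should be verified against the Archive's current documentation before being relied upon.

```python
from datetime import date
from urllib.parse import urlencode


def wayback_snapshot_url(url: str, when: date) -> str:
    """Address that asks the Wayback Machine for the capture nearest a date."""
    return f"https://web.archive.org/web/{when:%Y%m%d}/{url}"


def wayback_availability_query(url: str, when: date) -> str:
    """Address of a query reporting whether a capture exists near a date."""
    query = urlencode({"url": url, "timestamp": f"{when:%Y%m%d}"})
    return f"https://archive.org/wayback/available?{query}"


if __name__ == "__main__":
    # example.org is a placeholder address, not a site cited in this chapter.
    print(wayback_snapshot_url("http://example.org/page", date(2018, 12, 7)))
    print(wayback_availability_query("example.org", date(2018, 12, 7)))
```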

Images of manuscripts and artifacts

For many researchers the most primary of primary sources is not a modern print edition, but a digital facsimile of a manuscript or other artifact. Digital technology has already brought tremendous improvements over microfilm and photographic plates in printed editions. The cost of production and transmission is lower, and quality is typically higher. As high-quality digital scanning expanded in the 1990s, and digital photography surpassed film photography in the 2000s, digital access to artifacts expanded and is continuing to expand. For some researchers, the only question is whether the object has yet been digitized and made accessible. For others, various questions determine whether the benefits of digital technology for research into ancient artifacts have already reached maturity or are just beginning to blossom.

One question is whether the information sought is easily digitized. It is easy to create a simple digital equivalent of a photograph or microfilm. Information is not so easily digitized if the markings are damaged or otherwise illegible. In the case of palimpsests (erased and overwritten manuscripts), a simple photograph may not suffice to make the erased text legible. Spectral imaging may be necessary to enhance images. For research in Early Judaism as mediated by Early Christianity, the largest project to make palimpsests legible and available online has been the Sinai Palimpsests Project (free registration required). Artifacts can also be difficult to photograph and digitize if texture is the primary or essential conveyor of meaning. Bad (diffuse) lighting may make cuneiform tablets, stone inscriptions, coins, amulets, and so forth illegible. West Semitic Research pioneered applying technology for dynamic relighting (Reflectance Transformation Imaging) to artifacts related to Early Judaism. Their InscriptiFact Digital Image Library has thousands of relightable images, with thorough catalog information for search and browse (free registration required). The Jubilees Palimpsest Project combines spectral imaging with dynamic relighting for all of Latin Moses (Latin Jubilees and the Testament of Moses), and a few other artifacts.

Another question is whether the researcher already knows the catalog information of the object sought. It is easy to find an artifact (or confirm its unavailability) if one already knows the owner and designator (call number or shelf mark). High quality, sometimes spectrally enhanced, images of the Dead Sea Scrolls are available from the Leon Levy Dead Sea Scrolls Digital Library. Other images are available from the Israel Museum Digital Dead Sea Scrolls. The Aleppo Codex is available as its own site (Flash required). The Leningrad Codex is available from the Internet Archive. Similarly, Codex Sinaiticus and Codex Vaticanus can be viewed online. For lower profile artifacts, the researcher is at the mercy of the holding institution. Some institutions, such as the Bibliothèque nationale de France, have systematic programs for digitization and follow open standards for accessibility. In all these cases, however, images of the artifacts are only discoverable if the researcher already has the catalog information. This could be gained from critical editions, secondary scholarship, or perhaps aggregators such as Trismegistos. As artifacts are increasingly annotated with machine-readable linked data, it will become increasingly effective to search for artifacts not just by owner and shelf mark, but by scribal features (support, columns, lines, hand, provenance) and contents of the text.

Another question that will determine one’s experience of the progress already made in digital access to artifacts is what one wishes to do with the images. If one wishes only to read a text on screen, one can expect decent options for pan and zoom. If one wishes to recontextualize the image in any way, it will make a difference whether the image source complies with standards for interoperability. Many of the aforementioned sites are closed silos, and seem to wish to prevent the user from saving the image (although it is difficult to prevent a simple screen capture). Other sites favor open standards for interoperability. Exemplary in this regard is vHMML, the virtual library of the Hill Museum and Manuscript Library at St. John’s Abbey and University. The collection focuses on digital preservation of threatened collections, mostly Christian and Islamic. To the extent possible in light of intellectual property restrictions, the project favors open access, open standards, and open source software. One notable set of open standards is the International Image Interoperability Framework (IIIF). With IIIF compliance, images and collections can be reused outside their silos without divorcing them from the metadata and information provided by the original repository. Alternative viewers and collections can be easily implemented, along with sophisticated systems for annotation and collaboration. Once information and its relationship to other information becomes machine readable through defined standards, the possibilities for computer-assisted recontextualizing of information become limitless.
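What IIIF compliance means in practice can be illustrated with the address pattern of the IIIF Image API, in which the region, size, rotation, quality, and format of the desired image are spelled out in the address itself. The Python sketch below assumes a hypothetical server (`images.example.org`) and a hypothetical identifier; the keyword `max` for size follows version 3.0 of the API, while many version 2 servers expect `full` in that position.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request address.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Slashes inside an identifier must be percent-encoded.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    base = "https://images.example.org/iiif"  # hypothetical server
    # The whole page at the largest size the server offers.
    print(iiif_image_url(base, "ms-001/f12r"))
    # The top-left quarter of the page, scaled to 800 pixels wide.
    print(iiif_image_url(base, "ms-001/f12r", region="pct:0,0,50,50", size="800,"))
```

Because any compliant viewer can construct such requests, the same image can be displayed, cropped, or annotated outside the repository that hosts it.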

Data visualization

Sometimes discovery and learning benefit from rendering data in ways other than linear strings of text. Data visualization can communicate in a glance what otherwise would have required extensive work and abstract thinking. One of the core advantages of digital processing is the ability to store and process massive quantities of data. The great pre-digital scholars were able to comprehend, retain, and notice patterns in huge amounts of literary data, but even they had their limits. Visualization tools that developed in the past decades have the ability to summarize information that would have been extremely time consuming or impossible in earlier generations.

For example, “word clouds” quickly visualize the words that appear most frequently in a body of text by rendering the more frequently used terms in larger letters. This can quickly convey themes and emphases in a work. One could quickly visualize the frequency of personal names that appear in a work, such as the Hebrew Bible, and compare it to the relative frequency of those names in the New Testament or Talmud. If properly coded, names could be expressed in colors for gender, ethnicity, and any other object of study. Color can be used to express any dimension in a data set using “heat maps.” Charts can express the relative frequency of a lexical variant or synonym in one corpus or period relative to others. Dendrograms can be automatically generated to visualize “trees” of manuscript families based on degree of textual similarity. The “key word in context” became more popular and easier to generate with digital texts, and shows more of the context than a lexicon or concordance normally would. One can also easily create geographic maps with pins or colors representing mentions or more detailed information about place names in a work. In the past, scholars have argued that geographic information mentioned in a work (if accurate) might indicate provenance of composition. Simple mapping software makes it easy to apply that line of inquiry to any text, compare it to other texts, and present arguments visually to reach a wider audience more quickly. In general, research questions that might have been intuited or manually tabulated with relatively small and well referenced corpora such as the biblical canon can be asked of much larger corpora as long as they are adequately machine-readable.
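The computation behind a word cloud is little more than counting. The following Python sketch, offered as an illustration rather than a description of any particular tool, counts words and scales the counts linearly to font sizes. The tokenizer is naive (it splits on non-word characters and does no lemmatization), and the stopword list and size range are arbitrary assumptions.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count words, ignoring case and any words listed as stopwords."""
    words = re.findall(r"\w+", text.lower())
    return Counter(word for word in words if word not in stopwords)


def font_sizes(counts, smallest=10, largest=48):
    """Scale counts linearly so the rarest word is smallest, the commonest largest."""
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: smallest + (count - low) * (largest - smallest) / span
        for word, count in counts.items()
    }


if __name__ == "__main__":
    sample = "Moses spoke to Aaron and Moses spoke to the people and Moses wrote"
    counts = word_frequencies(sample, stopwords={"to", "and", "the"})
    for word, size in sorted(font_sizes(counts).items(), key=lambda kv: -kv[1]):
        print(f"{word}: {counts[word]} occurrences, font size {size:.0f}")
```

The same counts, grouped by corpus or period instead of rendered as letter sizes, are the raw material for the charts and heat maps mentioned above.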

Publication and dissemination

Digital technology has not replaced the conference paper and printed volume, but it has added substantial new options. Email might be thought of as a quicker and easier version of pre-existing media, such as mail. Other electronic media facilitate communication globally that before could only have been imagined in physical proximity. Web logs (blogs) and then Twitter offered an easy way to share announcements and ideas, especially in their nascent stages. Academia.edu gained popularity as a resource for authors to share their ideas and reach readers (and also gained controversy in its for-profit use of personal information). Non-profit alternatives such as Humanities Commons and institutional repositories were built to have the same or improved capabilities for search, notification, and discussion without selling personal information. In addition to published material, such online forums can be used for conference papers, slideshows, syllabi, data sets, videos, etc. Audio-visual materials are more common for reaching popular audiences (e.g., the Society of Biblical Literature’s Bible Odyssey project or James McGrath’s “Religion Prof” podcast) but that could easily change.

Even with some help from the Internet Archive’s “Wayback Machine,” it is reasonable to wonder if digitally disseminated information and ideas will have the endurance of printed paper volumes or the parchment and papyri we study. The vast majority of the information we have from antiquity, we have not because it was durable but because it was copied. It was copied because it was deemed worthy of copying. To the extent that information on the Internet is deemed worthy of copying and archiving it will be preserved more easily than its predigital analogs. The copying of digital information is the easy part. Archiving also requires attention to formats. Portable Document Format (PDF) is popular as a substitute for paper, and thus is very human readable, but less so machine readable. For important works and editions, the Text Encoding Initiative provides an archival standard for texts to be readable to machines as well as humans.
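The contrast between a human-readable and a machine-readable format can be shown with a small fragment encoded according to the Text Encoding Initiative. On a printed page, a book title, a foreign word, and an emphasized word might all appear in italics; in TEI each is marked with a different element (`title`, `foreign`, `emph`). The fragment below is invented for illustration, and the Python sketch reads it with the standard library's XML parser.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# An invented sentence: three spans that print would render alike in italics.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">The word '
    '<foreign xml:lang="he">\u05d0\u05d7\u05e8</foreign> is discussed in '
    '<title level="m">Early Judaism and Its Modern Interpreters</title>, '
    "which is <emph>not</emph> a journal.</p>"
)


def classify_marked_spans(fragment: str) -> dict:
    """Sort the marked spans of a TEI fragment by what the markup says they are."""
    root = ET.fromstring(fragment)
    return {
        "titles": [el.text for el in root.iterfind(".//tei:title", TEI_NS)],
        "foreign": [
            (el.get(XML_LANG), el.text)
            for el in root.iterfind(".//tei:foreign", TEI_NS)
        ],
        "emphasis": [el.text for el in root.iterfind(".//tei:emph", TEI_NS)],
    }


if __name__ == "__main__":
    for kind, spans in classify_marked_spans(SAMPLE).items():
        print(kind, spans)
```

A PDF of the same sentence would preserve the italics but not the distinctions, which is the sense in which PDF is less machine readable than an archival text encoding.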

The ease of copying digital information raised in a new way questions of intellectual property and copyright protection. From one perspective, copyright restrictions create a barrier to access, copying, and in that way preservation. From another perspective, copyright restrictions protect the rights of authors and publishers. Digital media have not displaced the traditional benefits of print publication for making information accessible in standard form. Besides the massive copying and dissemination implicit in the production and sale of physical books, publishers have performed functions such as vetting the quality of work. This vetting is often the best available metric in the career of a researcher, specifically for promotion and tenure. At one point there was a perceived divide separating digital access from print publication, associated with peer review and reliability. The lines have blurred substantially as publishers have found markets for online subscription- or open-access alongside or complementary to print publications. Meanwhile, open-access online-only journals not affiliated with a traditional print publisher have built strong reputations based on quality of editorial board, peer review, permanence, and preservation. The category of “open access” can be nuanced with standard licenses, such as Creative Commons licenses, which specify exactly what can and cannot be done with work published online. As with other widely-adopted standards, the Creative Commons licenses facilitate the spread of information through machine aggregators. The best online journals have plans for permanence and preservation, often by agreements with archival repositories at major universities. Online resources require maintenance and could easily disappear, especially if the provider is a for-profit service that ceases to be profitable. Print publications are implicitly archived by libraries that hold them even if the publisher goes out of business. 
Today a library may provide access to an external digital subscription without maintaining its own copy. An institutional repository, however, implies commitment to preservation including replacement of storage hardware, following archival formats, and converting formats before they become inaccessible through obsolescence. A researcher has more options than ever for making information and ideas accessible to a large number of people in the present, preserved for the future, and vetted for quality.

Conclusion

Research in Early Judaism has changed dramatically since the 1986 publication of the first edition of Early Judaism and Its Modern Interpreters. Research that had been possible with difficulty became easy. Research that had been impossible became possible. Information that had been accessible to very few became accessible to many. Along with social trends not directly linked to digital technology, there were changes in the questions being asked. Digital technology also impacted related aspects of the life of a researcher. Not least among these is teaching, both in general and specific to Early Judaism. Digital media, course management systems, video conferencing technology, and so forth changed the list of things that could only happen in a classroom, such as showing a video, giving a lecture, or having a discussion. The role of memorization came into question for information that could be quickly accessed using digital tools. The importance of teaching students how to use digital tools left in question the necessity of teaching pre-digital tools. Especially at introductory levels, digital tools that give lexical and parsing information opened the possibility of teaching just enough of a language to use these tools. In addition to teaching, related interests such as publishing and museum and library science were impacted by the developments discussed above from the perspective of the researcher. As with previous generations in which new tools became available, the distinction between the possible and the beneficial, what one can do and what one should do, became vital.

Computer-assisted research developed from a set of tools into a self-reflective discipline in its own right. “Digital Humanities,” a vague and problematic term among many in the history of research, became a buzzword that encompasses a range from doing the same kind of research with a computer, to self-reflection on the nature and role of the discipline itself. As digitally enabled tools impacted not only research but all aspects of the life of the researcher in society, the system of intertwined benefits and hazards of digital technology became an important object of study. All researchers in Early Judaism have been impacted by at least some tools from digital technology. For some researchers, the relationship between Early Judaism and Digital Humanities became a fruitful avenue of interdisciplinary inquiry, taking its place along with other interdisciplinary approaches in the history of the discipline.

Sites cited in order of appearance