Resources for Student Researchers at StMU 2017
Contents
Information Session, September 8, 2017
Resources
First Training Session, September 29, 2017
Understand a page of Ceriani's 1861 Critical Edition
- Antonio Maria Ceriani transcribed Latin Moses (Jubilees and the Testament of Moses) based on what he (thought he) could read in 1861.
- We want a proper TEI version of his edition to use as a base for creating a new edition based on advanced images.
- The 96 pages of Latin Moses appear on 48 pages of Ceriani's edition (two manuscript pages per edition page).
- Two edition pages have been started by TRH as examples. First understand one of the examples.
- Bring up Ceriani's critical edition of the first page of Latin Moses:
- http://jubilees.stmarytx.edu/iiifp/
- Select "Latin Moses" from the drop-down menu and click "List resources in manifest."
- The first page was page 83 in the original manuscript, page 62 in the rebound palimpsest, and page 15 in Ceriani's edition. Click "Navigate"
- Click "Image Cube" and explore the options until you find the image labeled "Ceriani 1861"
- Bring up the TEI encoding of this page (in progress)
- https://github.com/thanneken
- This link will take you directly to the "development" branch.
- Click on "Ceriani_1861_015.tei.xml"
- TEI XML can eventually be rendered as HTML
- Compare the conventions used by Ceriani and the TEI equivalent.
Work on a page in need of TEI encoding
- Log in to GitHub (using the stmusr account, your own existing account, or a new one you create).
- Go back to the development branch of the annotations repository at https://github.com/thanneken
- Ask for and click on an XML file that no one else is working on
- Copy the content of the xml file to the clipboard (I suggest clicking "Raw" then Ctrl-A then Ctrl-C)
- Navigate to the Validator on TEIbyExample.com (LINK) and paste from the clipboard into the top pane (Ctrl-V).
- There are many text editors with various advantages.
Use any you like, but try to pass a validation test before committing (saving) back to GitHub.
- A few options: GitHub has its own edit function.
Oxygen is powerful but costs $100. Eclipse is free, powerful, and complicated. Notepad++ for Windows is free and fairly simple.
- The bottom pane will either be green for valid TEI XML (but otherwise unimportant) or have a red error message, which may or may not be helpful.
- Take out a closing </ab> tag and the validator gives a helpful error.
- Take out an opening <ab> tag and it gives an unhelpful error.
- Look for a code or comment to indicate the 5c or 19c page number of one of the manuscript pages on your page of Ceriani.
Use the IIIF Navigator to find the appropriate page of Ceriani's edition (as we did above for the first page).
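If you would like to catch structural mistakes such as a missing </ab> before pasting into the web validator, Python's standard library can at least confirm that a file is well-formed XML. This is a minimal sketch assuming Python 3; the helper name check_well_formed and the sample strings are hypothetical. It does not test the file against the TEI schema, so a file that passes here can still fail in the validator.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's message.

    Hypothetical helper: this checks only well-formedness (matching tags,
    quoted attributes, and so on), NOT validity against the TEI schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


good = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><ab>ave</ab></body></text></TEI>'
# Same text with the closing </ab> taken out.
missing_close = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><ab>ave</body></text></TEI>'

print(check_well_formed(good))           # None
print(check_well_formed(missing_close))  # a "mismatched tag" message with line and column
```

To check a file saved from GitHub, pass in its contents, for example check_well_formed(open("Ceriani_1861_015.tei.xml", encoding="utf-8").read()).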
Interpret human conventions in scholarly edition (Ceriani) into TEI XML
- Rule one is to not destroy information.
- Give yourself credit in the TEI header: inside the <titleStmt>, after the </title>, add the following:
<respStmt><resp>TEI Encoding</resp><name>Firstname Lastname</name></respStmt>
- Add <milestone/> tags similar to the ones on the example page:
<milestone type="5c" unit="page" n="83"/>
<milestone type="19c" unit="page" n="62" xml:id="palimpsest_page_062"/>
<milestone unit="column" n="a"/> (for manuscript columns, not Ceriani's columns)
<milestone unit="ch" n="13"/>
<milestone unit="vs" n="10"/>
- Add <p></p> paragraph tags around blocks that begin with a hanging indent.
- The top of the page is likely to be a continuation of a paragraph and therefore without a hanging indent.
Indicate that this is the final part of a paragraph with an attribute as follows:
<p part="F">
- Add <lb/> (line beginning) tags at the beginning of each new line that is not also the beginning of a paragraph.
- The attribute type="inWord" replaces (rather than supplements) the hyphen that ends the previous line in the human convention, so take out the hyphen after you add <lb type="inWord"/>.
- The "don't destroy information" rule applies to information in the provided text that is not found in Ceriani.
Rather than deleting it, enclose each addition in <add resp="#ocp"></add>, meaning the enclosed text was added and the responsible party is known as ocp (for Online Critical Pseudepigrapha).
- When information is corrected rather than added, the original (sic) and the correction are enclosed in a choice tag: <choice><sic>e</sic><corr resp="#ocp">E</corr></choice>.
- When in doubt, leave a comment for future editors as follows: <!-- comment -->
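The line-break conventions above are mechanical enough to sketch in code. The following Python 3 function is a hypothetical helper, not part of the project's tooling: it takes the lines of one paragraph as typed from Ceriani and joins them with <lb/> tags, assuming that a line ending in a hyphen always marks a word split across lines. The Latin sample is made up, and every result still needs a human check against the page.

```python
def encode_lines(lines):
    """Join one paragraph's lines, marking each new line with <lb/>.

    Hypothetical helper. Assumes a trailing "-" means the word continues
    on the next line: the hyphen is removed and <lb type="inWord"/> is used
    instead, because the attribute replaces the hyphen.
    """
    parts = []
    for i, raw in enumerate(lines):
        line = raw.strip()
        if i > 0:  # the first line of a paragraph gets no <lb/>
            previous = parts[-1]
            if previous.endswith("-"):
                parts[-1] = previous[:-1]  # drop the hyphen
                parts.append('<lb type="inWord"/>')  # no space: same word
            else:
                parts.append("\n<lb/>")  # newline keeps the two words apart
        parts.append(line)
    return "".join(parts)


print(encode_lines(["et dixit domi-", "nus ad me", "dicens"]))
```

The printed result is two lines: `et dixit domi<lb type="inWord"/>nus ad me` followed by `<lb/>dicens`. Wrapping the output in <p></p> (or <p part="F"> at the top of a page) is left to the encoder.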
Commit changes in GitHub
- If you are no longer logged in to GitHub, it is important to log in now.
- Go back to the GitHub page for the file you are working on. If the tab is no longer open, find it in https://github.com/thanneken
- It is important to be in the "development" branch (not master or test)
- Click the pencil icon that says "Edit this file" when you hover over it.
- Copy and paste from the validator into the edit window.
- In the brief description window, add a short note, such as your name and how far you got.
- The default option should be "Commit directly to the development branch."
- Do a quick common sense check and click "Commit changes."
Human resources paperwork
- Dr. Hanneken has Personnel Action Forms that require your name (as it appears on official University records) and Student ID Number.
Complete the form and return it to him so it can be routed for administrative signatures.
- If you have never been paid through StMU payroll before (in any context) there is more paperwork for you in the Human Resources office on the bottom floor of Saint Louis Hall.
- Hours (including the information session, a form of training) are to be entered into Gateway.
I honestly do not know how this works, so Human Resources or your friends will be a better source of information.
Resources
Terms
- TEI
- Text Encoding Initiative
- XML
- eXtensible Markup Language, cf. HyperText Markup Language
- Tag
- A markup enclosed in angle brackets that describes content.
- Element
- The name at the start of a tag, which defines what kind of tag it is.
It can have attributes within the angle brackets.
It can contain other elements or character strings, in which case it ends with a separate closing tag that begins with a forward slash (for example, </p>).
If not, the tag itself ends with a forward slash (for example, <lb/>).
- Attribute
- Additional information about the element.
It is written inside the opening tag as a name followed by an equal sign and a value in quotation marks, for example type="inWord".
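To see how these terms map onto what a parser reads, Python's built-in xml.etree.ElementTree module exposes an element's name as .tag, its attributes as the dictionary .attrib, and its character content as .text. This is an illustration only; the snippets reuse tags from these notes without the TEI namespace that a full file would declare.

```python
import xml.etree.ElementTree as ET

# Self-closing element: no content, so the tag itself ends with "/>".
milestone = ET.fromstring('<milestone type="5c" unit="page" n="83"/>')
print(milestone.tag)       # milestone
print(milestone.get("n"))  # 83
print(len(milestone))      # 0 (no child elements)

# Container element: it holds characters, so it needs a closing </corr> tag.
corr = ET.fromstring('<corr resp="#ocp">E</corr>')
print(corr.tag, corr.get("resp"), corr.text)  # corr #ocp E
```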
October 13, 2017
Review fundamental principles of TEI hierarchical and non-hierarchical tags
- Catholic theology loves mysteries (God the Father is no older than God the Son; Christ is fully God and fully human). XML hates mysteries. The container always contains the contents appropriate to the container, and the contents are always in the right container.
- In particular, every TEI XML file must contain a teiHeader and a text. A text almost always includes a body. A body almost always includes paragraphs (or equivalents, such as anonymous blocks, <ab>). Paragraphs always include characters (immediately or mediated by additional containers).
- The hierarchy of containers works the other way too. Characters can't be floating around in a body. They must be contained within a paragraph.
- When we want to say something about the text that does not fit within the hierarchy of containers and contents we use a self-closing tag, meaning it has the slash at the end of the tag, rather than a separate closing tag.
- Examples of container tags:
<body>
  <p>
    Simple text characters
    <choice>
      <sic>!</sic>
      <corr resp="#trh">?</corr>
    </choice>
  </p>
</body>
- Examples of self-closing tags:
<pb type="1861" n="15" xml:id="Ceriani_1861_015"/>
<cb type="Ceriani" n="1"/>
<milestone type="19c" unit="page" n="62" xml:id="palimpsest_page_062"/>
- Note that TEI favors textual abstraction over physical manifestations of texts. We could say that characters are contained on a line, lines are contained in a column, columns are contained in a page, pages are contained in a volume, but for most literature those things are considered arbitrary.
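One consequence of the container rules is that characters may not sit directly inside <body>. The sketch below (Python 3, standard library only; the function name stray_text_in_body is hypothetical) reports any such text, with or without the TEI namespace declared. The TEI schema used by the validator should already reject this, so the sketch is only meant to make the rule concrete.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def stray_text_in_body(xml_text):
    """Return text fragments that sit directly in <body> instead of in a container."""
    root = ET.fromstring(xml_text)
    # Accept files with or without the TEI namespace declaration.
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        body = root.find(".//body")
    if body is None:
        return []
    strays = []
    # body.text is text before the first child; child.tail is text after each child.
    for fragment in [body.text] + [child.tail for child in body]:
        if fragment and fragment.strip():  # whitespace between tags is fine
            strays.append(fragment.strip())
    return strays


ok = "<text><body><p>Simple text characters</p></body></text>"
bad = "<text><body>floating<p>Simple text characters</p></body></text>"
print(stray_text_in_body(ok))   # []
print(stray_text_in_body(bad))  # ['floating']
```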
Use HTML transforms to check earlier work
- The HTML transformations are NOT instant.
They are run in batches by pulling the latest XML from the development branch of the GitHub repository, creating the HTML, and uploading the updated HTML back to the repository.
- Changes made directly to the HTML will be lost. Edit the XML only.
- A problem with the HTML could indicate a problem with the XML, or it could indicate a problem with the engine that creates the HTML.
- To view the rendered HTML, click this link: HTML.
Encode more conventions used in Ceriani and OCP
- Ceriani uses italics to mean a letter is uncertain. In TEI XML, enclose it in <unclear></unclear>.
- Ceriani uses dots to mean the letter is illegible.
For example, three dots mean Ceriani estimates that three letters he can't read are present in the manuscript.
It should appear in TEI XML as
<gap reason="illegible" quantity="3" unit="character"/>
- OCP uses asterisks to indicate an emendation (meaning the text as we have it is believed to be wrong and the text intended by the original author can be reconstructed).
For example, post-Ceriani editors believe the original author wrote "adam" where the text as we have it says "abraham".
In the OCP text it appears as "a*d*am", where Ceriani (and the manuscript) has "abraham".
In TEI XML it should appear as <choice><sic>abraham</sic><corr resp="#ocp">adam</corr></choice>.
- OCP uses parentheses to indicate when something in the text probably should not be there.
For example, post-Ceriani editors believe the original author wrote "liberaret" where the text as we have it says "eliberaret".
In the OCP text it appears as "(e)liberaret", where Ceriani (and the manuscript) has "eliberaret".
In TEI XML it should appear as <choice><sic>eliberaret</sic><corr resp="#ocp">liberaret</corr></choice>.
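The Ceriani and OCP conventions above can be turned into TEI strings mechanically. The Python 3 sketch below uses two hypothetical helpers: dots_to_gap assumes every run of dots stands for illegible letters (ordinary punctuation would break that assumption, so check the output by eye), and ocp_choice simply wraps a transmitted reading and an OCP correction in the <choice> pattern shown above.

```python
import re


def dots_to_gap(text):
    """Replace each run of dots with a TEI <gap/>, one character per dot.

    Assumption: every dot stands for an illegible letter, which is NOT
    true of ordinary punctuation.
    """
    def gap(match):
        n = len(match.group(0))
        return f'<gap reason="illegible" quantity="{n}" unit="character"/>'
    return re.sub(r"\.+", gap, text)


def ocp_choice(sic, corr):
    """Encode an OCP emendation: the transmitted text and the correction."""
    return f'<choice><sic>{sic}</sic><corr resp="#ocp">{corr}</corr></choice>'


print(dots_to_gap("et ...ait"))
# et <gap reason="illegible" quantity="3" unit="character"/>ait
print(ocp_choice("abraham", "adam"))
# <choice><sic>abraham</sic><corr resp="#ocp">adam</corr></choice>
```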
Thoughts on not drowning in user interfaces
- Keyboard shortcuts for editing: ctrl-a, ctrl-c, ctrl-x, ctrl-v, ctrl-z
- Keyboard shortcuts for window and tab management: alt-tab, ctrl-tab, shift to reverse, win-arrow keys
- Find a bigger or additional screen
- Use dedicated installed software rather than web-based equivalents (Eclipse or Notepad++ rather than the web validator)
October 27, 2017
Background on IIIF Navigator
- The International Image Interoperability Framework (IIIF) defines the Presentation API (Application Programming Interface), a machine-readable standard for specifying the images that depict a page (canvas) and how the pages relate to each other in the sequence of a book.
- This specification is also human readable if you want to take a look (LINK), but we won't get into writing or editing manifests.
It uses JSON (JavaScript Object Notation), which I find less conducive to human reading than XML.
- Most humans need a user interface to navigate the information described in the manifest.
Because the manifest is standardized, many user interfaces can access the same information.
Among the most popular are Mirador and Universal Viewer,
but these presume relatively simple books, whereas our data has many images per page.
- The Jubilees Palimpsest Project developed its own viewer, called IIIF Navigator (LINK).
- It is oriented toward studying a page closely, rather than moving quickly through pages.
- It relies on the browser and operating system to manage tabs and windows.
This provides flexibility for organizing information on the screen.
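To make the idea of a manifest concrete, here is a small Python 3 sketch that reads a manifest and lists each page (canvas) with the number of images attached to it. It assumes the manifest follows version 2 of the Presentation API (a list of sequences containing canvases, each with an images list); whether the Latin Moses manifest uses exactly this structure is an assumption, and the sample JSON and its example.org URLs are made up.

```python
import json

# A tiny, made-up manifest in the shape of IIIF Presentation API 2.x.
SAMPLE = """
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@type": "sc:Manifest",
  "label": "Example manifest",
  "sequences": [{
    "@type": "sc:Sequence",
    "canvases": [
      {"@id": "https://example.org/canvas/1", "@type": "sc:Canvas",
       "label": "page 1", "width": 6132, "height": 8176,
       "images": [
         {"@type": "oa:Annotation", "motivation": "sc:painting",
          "resource": {"@id": "https://example.org/img/1_ac.jpg", "@type": "dctypes:Image"},
          "on": "https://example.org/canvas/1"},
         {"@type": "oa:Annotation", "motivation": "sc:painting",
          "resource": {"@id": "https://example.org/img/1_tx.jpg", "@type": "dctypes:Image"},
          "on": "https://example.org/canvas/1"}
       ]}
    ]
  }]
}
"""


def list_pages(manifest):
    """Yield (canvas label, number of images) for each canvas in the first sequence."""
    for canvas in manifest["sequences"][0]["canvases"]:
        yield canvas["label"], len(canvas.get("images", []))


manifest = json.loads(SAMPLE)
for label, count in list_pages(manifest):
    print(label, count)  # page 1 2
```

A viewer such as IIIF Navigator does essentially this kind of reading, then fetches the images each canvas points to.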
Study manuscripts in IIIF Navigator
- We've already seen IIIF Navigator, but let's start from the beginning at
http://jubilees.stmarytx.edu/iiifp/
- The left drop-down menu lists IIIF Presentation manifests provided by the Jubilees Palimpsest Project.
We are interested in Latin Moses (which includes Latin Jubilees and the Testament of Moses).
- Selecting Latin Moses from the drop-down menu should automatically fill in the manifest address box with /iiifp/LatinMoses/manifest.json.
Open standards allow any compliant manifest to be entered into the manifest address box.
- Select "all resources" from the radio-button options, then click "List resources in manifest."
- For each page there are three categories of information.
The image cube can be thought of as a stack of ordinary 2D images.
WebRTI is an interactive interface useful for showing texture.
Transcriptions and translations are text-based.
- The image cube includes at least three color processes.
- Accurate color is the natural look (but likely more accurate than any photograph you have seen before).
- Extended Spectrum is easy to generate but not likely to be useful in most situations given our other options.
It simulates what human vision might be if we could see ultraviolet and infrared.
- For all of Latin Moses we have images processed with the KTK Ruby technique.
KTK is Dr. Keith Knox, an imaging scientist recently retired from the U.S. Air Force Research Lab.
Ruby is shorthand for Royal blue fluorescence divided BY transmissive.
- Some pages have additional processing labeled RLE (Dr. Roger Easton, an imaging scientist at Rochester Institute of Technology) or PCA Pseudocolor.
- Processed multispectral images excel at making materials that look alike to the eye appear very different in the image.
For example, alternate between Easton Custom 2 Diffuse and Accurate Color Diffuse in this image cube (LINK).
The "xxii" has a different spectral signature, indicating it was added later with a slightly different recipe of red ink.
- The image cube also contains different light orientations.
- Diffuse shows no shadows or texture. It is generally most likely to be helpful for reading text.
- Raking light comes from a low angle on one side (NE, SE, SW, and NW refer to the compass directions).
It is useful for identifying surface texture, such as indentations, accretions, and warp.
- Transmissive light is light transmitted through the parchment from behind, like a backlight or holding the parchment up to a window.
It is useful for identifying thin spots, such as where ink corroded the surface of the parchment.
- The image cube also includes the scan of Ceriani's edition, which you have been using,
and the microfilm, which is useless for reading but can be useful for comparing the condition of the parchment at the time the microfilm was made to its condition when imaged in 2017.
- The WebRTI images have the advantage of showing texture interactively.
The key feature is to click the light bulb to use the mouse to control the direction the light is coming from.
- The existing transcriptions and translations are not yet TEI compliant or properly formatted to reflect the line breaks in the manuscript.
Use and contribute to paleography charts
- Paleography is the study of ancient writing. You could think of it as the study of handwriting, or of fonts.
The script we are studying is Latin, but the handwriting may be confusing or even misleading.
Sometimes, if only part of a letter is legible, we can narrow the possibilities down to the letters that have that part.
- A paleography chart for Latin Moses has already been started at
(LINK).
It is not complete, and even for letters that already have a few examples, we would like more.
- It is nice when we can find an example of a letter that is not obstructed by overtext and is visible in Accurate Color.
This is most often the case at the top line of text when the overtext is upside down relative to the undertext.
- It is usually easiest to find a letter you want to use as an exemplar in the Image Cube.
Once you do, go back to the tab that lists manifest resources (with the all resources button active).
- Find the image you want to use. Make sure you have the right page and diffuse illumination, preferably Accurate Color.
The "Crop" link to the right of the image will open a new tab.
- The highlight box will always be centered, but you can change the size and proportions.
You can zoom and pan the page underneath the highlight box so that it shows just the letter you want.
- The URL at the top of the screen will change as you move the box or page.
See below to understand how this IIIF Image API URL works.
Clicking this link will open a new tab showing just the letter you selected as an image by itself.
- To add your example to the paleography chart you have two options.
If you're not comfortable editing HTML you could just email the link (not the image) to Dr. Hanneken.
If you are comfortable editing HTML you could find LatinMosesPaleography.html in the development branch of the annotations repository on GitHub.
It should be relatively simple to edit the HTML by following the example of what is already there.
Understanding the IIIF Image API
- Besides the Presentation API above, IIIF also defines an Image API, which allows us to store an image once and then access it however we want.
- Most often we want only part of an image, or not the full resolution, so we only download the part we need from the server.
- Experiment with one of our manuscript images: http://jubilees.stmarytx.edu/iiif/Ambrosiana_C73inf_052_Ac_00.jp2/full/pct:10/0/default.jpg
- jubilees.stmarytx.edu is the server that hosts the image repository.
- iiif indicates that we are requesting a IIIF image.
- Ambrosiana (for Biblioteca Ambrosiana) is the owner of the object imaged.
- C73inf is the shelfmark of the manuscript. Underscores separate the fields of the file name.
- 052 is the page number (following the explicit 19th-century numbering).
Try changing it to any number between 001 and 144, keeping leading zeros to make it three digits.
- Ac means Accurate Color rendering.
Try changing it to Xs for Extended Spectrum or KTK01 for Keith Knox's Ruby processing.
- 00 is the diffuse light position.
Try changing it to Tx for transmissive or to one of 02, 07, 51, or 55 for raking angles.
- .jp2 indicates that the file is stored in the repository in JPEG 2000 format.
- full (the first field between slashes) is the region, in this case the full image.
The region can also be entered as x,y,w,h coordinates, meaning a rectangle that starts x pixels to the right of the upper-left corner and y pixels down from it, and extends for a width of w and a height of h.
The region can also be entered as a percentage of the full image.
Try pct:50,50,50,50 for just the lower-right quarter, or pct:0,0,50,50 for the upper-left quarter.
Since the full dimensions are 6132x8176, the same regions could be expressed as 3066,4088,3066,4088 and 0,0,3066,4088.
- pct:10 (the second field between slashes) is the size of the image you want delivered.
In this case it is ten percent of the full resolution.
You could increase it to pct:100, which is the same as full.
(If the image is larger than the browser window, your browser may squeeze it to fit.)
You could also enter the desired width and height as w,h.
If you enter both, the server will distort the image as necessary to fulfill your request.
You can leave one or the other blank to mean whatever number of pixels keeps the image in proportion.
For example, if you enter 300, (with nothing between the comma and the following slash), you will get an image 300 pixels wide and however many pixels high.
- 0 is the rotation.
Try entering 180 to rotate the image upside down, which could be useful for reading the overtext.
- default is the image server's default quality (color mode), not the spectral processing.
It is not something we use, but you could try entering gray for a grayscale version of the same image.
- .jpg outputs an image in JPEG format, which is the only format our server supports.
It would be nice if we could request png, but we can't as of now.
- See the IIIF website for more detail (LINK)
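The URL anatomy above can be captured in a short Python 3 sketch. The functions image_url and pct_region_to_pixels are hypothetical helpers: the identifier pattern is inferred from the single example URL, not every combination of page, processing, and lighting is guaranteed to exist on the server, and the 6132x8176 dimensions come from the note above and may not hold for every image. The server may also round percentages slightly differently.

```python
BASE = "http://jubilees.stmarytx.edu/iiif"


def image_url(page, processing="Ac", light="00", region="full",
              size="pct:10", rotation=0, quality="default", fmt="jpg"):
    """Assemble base/identifier/region/size/rotation/quality.format.

    Assumption: identifiers follow Ambrosiana_C73inf_PPP_PROCESSING_LIGHT.jp2,
    inferred from one example URL.
    """
    identifier = f"Ambrosiana_C73inf_{page:03d}_{processing}_{light}.jp2"
    return f"{BASE}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


def pct_region_to_pixels(x, y, w, h, full_width=6132, full_height=8176):
    """Convert a pct:x,y,w,h region into the equivalent x,y,w,h pixel region."""
    return (round(full_width * x / 100), round(full_height * y / 100),
            round(full_width * w / 100), round(full_height * h / 100))


# Reproduces the example URL above.
print(image_url(52))
# Lower-right quarter, 300 pixels wide, upside down for reading the overtext.
print(image_url(52, region="pct:50,50,50,50", size="300,", rotation=180))
print(pct_region_to_pixels(50, 50, 50, 50))  # (3066, 4088, 3066, 4088)
print(pct_region_to_pixels(0, 0, 50, 50))    # (0, 0, 3066, 4088)
```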
November 10, 2017
This workshop was dedicated to collaborative work on encoding Ceriani in TEI XML.
The notes prepared for the possibility of encoding Gryson's edition of the Arian Commentary on Luke were not used but are preserved below for possible future reference.
Interpret human conventions in scholarly edition (Gryson) into TEI XML
- Angle brackets (<>) indicate something the modern editor added to aid the reader that does not appear in the manuscript itself.
For example, the title <EXPOSITO EVANGELII SECUNDUM LUCAM> is added by Gryson.
The TEI XML would be
<head><add resp="#gryson">EXPOSITO EVANGELII SECUNDUM LUCAM</add></head>
- Arabic numerals with a comma and period at the beginning of a paragraph indicate the chapter and verse in the Gospel of Luke being commented upon.
Since Gryson consistently uses one paragraph for the commentary on a single verse, we can use chapter:verse as the n (number) attribute in a paragraph tag.
For example,
<p n="1:1">
- Desunt means "they are lost." For example, where Gryson has "(Desunt 4 paginae)" the TEI XML would be
<gap reason="lost" quantity="4" unit="page"/>
- Two vertical lines (||) indicate the start of a new page in the manuscript.
The page numbers are given in the margin.
Gryson indicates the 19c page with parentheses, p., and an Arabic numeral.
For example, || (p. 20) would appear in TEI XML as
<milestone type="19c" unit="page" n="20"/>
Gryson indicates the 6c page (note that this copy of the Latin Commentary on Luke is from the sixth century) with a folio number and recto or verso.
The TEI code is <milestone type="6c" unit="page" n="3r"/>.
Let's use the numbering Gryson uses.
For your own information, you can think of the page number as the folio number multiplied by two, minus one if it is recto.
For example, 3v is the same as page 6 and 3r is the same as page 5.
- A single vertical line indicates a line break. A superscript number immediately to the right of the vertical line indicates a line number. Spaces are the only indicator of whether it occurs in the middle of the word.
For example, glo|5riam in Gryson would be rendered in TEI XML as glo<lb type="inWord" n="5"/>riam
- Parentheses most often indicate expansions of abbreviations.
For example, "Cr(ist)i" would appear in TEI XML as
<expan>Cr<ex>ist</ex>i</expan>
where <ex> encloses only the letters supplied by the editor.
- All caps indicates a quotation from the verse in Luke being commented upon.
Italics indicate a quotation from another source, specified in the footnotes by the line number in which the quotation starts.
For example,
<quote>De plenitudine eius nos omnes accepimus gratiam ex gratia <ptr cRef="John 1:16"/></quote>
(pointer to cross reference)
<quote>Sicut tradiderunt nobis <ptr cRef="Luke 1:2"/></quote>
- Ignore the other footnotes for now
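Two of Gryson's conventions above are arithmetic or pattern-based and can be sketched in Python 3. Both functions are hypothetical helpers: folio_to_page applies the folio-to-page rule described above, and gryson_line_breaks assumes the superscript line number has been typed as plain digits right after the bar (as in glo|5riam) and treats a bar with no space on either side as falling inside a word. Double bars (||) are left alone so that page breaks can be encoded by hand.

```python
import re


def folio_to_page(folio):
    """Convert '3r' or '3v' to a page number: folio * 2, minus one for recto."""
    match = re.fullmatch(r"(\d+)([rv])", folio.strip())
    if match is None:
        raise ValueError(f"not a folio reference: {folio!r}")
    number, side = int(match.group(1)), match.group(2)
    return number * 2 - 1 if side == "r" else number * 2


# A single bar (not part of "||"), optionally followed by a line number.
LINE_BREAK = re.compile(r"(?<!\|)\|(\d*)(?!\|)")


def gryson_line_breaks(text):
    """Replace Gryson's |N line-break marks with TEI <lb/> elements."""
    def to_lb(match):
        start, end = match.start(), match.end()
        before = text[start - 1] if start > 0 else " "
        after = text[end] if end < len(text) else " "
        in_word = not before.isspace() and not after.isspace()
        kind = ' type="inWord"' if in_word else ""
        number = f' n="{match.group(1)}"' if match.group(1) else ""
        return f"<lb{kind}{number}/>"
    return LINE_BREAK.sub(to_lb, text)


print(folio_to_page("3r"), folio_to_page("3v"))  # 5 6
print(gryson_line_breaks("glo|5riam"))  # glo<lb type="inWord" n="5"/>riam
```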
Additional scholarly conventions
- spaces
- punctuation
- expansion of abbreviations