A team of scholars and technologists at the Emory Libraries, led by Rebecca Sutton Koeser and Brian Croxall, is developing tools for identifying and marking up names, places, and organizations in Emory’s collection of materials associated with the poets known as the Belfast Group. Tagging these entities will make it possible to examine and present some of these writers’ social and geospatial networks. Because they connect identifiers to the semantic web, the tools created for the Networking the Belfast Group project will give users access to much more data than documents like finding aids regularly provide.
To compare the team’s tools with doing the same work by hand, I tagged the people, businesses, and places in the Frank Ormsby finding aid with identifiers. These identifiers connect people to linked data resources, providing more information about “Frank Ormsby” and establishing that we’re always talking about the same entity throughout our finding aids. That might sound easy enough with someone named “Ormsby,” but when you start trying to establish which “Robert Johnson” a particular finding aid is referencing, it becomes much more complicated. Marking up the 14,000 lines of XML took even more time than we expected: about fifty hours. The vast majority of my time was spent determining who people were in the database we are using for our unique identifying numbers, the Virtual International Authority File (VIAF).
The other issues I encountered centered on how language and networks function. The first was, bizarrely enough, a question about parts of speech. If a group is said to be Northern Irish, do I mark that phrase as a geographic place name? If the text had said “this group was from Northern Ireland,” I would not even have questioned marking it with the identifier for “Northern Ireland.” When it was an adjective, I hesitated and conferred with the other team members. Is this because adjectives describing groups of people have more wrapped up in them than geography? Is it just a categorization hiccup?
The second kind of question I asked was about relationships. If a poem has Belfast in the title, should this be given a geographical identifier? If a poem title has the name T.S. Eliot in it, does T.S. Eliot get marked as the person? The relationship is quite different from most of the marks, where T.S. Eliot would have authored materials in the collection or where Belfast would have been a site of production or distribution rather than a subject. When something is about, rather than by, a person, should it be tagged? And whatever we decide on that question, should the same policy apply to places when they are used either as subjects or sites? Similarly, there were reviews of books where I tagged the authors of both the text being reviewed and the review itself. The actual collection houses only the reviewer’s work; however, this seemed a different kind of “about” than a poet writing about another person.
I also ran into questions about how to differentiate members of families. When a letter is from the Longleys—a husband and wife—whose identifier do we use? When a Longley is referred to without the first name, how do I decide which Longley is meant? Sometimes it was obvious that Michael Longley was meant, because he is a poet and the collection includes more of his materials; however, I could see how making assumptions about who to identify might reduce the presence, particularly of women, in these networks.
At the center of these questions is how we understand networks. Knowing that two entities are related is quite different from knowing how two entities are related. These questions have implications not only for the functions of the tools created by the team, but also for how researchers will use and understand the networks we produce using them.