After a lazy couple of weeks with books, travel, and the holidays, I’m struggling a bit to get back into the swing of things. One step in that direction is starting to work “for realz” on my paper for the Archaeological Institute of America’s annual meeting in January. (I probably need to start writing lectures for my two new preps in the spring, but one thing at a time!)
My paper is in a panel on the re-use of archaeological data, and when I wrote my abstract, I was all pepped up on the idea of flow. This hasn’t entirely changed, but I’ve recently started to worry about how time and flow work together. In particular, I’ve become interested in the way in which the concept of “legacy data” has shaped our view of digital archaeology in practice.
It seems to me that legacy data is a pretty broad concept. In 2008, Internet Archaeology published a useful survey of projects using legacy data. In the introduction to that volume, Penelope Allison noted that legacy data in Mediterranean archaeology could include anything from Pausanias to late-20th-century excavation notebooks. In other words, data needn’t mean literal data – that is, granular or fragmented bits of digital information gathered from survey units or trench side – but can also include analog sources like photographs and notebooks, narrative sources such as those produced by early travelers and archaeological publications, primary paper sources of varying kinds and granularity, as well as information produced and stored in obsolete or antiquated “legacy systems.” The common feature of all these sources of data is that the archaeologists using them did not produce them. They are legacy because they were passed down from one generation (however defined) to the next.
In many cases, the research questions of interest to scholars who choose to work with legacy data differ from those of the archaeologists who produced the original dataset. This discontinuity often emphasizes the need to adapt the data from the past to the goals of analysis in the present. At its most challenging, this represents a kind of methodological disconnect that speaks both to “advances in archaeological practice,” to appropriate the name of the SAA’s methods journal, and to new questions of interest to the contemporary discipline. In other words, long-standing projects with consistent practices, methods, and research questions over time are less likely to produce “legacy data” than projects that formally conclude, experience some form of institutional or personal discontinuity, or change how they approach their archaeological work.
Since 2010, I’ve been working with a team of archaeologists to publish some of the work done by the Princeton Cyprus Expedition at the site of ancient Arsinoe in the village of Polis Chrysochous in Western Cyprus. We have focused on three main sources of data from the project – the excavated architecture, which includes two Early Christian basilica-style churches, the ceramics and other small finds, and the excavation notebooks. These notebooks represent the legacy data component of our work at Polis and document excavation at the site over 20 years starting in the mid-1980s. These notebooks constitute a kind of legacy data in part because many members of the team currently studying the excavation did not participate in the original work, but mostly because the excavations were not conducted in a formally stratigraphic manner. As a result, we have had to cautiously reconstruct stratigraphic relationships on the basis of descriptions in the notebooks and a general understanding of the excavators’ techniques, which removed material on the basis of levels and passes that may or may not have clear stratigraphic definition.
In this example, then, legacy data reflects a methodological discontinuity with contemporary archaeological practices. The progressive character of archaeological practices creates the conditions in which legacy practices come into being. At the same time, we recognize that legacy data preserves evidence for past archaeological practices as well as for the deeper past embodied in the artifacts, architecture, and depositional processes that it describes. The deeper past that archaeology studies tends to be far more resilient than the past practices manifest in legacy data.
Our approach to the Polis notebooks has depended in no small part on the work of Joanna Smith, who worked to have the notebooks scanned and made available to us in digital form. This allowed us to study the notebooks remotely and to begin the process of reconstructing stratigraphic relationships from non-stratigraphic narratives.
The first step in this work was to create a distinct identifier for each excavation event. This identifier combined excavation year, area name and trench name (both according to the Polis grid), level, and pass. In addition to this identification, we might have also included notebook number and page number. Many trenches had multiple notebooks, and because the excavators did not proceed in a “last in, first out” method, it was possible for multiple contexts to be open at the same time. The notebooks recorded the daily work of the excavator and moved from context to context depending on their work. As a result, the excavation of certain contexts could appear on dozens of different, non-consecutive pages in a notebook.
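To make the idea of a composite identifier a bit more concrete, here is a minimal sketch in Python. It is not the project's actual schema: the field names, the key format, and the sample entries (including the area and trench labels) are all invented for illustration. It shows how a single context key can gather the scattered, non-consecutive notebook pages that mention the same excavation event.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextId:
    """One excavation event. Field names are illustrative assumptions."""
    year: int
    area: str
    trench: str
    level: int
    pass_no: int

    def key(self) -> str:
        # Hypothetical key format: year-area-trench-Llevel-Ppass
        return f"{self.year}-{self.area}-{self.trench}-L{self.level}-P{self.pass_no}"


def index_pages(entries):
    """Group (notebook, page) references under each context key."""
    pages = defaultdict(list)
    for ctx, notebook, page in entries:
        pages[ctx.key()].append((notebook, page))
    # Sort so references read in notebook order, then page order.
    return {k: sorted(v) for k, v in pages.items()}


# Invented sample entries (context, notebook number, page number).
entries = [
    (ContextId(1989, "AreaA", "T01", 3, 1), 2, 14),
    (ContextId(1989, "AreaA", "T01", 4, 1), 2, 15),
    (ContextId(1989, "AreaA", "T01", 3, 1), 2, 31),
    (ContextId(1989, "AreaA", "T01", 3, 1), 1, 7),
]
page_index = index_pages(entries)
```

Keeping notebook and page numbers as attached references rather than as part of the key means one context keeps one identifier no matter how many pages it is spread across.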
The individual contexts, identified by their year, area, trench, level, and pass, can then be arranged in relation to other excavated contexts, in an informal matrix (that I call a “Franco Harris Matrix” because so rarely can we establish immaculate relationships). We can then associate with these contexts our analysis of the finds and, in many cases, architecture. This has allowed us to propose a chronology for several of the buildings at the site and to construct assemblages of finds that allow us to make arguments for the economy and connectivity of the Late Roman community. Most of this work is done with relational databases that allow us to connect different kinds of data in various one-to-many relationships.
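The one-to-many relationships described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names, column names, the "certainty" labels, and all sample rows are hypothetical and do not reproduce the project's database. One context has many finds, and a separate table records proposed "above/below" relationships along with how confident we are in them, which is the informal part of the informal matrix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contexts (context_id TEXT PRIMARY KEY);
CREATE TABLE finds (
    find_id     INTEGER PRIMARY KEY,
    context_id  TEXT NOT NULL REFERENCES contexts(context_id),
    description TEXT
);
-- Proposed stratigraphic relationships; certainty is a free-text label.
CREATE TABLE relations (
    upper_id  TEXT NOT NULL REFERENCES contexts(context_id),
    lower_id  TEXT NOT NULL REFERENCES contexts(context_id),
    certainty TEXT NOT NULL
);
""")

# Invented sample data.
upper, lower = "1989-AreaA-T01-L3-P1", "1989-AreaA-T01-L4-P1"
conn.executemany("INSERT INTO contexts VALUES (?)", [(upper,), (lower,)])
conn.executemany(
    "INSERT INTO finds (context_id, description) VALUES (?, ?)",
    [(upper, "amphora sherd"), (upper, "roof tile"), (lower, "lamp fragment")],
)
conn.execute("INSERT INTO relations VALUES (?, ?, ?)", (upper, lower, "probable"))


def finds_by_context(connection):
    """Count finds per context (the one-to-many join)."""
    rows = connection.execute(
        "SELECT c.context_id, COUNT(f.find_id) FROM contexts c "
        "LEFT JOIN finds f ON f.context_id = c.context_id "
        "GROUP BY c.context_id ORDER BY c.context_id"
    ).fetchall()
    return dict(rows)


counts = finds_by_context(conn)
# Relationships that are not securely established.
uncertain = conn.execute(
    "SELECT upper_id, lower_id FROM relations WHERE certainty != 'certain'"
).fetchall()
```

Storing certainty alongside each relationship keeps the less-than-immaculate links visible instead of silently treating them as secure.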
At the same time, we are aware that our approach involves disaggregating the excavation notebooks. This essentially disrupts the narrative of excavation that these notebooks preserve. On a practical level, this made it more difficult for us to understand cases when multiple contexts being open at once led to contamination. It also tended to obscure aspects of the excavation process that developed over the course of a season. In the Polis system it was possible for a level to be excavated in a series of irregular passes over the course of weeks, resulting in the later passes through the level being informed by the excavation of other contexts in the trench. Fortunately, it appears for now that these kinds of issues had only a minimal impact on the kind of coarse analysis that we have conducted so far, but future work, particularly in more complex areas of the site, may require greater attention to the organization of the notebooks themselves as a form of information on archaeological practices.
This example demonstrates the complex ways in which our effort to join legacy data to contemporary data collection processes runs the risk of obscuring the character of legacy data as evidence not simply for contexts, objects, and features from the past, but also for past practices. Part of the significance of the linked-data movement, for example, is that it privileges the interoperability of data. By encouraging the production of granular data, we now have datasets that support artifact-level analysis across multiple sites. Open Context, for example, has helped my excavation and survey at the site of Pyla-Koutsopetria on Cyprus to create a corpus of artifact assemblages identifiable at scales ranging from the individual sherd to the type, chronology, or archaeological context. Unlike our work at Polis, however, we designed this project with this kind of data publication in mind.
The publication of data from Polis invariably requires the re-arrangement of the legacy data to allow it to contribute to the larger “flow” of data being produced by contemporary archaeological projects. This streamlining and disaggregating will result in the obscuring of past practices that contributed to the status of this data as “legacy” as we work to ensure that it integrates with contemporary expectations and research questions. In many cases, our growing dependency on digital technologies, tools, and practices is responsible for reshaping legacy data.
There is little disputing the significance of this kind of work for our field. Renewed attention to legacy datasets has allowed us to publish two decades of excavation at Polis in ways that make it useful both to understanding our site and, hopefully, in the near future, to understanding the larger region. We have also been able to avoid many of the challenges associated with new field work, which range from the cost in time and resources of establishing a new project to the pressing and ongoing realities of artifact storage, site preservation, and publication.
The study of legacy data also presses us to engage with the multiple temporalities present in archaeological work. We can largely agree that archaeology has focused on material from “the past” and recognize that archaeological practices have changed and – by and large – improved since the discipline’s founding in the 19th century. Archaeology remains a progressive science.
Legacy data also makes us aware of a third kind of archaeological time which I’ve tended to call “ethical time.” Ethical time in archaeology recognizes the persistent value of artifacts and data from excavations even when these no longer coincide neatly with progressive ideas of archaeological practice or clearly defined archaeological contexts or provenance. The practice of repatriation, for example, represents ethical time in archaeology in that it embraces the potential for the return of artifacts to restore a situation that existed in a historical past.
The study of legacy data offers similar opportunities for archaeologists in that we can find new ways to restore the significance of past field work. Like repatriation, which must often recognize the compromised situation of the repatriated artifact and the limits to our ability to restore the artifact to its archaeological context, the reuse of legacy data faces limitations. Releasing legacy data into the world of contemporary data flows often requires us to strip away parts of its historical situation as part of an archaeological past to accommodate the information in a disciplinary present. We can hope that this compromise restores the legacy of the field work both to the meaningful world of contemporary practice and to our disciplinary efforts to understand the past.