Legacy Data, Digital Heritage, and Time: A Response

My old pal Andrew Reinhard, of the American Numismatic Society and a PhD candidate at the University of York, kindly agreed to comment on my post from yesterday. Because he interwove his responses with my original post, I thought it would be easier to repost yesterday’s post with his comments included. His comments are in italics.

Over the last couple of weeks, I have continued to think about the role of legacy data in archaeology. This is for a paper that I’ll give next month at the annual meeting of the Archaeological Institute of America. I’ve started to work through my ideas here and here.

At first, I imagined that I’d give a fairly routine paper that focused on my work with notebooks at the Princeton Cyprus Expedition at Polis on Cyprus. As readers of this blog know, a small team of us have worked to analyze over 20 years of excavations at Polis and to move the data from a largely analogue, notebook system, to a relational database that we can query. This has not only allowed us to understand the relationship between the excavation, ceramic assemblages, and architecture, but also moved us toward a secondary goal of publishing the data from Polis.

This is something similar to what the American Numismatic Society has done with the notebooks of E. T. Newell. We have the printed notebooks, they have been scanned and tagged, are available as open access to the public, and give us insight into the first Golden Age of numismatics, both the people and the artifacts and related context. The trick is doing a third step: publishing what we’ve learned. We can do all of this cool, useful stuff post-digitization, but the results must be published both digitally and in print. More about the necessity of the analogue below.

At the same time that I was working on this stuff, I was also continuing to think about time and archaeology and reading some recent works including Marek Tamm and Laurent Olivier’s Rethinking Historical Time: New Approaches to Presentism (2019) as well as some turn-of-the-century works like F. Hartog’s Regimes of Historicity: Presentism and Experiences of Time (2015). These works consider the changing nature of time and heritage in archaeology and argue that the emergence of the contemporary “heritage industry,” particularly after 1989 (and September 2001), demonstrates a changing notion of time in which heritage largely serves the various needs of the present. This contrasts with an earlier regime of time which emphasized the past as evidence for progress into the future.

I’m coming at the issue of heritage-time from looking at active software use. Classical civilizations found the archaeologist thousands of years removed from acts of creation, use, modification, discard, and destruction. Digital archaeologists find themselves watching as digital landscapes, sites, and artifacts undergo those stages over the course of an hour, often less. The notion of digital time is so absurd that we must observe it at the quantum level, understanding that Deep Time happens concurrently with things occurring literally at the speed of light. How are archaeologists equipped to handle an archaeology of the immediate? This issue is not limited to digital data, but also to anything that is mass-produced. The speed of waste now when compared to what it was 2,000 years ago is logarithmic. Archaeologists are not yet able to keep pace.

It was interesting that the Tamm and Olivier book includes no sustained discussion of how the changing regimes of time influence the use of digital tools in archaeological practice.

I think it’s important to mention here that I see “digital archaeology” as having two major threads that are not independent of each other: 1) digital archaeology is archaeological investigation of anything facilitated by the use of digital tools, and 2) digital archaeology is the archaeological investigation of digital things, which can include, but is not limited to synthetic worlds, software of any kind, and the firmware, middleware, and hardware used to create, distribute, and allow access to those digital spaces.

This is all the more striking because Hartog’s change in the nature of the past maps loosely onto our embrace of digital technology to facilitate the documentation and analysis of archaeological field work. One might argue that older techniques of documentation with their dependence on paper sought to create an archive that was designed as much for the future as for the present. We anticipated more sophisticated ways of analyzing our work and sought to document our practices, methods, and assumptions as carefully as possible. The practice of carefully archiving one’s field notes – typically on site – was fundamental to our notion of responsible excavation.

It shouldn’t matter why archaeological documentation is created, only that it is. At the point of its creation, archaeological documentation is of its own time, which can tell future researchers a bit about the conditions of the creation of the interpretation of archaeological data. Archaeological documentation, while of its own time, is also of all time, that is to say that it occupies past, present, and future all at once. Researchers at the initial point of interpretation will author documentation with conscious or unconscious political, social, and economic bias as they work to answer their research questions. They need to bear in mind, however, that while this interpretation is important in the present about the present and the past, their interpretation will not be the only one. They will not have thought of all of the research questions. They will miss things in the data that only temporal distance from the project’s “conclusion” can yield. Ideally data should be agnostic and amoral, but data are anything but. Archaeologists can, however, write for the present knowing that future generations will revise the work and there’s nothing to be done about it.

More recently, digital tools have become a key component of documenting field work. Archaeologists have produced what Andy Bevan has called a “data deluge” from our surveys, excavations, and research projects. The need to archive this data has remained significant, but, at the same time, there’s a growing quantity of “legacy data” that past projects have accrued. The concept of legacy data demonstrates an immediate awareness of the division between past data practices (whether digital or not) and contemporary needs. The expanding discourse on data preservation practices, archival format, “best practices,” “standards,” and metadata traces our anxieties in the face of rapidly changing technologies and protocols. The fear, I’d suggest, is less about the future of our data and more about its present utility. This follows the increasingly blurred line between the archiving of data and its publication. The potential for re-use in the present has shaped much of the conversation about legacy data.

All data are legacy data, which includes data created and interpreted today. Any data “preserved” digitally are fugitive. Databases serve the purpose of now, one year from now, and maybe ten years from now. I remember talking to Sebastian Heath about the future of filetypes. Should we be concerned about what types of files we use for data entry, for publication? His opinion (and this was several years ago and might have changed, but I still agree with him) was that it didn’t really matter. If we want to access a legacy filetype badly enough, we’ll find a way. But ultimately this all depends on persistent electricity, internet, the “cloud”, and functioning hardware. All are doomed in the long view. So what are we going to do about it? I’d suggest paper versions of record. Super-engraved blocks of permanent material that will outlive every server farm? But then, if data ever survive that long, will future humans and non-humans (including A.I. entities) care? I don’t think it matters. It’s the moral obligation of the archaeologist to record, interpret, publish, and preserve data from any given project with as much care as possible on the unlikely chance that someone 100 years from now will return to it and be able to do something useful with it.

Legacy data, however, is about more than just reuse in the present. In fact, the formats, tools, and technologies that made data collected in the past useful in the past remain a key element to understanding how digital data came to be, how it was encountered, and how it was interpreted. The details about how a project produced or collected data – or the paradata – remain significant, but more than that, the technologies used to produce, store, and analyze data in the past are fundamental to understanding archaeological practice.

I find myself waiting for the publication of the historiography of 21st-century archaeological method and practice, but published in 2020, and not at the end of the century. Such an omnibus publication would surely advance the state of the discipline, prompting conversations about “best practices,” although what’s best for one type of practice might not be best for another kind.

Scholars of video games, of course, already know this. Raiford Guins, in Game After: A Cultural Study of Video Game Afterlife (2014), for example, has considered the role of the arcade, the video game cabinet, the art present in a video game cartridge, and the design of the video game console as well as its place in the home. For Guins the game code is just part of the video game as a cultural artifact. He documents the challenges associated with preserving vintage arcade games and the balance between allowing the games to be played and the wear and tear of regular use on cabinets, controllers, and increasingly irreplaceable CRT monitors. The impulse to preserve “legacy games,” if you will, allows us to make sense of these objects as complex cultural artifacts rather than simply vessels for digital code.

I think video game archaeologists (myself included) continue to fetishize the artifact over its context, and that needs to change, perhaps decentralizing the role of the game itself and instead placing it within a ring-of-context: what forces caused this game to be created, and where does the game slot in with everything else that’s happening at the point of its creation. We study the game-artifact as a way of participating in the greater knowledge-making of the past 50 years.

In an archaeological context, then, legacy data is about more than the code or the digital objects, but also about the range of media, technologies, tools, and practices that made this data possible. Our interest in the utility of digital data risks reducing digital heritage to an evaluation of present utility. If, as Roosevelt et al. famously quipped “Excavation is Destruction Digitization: Advances in Archaeological Practice,” we might also argue that our modern impulse to digitize or adapt legacy data is a destructive practice, “Digitization is Destruction.”

I don’t think so largely because in most(?) cases, the artifact or site once digitized still exists in its analogue form. Lots of copies keep stuff safe, so as long as copies of data are kept and openly distributed at the point of their creation, we theoretically should have “originals” floating around even as other copies are ported forward to other formats for contemporary use.

This isn’t to suggest that we stop engaging legacy data as important sources of archaeological information or that we only engage it using a 30-year-old IBM PC with a SCSI-port Zip drive. Instead, I’m suggesting that our approach to legacy data gives us a useful way to reflect on the changing notion of time in archaeological practice and perhaps even speaks to the complicated relationship between archaeology and heritage practices.


True, but I worry about the speed at which the recent past creates massive piles of stuff for archaeologists to inherit and inhabit. This “data deluge” can be sanely managed through the use of bucket cores as an analogy for the sampling of data flows. For example, a game I am studying (Death Stranding) contains human-created items (signs, towers, roads, etc.) that are created and destroyed several times an hour. The archaeologist would need to sit at the screen 24/7/365 to record all that is happening. Now, over time, the same events happen in the same places, although with subtly different placements, volume of creation, and names of creators. Is it necessary for the archaeologist to mine all of the data all of the time, or, in the case of human-occupied digital environments, can one take a sample every day, week, or month, and be satisfied that the sample is representative? I think so, but in doing so we might miss out on those anomalies: a day of no creation, or a day of the creation of something odd/funny. Perhaps by sampling data often over a very long period of time, those anomalies will appear just as part of standard sampling. The only way to find out is to try.
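
For readers who want to try the bucket-core idea, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the event log is simulated, not drawn from Death Stranding, and the names `simulate_events`, `sample_every`, and `proportions`, along with the item kinds and their weights, are hypothetical. It only compares the mix of item kinds in a full record against the mix in a sample taken every seventh day.

```python
import random
from collections import Counter


def simulate_events(days, seed=0):
    """Build a hypothetical event log: one (day, hour, kind) tuple per created item."""
    rng = random.Random(seed)
    kinds = ["sign", "tower", "road"]
    weights = [0.6, 0.3, 0.1]  # assumed mix of item kinds, not measured data
    events = []
    for day in range(days):
        for hour in range(24):
            # Assume a handful of items are created every hour.
            for _ in range(rng.randint(2, 5)):
                events.append((day, hour, rng.choices(kinds, weights)[0]))
    return events


def sample_every(events, interval_days):
    """Keep only the events from every interval_days-th day (the 'bucket core')."""
    return [event for event in events if event[0] % interval_days == 0]


def proportions(events):
    """Return the share of each item kind among the given events."""
    counts = Counter(kind for _, _, kind in events)
    total = sum(counts.values())
    return {kind: count / total for kind, count in counts.items()}


if __name__ == "__main__":
    log = simulate_events(days=365)
    full = proportions(log)
    weekly = proportions(sample_every(log, interval_days=7))
    for kind in sorted(full):
        print(f"{kind}: full={full[kind]:.3f} weekly={weekly[kind]:.3f}")
```

In a simulation this regular, the weekly sample should track the full record closely, which is the optimistic half of the argument. The sketch does not model the anomalies mentioned above: a day of no creation that falls between two sampled days would simply never be seen, so a real test would need actual observation logs and not simulated ones.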
