Curating Excavation Data

Over the past two weeks, David Pettegrew and I have been working through the data from the Pyla-Koutsopetria Archaeological Project’s excavation seasons. This includes two seasons of excavation in the 1990s by Maria Hadjicosti with the Cypriot Department of Antiquities and three seasons of excavation by a team from Indiana University of Pennsylvania, Messiah College, and the University of North Dakota. So far, we’ve prepared the finds tables for publication following the approach that we took with the data from the intensive pedestrian survey at the site published in 2014.

With that more or less under control, we’ve turned our attention to the excavation data, which is a more complicated proposition. When we published our intensive survey unit data, we recognized that each survey unit is more or less comparable to every other survey unit. They are all the product of the same procedures, methods, and recording. It is, therefore, useful to query all the data together, or a wide range of subsets of that data, to interrogate the relationship between surface conditions and artifact recovery rates, densities, and assemblages.

Excavation data is different. Each stratigraphic unit (SU) represents a more complicated set of variables, procedures, and methods that make it very difficult to compare them. For example, a scarp cleaning unit or a plow-zone unit is very different from a unit that is floor packing. The excavation of floor packing in different trenches may or may not be defined in the same way spatially or procedurally. In one trench the floor packing might be a single SU; in another, the floor packing might be removed over several SUs that are determined only later to represent the same stratigraphic context.

Our description of survey units served to define each unit in a way that allowed us to compare surface conditions across the entire survey area or even between survey areas. Our recording practices in excavation often serve to define our stratigraphic units in a way that is relative to physically adjacent units that represent later depositional events. This isn’t to suggest that we can’t compare units between trenches, but to note that the relative differences between stratigraphic units are often more important in describing the character of a stratigraphic unit.

All this is to say that we’re trying to figure out what information is important to include as searchable, queryable, and sortable data and what information can be left on our stratigraphic unit forms (which will be published as scans). The issue, then, is not whether we’ll publish some information and not publish other information – we’ll publish all of our recording sheets – but rather what we will publish as data and what will remain available, but not presented in a way that can be queried or 

My instinct is to be fairly minimalist with the information that we present as formal data points. My take would be to publish as data:

* EU (trench number)
* SU (stratigraphic unit number)
* Harris Matrix Relationships (using the basic Harris matrix style terms)
* Description (as a free text block)
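To make the minimalist option concrete, here is a rough sketch of what such a record might look like and what recording Harris Matrix relationships as data would allow. The class name, the `above` field, and the SU numbers and descriptions are all invented for illustration; they are not our actual schema or units. The sketch assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class StratUnit:
    eu: int            # trench number
    su: int            # stratigraphic unit number
    description: str   # free-text block from the SU form
    # Hypothetical encoding of one Harris relationship: the SU numbers
    # that lie directly beneath this unit (i.e. earlier deposits).
    above: list[int] = field(default_factory=list)


def stratigraphic_order(units):
    """Return SU numbers ordered from latest (top) to earliest (bottom)."""
    # TopologicalSorter expects node -> set of predecessors. Here the
    # predecessors of a unit are the later units that sit on top of it.
    graph = {u.su: set() for u in units}
    for u in units:
        for lower in u.above:
            graph.setdefault(lower, set()).add(u.su)
    return list(TopologicalSorter(graph).static_order())


# Invented example trench: plow zone over destruction debris over floor packing.
units = [
    StratUnit(eu=1, su=1003, description="Floor packing"),
    StratUnit(eu=1, su=1001, description="Plow zone", above=[1002]),
    StratUnit(eu=1, su=1002, description="Destruction debris", above=[1003]),
]

order = stratigraphic_order(units)
print(order)
```

Even with only four fields, the relationships are enough to rebuild the sequence of deposits within a trench, which is the main reason I would keep them as data rather than leaving them on the scanned forms.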

David advocates for a more robust set of queryable descriptors. 

The first group are the same as mine:

* Area
* EU (trench number)
* SU (stratigraphic unit number)
* Description

The next group can be easily pulled from our forms:

* DateExcavated
* MinElev
* MaxElev
* Munsell
* Texture
* Consolidation
* Stoniness
* Dominant Clast

Some of the categories will be included not as textual data, but as links to other resources:

* DrawingNumber
* DrawingDescription
* PhotoNumber
* PhotoDescription

The final group is more interpretive and will draw from the final reports and our published chapter:

* Context (Surface, Plow, Destruction, Floor, Subsurface)
* Phasing in Site
* Summary (from our chapter) 

He does not include Harris Matrix relationships.
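For comparison, David's fuller set of descriptors might be sketched as a record like the one below, along with the kind of query it makes possible. This is only an illustration: the field names are adapted from the lists above, only some of the fields are included, and the sample values (areas, SU numbers, phases) are invented rather than drawn from our data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SURecord:
    area: str
    eu: int                              # trench number
    su: int                              # stratigraphic unit number
    description: str
    date_excavated: Optional[str] = None
    min_elev: Optional[float] = None
    max_elev: Optional[float] = None
    munsell: Optional[str] = None
    texture: Optional[str] = None
    # Interpretive fields, filled in from final reports rather than field forms.
    context: Optional[str] = None        # Surface, Plow, Destruction, Floor, Subsurface
    phase: Optional[str] = None


def sus_by_context(records, context):
    """Map each trench (EU) to the SU numbers interpreted as `context`."""
    grouped = {}
    for r in records:
        if r.context == context:
            grouped.setdefault(r.eu, []).append(r.su)
    return grouped


# Invented sample: floor packing is one SU in trench 1 but two SUs in trench 2.
sample = [
    SURecord("A", 1, 1001, "Plow zone", context="Plow"),
    SURecord("A", 1, 1003, "Floor packing", context="Floor"),
    SURecord("A", 2, 2004, "Floor packing, upper", context="Floor"),
    SURecord("A", 2, 2005, "Floor packing, lower", context="Floor"),
]

floors = sus_by_context(sample, "Floor")
print(floors)
```

The interpretive `context` field is what lets a reader pull together floor packing that was excavated as one SU in one trench and as several SUs in another. It is also the field most dependent on our later judgment, which is where the risk of introducing errors comes in.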

The question that I leave for my readers to consider is how we balance presenting as much data as might be useful for our audience against publishing so much data that we allow errors to creep in without providing additional utility.

One Comment

  1. I’m a big fan of queryable fields, because they allow you to search for different combinations of factors (which you may or may not think are important right now) at any point in the future. Also, it’s easier to link those data to other related data (photos, plans, coordinates, related survey finds, etc.). On another note, what about a visual, clickable Harris Matrix, that would allow a reader to navigate through the records stratigraphically?

