My post today is mostly for data nerds (or want-to-be data nerds, in my case). For the last two months, I’ve been messing around with some databases from the Michigan State Excavations at Isthmia in Greece. I have any number of goals in doing this, and most of them loosely coalesce around “figuring out how good this data is and whether I can do anything with it as it is now.” Recently, though, I’ve gotten tired of waiting to see if the data is good enough and started to tinker with it a bit as a way to see if I could build some hypotheses and find problems with the data by testing it.
The biggest challenge with this data is folding together five different datasets and getting them to talk to one another. The first dataset is the context pottery read at Isthmia over the years. This consists of pot sherds that aren’t special enough to be inventoried but can nevertheless be identified as coming from a particular class of vessel. The second dataset is the “lots” dataset. This is a list of lots – or stratigraphic units – excavated over the years at Isthmia. It’s hard to know whether it represents ALL the lots excavated or just some of them. Most of these lots also have locations (that is, areas at the site) as well as trenches, and many have been assigned dates.
The other three datasets for my current experiment consist of inventoried pottery and lamps. Two are the Byzantine and Roman pottery deemed special enough to warrant formal cataloguing. Some of this material formed the basis for Jean Marty-Peppers’ 1979 Penn dissertation. There is also a database that lists the inventoried lamps, which have recently been published by Birgitta Wohl. Part of the larger goal of my work is to make sure these datasets “talk” to the publications. More importantly, however, I worked to assign each of these 3000 or so artifacts to its appropriate lot (or stratigraphic context). This would ensure that these datasets could “talk to” the lots and context finds databases. This is a work in progress because sometimes the lot isn’t very clear, and I’ll have to dig into the notebooks at some point to make sure that these datasets talk to each other as well as they can.
I was able to kluge these datasets together with only a little bit of fuzziness between them (for example, some of the lamps come from deposits that may [or may not] be identified as lots). Some of our standardized vocabulary for artifacts (we have adapted a version of the chronotype naming system) isn’t quite tidied up yet either. So there’s some more work to do.
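For the data nerds, here is a minimal sketch of what getting the datasets to “talk” to one another might look like, assuming each record carries a lot identifier. The field names (`lot`, `area`, `period`, `inv`) and all of the values are invented for illustration and are not the real Isthmia schema. The idea is simply to attach lot information to each find and to set aside anything whose lot can’t be resolved yet.

```python
# Hypothetical records: field names and values are invented for illustration.
lots = {
    "L-001": {"area": "Roman Bath", "period": "Late Roman"},
    "L-002": {"area": "Northeast Gate", "period": "Mixed"},
}
inventoried = [
    {"inv": "IP-1", "lot": "L-001", "chronotype": "Slavic ware"},
    {"inv": "IP-2", "lot": None, "chronotype": "Lamp"},      # lot not yet resolved
    {"inv": "IP-3", "lot": "L-999", "chronotype": "Lamp"},   # lot missing from the lots list
]


def link_to_lots(records, lots):
    """Attach lot fields to each record; return (linked, unlinked)."""
    linked, unlinked = [], []
    for rec in records:
        lot = lots.get(rec.get("lot"))
        if lot is None:
            # No lot recorded, or the lot is not in the lots dataset.
            unlinked.append(rec)
        else:
            # Merge the find's own fields with the lot's area and period.
            linked.append({**rec, **lot})
    return linked, unlinked


linked, unlinked = link_to_lots(inventoried, lots)
print(len(linked), "linked;", len(unlinked), "still need a trip to the notebooks")
```

Keeping the unlinked records in their own pile, rather than dropping them, makes the fuzziness visible and gives a running list of what still needs checking.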
I can, however, offer some very simple examples of how this work might be useful.
One of the first datasets that I wanted to explore involved our assemblage of Slavic pottery. Slavic pottery is a shorthand for hand- or slow-wheel-made cooking pots and beakers with geometric decorations. They may be associated with “Slavic” migrants to the Southern Balkans, but this remains a bit of an open question. Generally this material dates to the 7th or 8th centuries (or later).
We are now able to quantify our Slavic material in some mildly interesting ways. For example, we can now say that we have 132 contexts that contained some Slavic material. 28 of these contexts produced inventoried finds and the rest of the Slavic material came from context pottery. Slavic ware appeared in almost every context from across the site. Conventionally we’ve associated Slavic ware with depositions found in the area of the Roman bath, but it turns out that a very narrow majority of Slavic sherds came from other locations around Isthmia (52.4%). We can also use the dates assigned to lots (based on?) to get a broad sense of the character of the contexts in which Slavic ware appears. 48% of it appears in Late Roman and Early Byzantine contexts (which appear to date from the 4th-8th century). 24% of it comes from later Medieval contexts (e.g. Middle and Late Byzantine contexts) and about 26% comes from mixed contexts. Of course, it is hardly surprising that the most common location for Slavic pottery is in Early Byzantine and Late Roman contexts associated with the Roman Bath (29%), but the Early Byzantine and Late Roman as well as mixed deposits associated with the Northeast Gate produced not insignificant quantities of material. That over 20% of the Slavic material came from the Northeast Gate is interesting and I’m eager to dig a bit more into this.
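Percentages like these boil down to a simple tally. Here is a rough sketch, assuming each record gives a lot, an area, a period, and a sherd count. The records and field names are made up for the example and are not the actual Isthmia numbers.

```python
from collections import Counter

# Invented sherd counts per lot; placeholders, not the real Isthmia data.
slavic = [
    {"lot": "L-001", "area": "Roman Bath", "period": "Early Byzantine", "count": 6},
    {"lot": "L-002", "area": "Northeast Gate", "period": "Mixed", "count": 3},
    {"lot": "L-003", "area": "Roman Bath", "period": "Middle Byzantine", "count": 1},
    {"lot": "L-004", "area": "East Field", "period": "Early Byzantine", "count": 2},
]


def shares(records, key):
    """Return each value of `key` as a percentage of the total sherd count."""
    totals = Counter()
    for rec in records:
        totals[rec[key]] += rec["count"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {k: round(100 * v / grand_total, 1) for k, v in totals.items()}


by_area = shares(slavic, "area")        # share of sherds per area of the site
by_period = shares(slavic, "period")    # share of sherds per lot date
n_contexts = len({rec["lot"] for rec in slavic})  # distinct lots with Slavic ware

print(n_contexts, by_area, by_period)
```

Note that this counts sherds, not contexts. Whether a share is calculated per sherd or per lot changes the numbers, so it is worth deciding which one is meant before comparing areas.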
It’s not a massive leap from this to a study of the larger assemblage associated with the Slavic material. Of course, for this to be meaningful, we also need to study the excavation context, but this is a future project.
Working with Isthmia data has also allowed me to start to think about the distribution of material across the site in different ways. David Pettegrew and I have been thinking a bit about how we might compare the assemblages produced by excavation at Isthmia with those produced by the intensive pedestrian survey of the Eastern Korinthia. David has published the data from the latter, which makes it a convenient dataset to explore.
For now, however, I was content to explore the data from the Isthmia datasets across the site. There are some provisos, of course. First, I have no real sense of how complete this dataset is. On the one hand, I expect that if it is not complete, it is incomplete in an effectively random way. On the other hand, the dataset might be lacking non-stratified assemblages that might contain material from later or earlier periods. As a way to offset this, I decided, just as a little experiment, to compare the distribution of common Roman fine wares at two areas of the site: the East Field and the Roman Bath.
The results are pretty boring, but could inspire some hypothesis building. For example, maybe it’s worth noting that the East Field and the Roman Bath produced roughly equal amounts of the long-lived and common African Red Slip ware (28%-30%), but the Bath produced more of the later Phocaean Red Slip (LRC) ware (16% to 3%). The East Field, in contrast, produced more Çandarlı Ware (12%) than the Bath (8%). At first, I suspected that this was because the East Field had less later material than the Bath which, if I recall, remained in use as an activity area later, but the Bath assemblage had a higher proportion of ESA/B wares (32% to 28%), which tend to have earlier Roman dates. Both areas produced a good bit of something called “Class H” pottery. In fact, it was the plurality of the assemblage from the East Field (34%) and made up 15% of the material from the Bath. I have no idea what this is, and it is not mentioned in Jean Marty-Peppers’ dissertation or the recent Hayes and Slane volume of Roman pottery from Isthmia. In other words, it’s some kind of fine ware that is not included in the two most recent (if 1979 is indeed recent) and thorough catalogues of material from Isthmia, which is a bit odd.
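A comparison like this is easy to sketch once the fine wares are counted by area. The snippet below assumes a simple table of sherd counts per ware for each area. The counts are invented placeholders rather than the Isthmia figures, and the function names are my own.

```python
# Invented sherd counts per ware; placeholders, not the Isthmia figures.
counts = {
    "East Field": {"African Red Slip": 30, "LRC": 5, "ESA/B": 15},
    "Roman Bath": {"African Red Slip": 45, "LRC": 30, "ESA/B": 15},
}


def proportions(ware_counts):
    """Convert raw counts to percentages of the area's fine ware total."""
    total = sum(ware_counts.values())
    if total == 0:
        return {}
    return {ware: 100 * n / total for ware, n in ware_counts.items()}


def compare(counts, area_a, area_b):
    """Percentage-point difference (area_a minus area_b) for every ware."""
    pa, pb = proportions(counts[area_a]), proportions(counts[area_b])
    wares = sorted(set(pa) | set(pb))  # include wares present in only one area
    return {w: round(pa.get(w, 0.0) - pb.get(w, 0.0), 1) for w in wares}


diff = compare(counts, "East Field", "Roman Bath")
for ware, points in diff.items():
    print(f"{ware}: {points:+.1f} percentage points")
```

A positive number means the ware makes up a larger share of the first area’s assemblage. With small counts, a difference of a few points may not mean much, so it helps to keep the raw totals next to the percentages.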
As we continue to refine the data, it’ll become possible to perform other kinds of comparisons between material associated with various areas of the site. The kind of legacy data produced by Isthmia probably is not sufficiently robust to constitute “big data,” but my hope is that slowly cleaning it up will help us at least make some new connections and pose some new questions.