The CART theory of Heritage Data

There are two main types of data set in the heritage field:

  1. Presentation Data, characterized as a small data set of interest to a large number of people.

  2. Record Data, the opposite: a large data set in which almost nobody is interested, but from which the presentation data is gleaned.

Long-Term Storage

Another way that heritage data differs from the data generally addressed in database management models is that the record must be accessible to the world at large and must be maintained for very long periods (forever). This precludes the use of proprietary formats. In fact, it is best if no software at all is needed to access the data. The model to emulate is an ancient inscription: it doesn't matter what sort of stone or what sort of chisel was used; all that's needed to decipher the contents is knowledge.

If you accept the above as axioms, the corollaries below follow:

  • The data set must be made widely accessible.**
  • It must be composed, as far as possible, of non-proprietary file types.
  • Then, within the data set:

    1. Every object should be given an identifier that uniquely identifies it.

    2. Every file must have a unique name from which, without any software, its relationships to other files and to references in the texts can be discerned.
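As an illustration only, suppose a project adopted the hypothetical convention `<TYPE>-<SEQ>_<role>-<NN>.<ext>`, so that `POT-0042_photo-01.jpg` is the first photograph of pottery object 42. A reader needs no software to see which object the file belongs to, and the same regularity lets a few lines of Python check or decode a name. The type codes, field widths, separators and the names `FileName` and `parse_filename` are assumptions made for this sketch, not something the text prescribes.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical naming convention (an assumption, not taken from the text):
#   <TYPE>-<SEQ>_<role>-<NN>.<ext>   e.g. "POT-0042_photo-01.jpg"
FILENAME_RE = re.compile(
    r"^(?P<type>[A-Z]{2,4})-(?P<seq>\d{4})"  # object identifier, e.g. POT-0042
    r"_(?P<role>[a-z]+)-(?P<num>\d{2})"      # kind of file and its number, e.g. photo-01
    r"\.(?P<ext>[A-Za-z0-9]+)$"              # file extension
)


class FileName(NamedTuple):
    object_id: str  # the object this file documents
    role: str       # photo, drawing, notes, ...
    number: int     # position among this object's files of that role
    ext: str        # extension, lower-cased


def parse_filename(name: str) -> Optional[FileName]:
    """Split a schema-conforming name into its parts; return None if it does not conform."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return FileName(
        object_id=f"{m['type']}-{m['seq']}",
        role=m["role"],
        number=int(m["num"]),
        ext=m["ext"].lower(),
    )


print(parse_filename("POT-0042_photo-01.jpg"))
# → FileName(object_id='POT-0042', role='photo', number=1, ext='jpg')
print(parse_filename("IMG_0042.JPG"))
# → None
```

A camera default such as `IMG_0042.JPG` returns `None`, which flags it as a file still waiting to be renamed.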

    An important feature of a structured system is that, if the data is ALWAYS kept in it, a discontinuity, such as a change of personnel, is less of a problem. How much material has been lost because the excavators never got around to publishing and no one else could decipher their notes?

    #1 is a theoretical exercise, to be carried out in advance and modified as more is learned about the site. Each object type needs to be defined and given a name or code, to which a sequence number is added.
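A minimal sketch of such a schema and its sequence numbers, assuming short upper-case type codes and four-digit zero-padded sequences. The codes `LOC`, `POT` and `BON` and the class `IdRegistry` are hypothetical; each project would define its own.

```python
from collections import defaultdict

# Hypothetical object-type codes. A real project defines its own schema in
# advance and extends it as more is learned about the site.
OBJECT_TYPES = {
    "LOC": "locus",
    "POT": "pottery",
    "BON": "bone",
}


class IdRegistry:
    """Issue identifiers of the form <CODE>-<NNNN>, with one sequence per object type."""

    def __init__(self, types: dict[str, str], width: int = 4) -> None:
        self.types = dict(types)
        self.width = width  # digits in the sequence number
        # Last number issued for each code; kept in memory only in this sketch.
        self._last: dict[str, int] = defaultdict(int)

    def add_type(self, code: str, description: str) -> None:
        """Extend the schema with a new type; existing codes are never redefined."""
        if code in self.types:
            raise ValueError(f"type code {code!r} is already defined")
        self.types[code] = description

    def next_id(self, code: str) -> str:
        """Return the next unused identifier for an object of type `code`."""
        if code not in self.types:
            raise KeyError(f"unknown object type {code!r}")
        self._last[code] += 1
        return f"{code}-{self._last[code]:0{self.width}d}"


registry = IdRegistry(OBJECT_TYPES)
print(registry.next_id("POT"))  # → POT-0001
print(registry.next_id("POT"))  # → POT-0002
print(registry.next_id("LOC"))  # → LOC-0001
```

Because the counters live in memory, a working registry would also need to store the last number issued for each code somewhere durable, ideally a plain text file, in keeping with the rule against proprietary formats.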

    #2 is the practical counterpart of #1. You must take files that got their names from the camera (or that were made up on the spur of the moment) and, while everything is still fresh in the recorder's mind, rename them to conform to the schema.
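The renaming step can be sketched as follows. It assumes the recorder has written down a mapping from camera names to schema names (for example `IMG_0042.JPG` to `POT-0042_photo-01.jpg`, both hypothetical); the function `rename_to_schema` and the log file name `rename_log.csv` are illustrative, not an existing tool.

```python
import csv
from pathlib import Path


def rename_to_schema(folder: Path, mapping: dict[str, str],
                     log_name: str = "rename_log.csv") -> list[tuple[str, str]]:
    """Rename files in `folder` from camera names to schema names.

    `mapping` (old name -> new name) is filled in by the recorder while the
    context is still fresh. All checks run before any file is touched, so a
    mistake in the mapping leaves the folder unchanged.
    """
    new_names = list(mapping.values())
    if len(set(new_names)) != len(new_names):
        raise ValueError("two files are mapped to the same new name")
    for old, new in mapping.items():
        if not (folder / old).is_file():
            raise FileNotFoundError(folder / old)
        if (folder / new).exists():
            raise FileExistsError(folder / new)  # never overwrite a record

    done = []
    for old, new in mapping.items():
        (folder / old).rename(folder / new)
        done.append((old, new))

    # Append a plain-text record of the original camera names.
    with open(folder / log_name, "a", newline="") as f:
        csv.writer(f).writerows(done)
    return done
```

Checking the whole mapping before renaming anything means a duplicate or already-used name stops the run with the folder untouched, and the CSV log preserves the link back to the original camera names in a format that needs no special software to read.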

    ** If the filenames are solid, making the data accessible is easy. A researcher can work it out from the names and from the references to the objects in the text, but making it "widely accessible" requires converting it all into HTML or some other standardized format.
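One modest way to produce the HTML is a static index page generated from the filenames alone. The sketch below assumes a hypothetical convention in which a name begins with an object identifier such as `POT-0042` followed by an underscore; `build_index` is an illustrative name, and files that do not match are listed under "unsorted" rather than dropped.

```python
import html
import re
from collections import defaultdict
from urllib.parse import quote

# Hypothetical convention: a file name starts with the object identifier and an
# underscore, e.g. "POT-0042_photo-01.jpg" belongs to object "POT-0042".
OBJECT_ID_RE = re.compile(r"^([A-Z]{2,4}-\d{4})_")


def build_index(filenames: list[str], title: str = "Record index") -> str:
    """Return a static HTML page listing the files grouped by object identifier."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        m = OBJECT_ID_RE.match(name)
        # Names that do not follow the convention are collected, not dropped.
        groups[m.group(1) if m else "unsorted"].append(name)

    t = html.escape(title)
    lines = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{t}</title></head><body>',
        f"<h1>{t}</h1>",
    ]
    for obj in sorted(groups):
        o = html.escape(obj)
        # The id lets text elsewhere link straight to an object, e.g. index.html#POT-0042.
        lines.append(f'<h2 id="{o}">{o}</h2>')
        lines.append("<ul>")
        for name in sorted(groups[obj]):
            # Relative links keep the page working wherever the folder is copied.
            lines.append(f'<li><a href="{html.escape(quote(name))}">{html.escape(name)}</a></li>')
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)
```

Saving the returned string as `index.html` beside the files gives a page that any browser can open, with no server or database involved.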

