The CART theory of Heritage Data
There are two main types of data sets in the heritage field:
- Presentation Data,
characterized as being a small data set of interest to a large number of people.
This is best represented by the idea of the museum, which takes a small
sub-set of its collection and assembles it into a presentation designed
to tell a story to a large number of visitors.
- Record Data,
which is the opposite: a large data set in which almost nobody is interested, but from which
presentation data is gleaned.
Record data is the source material, gathered in the field or from other
archives. Gathering this material is the most important link in the
chain, and the most difficult. The people who gather it are almost
always working under constraints of time and resources, so every effort
must be made to make it as easy as possible for them to get their data into the
system (any system).
Long Term Storage
Another way that heritage data differs from the data generally addressed in
database management models is that this record must be accessible to
the world at large and must be maintained for very long periods
(forever). This precludes the use of proprietary formats. In fact,
it is best if no software at all is necessary to access the data.
The model to emulate is an ancient inscription. It doesn't matter what sort
of stone was used or what sort of chisel, all that's needed to decipher the contents is
If you accept the above as axioms, the corollaries below follow:
The data-set must be made widely accessible.**
It must be composed, as far as possible, of non-proprietary file types.
Then, within the data set:
1. Every object should be given an identifier that will uniquely
   identify it:
   - What is it? (wall, coin, bone, book, etc.)
   - Which is it? (Usually a number unique for that "what")
   - Where is it?
     - In a survey this would be one or more 2D or 3D co-ordinates.
     - In other situations it may be an address, a contact name, or the ISBN of a book.
2. Every file must have a unique name such that, without any software,
   its relationships to other files and to references in the texts can be
   determined.
   - For example: a table exported as a .CSV may reference an object
     named "what_001" while photographs of the object would be called
     "what_001*.jpg", the drawings "what_001*.tif" and a text description of
     it "what_001.txt". There may also be a "what.CSV" which would be the export of a
     database containing information on all objects of the type "what".
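The naming convention in the example can be checked mechanically. The following is only a sketch in Python, assuming that a name consists of a type code, an underscore, a sequence number, an optional suffix and an extension. The pattern and the function name `group_by_object` are illustrative assumptions, not part of CART.

```python
import re
from collections import defaultdict

# Assumed layout: <what>_<which><optional suffix>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<what>[A-Za-z]+)_(?P<which>\d+)(?P<suffix>[^.]*)\.(?P<ext>\w+)$"
)

def group_by_object(filenames):
    """Map each object identifier to the files whose names carry it."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            # Not an object-level file, e.g. the type-level "what.CSV".
            continue
        object_id = f"{match['what']}_{match['which']}"
        groups[object_id].append(name)
    return dict(groups)

files = [
    "what_001a.jpg", "what_001b.jpg", "what_001.tif",
    "what_001.txt", "what_002.jpg", "what.CSV",
]
groups = group_by_object(files)
```

Here every file belonging to "what_001" is found from its name alone, which is the property the naming rule is meant to guarantee.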
An important feature of a structured system is that, if the data is
ALWAYS kept in it, a discontinuity is less of a
problem. How much material has been lost because the excavators never
got around to publishing and no one else could decipher their notes?
#1 is a theoretical exercise to be carried out in advance and
modified as more is learned about the site. Each object type needs to
be defined and given a name or code to which will be added a sequence number.
#2 is the practical aspect of #1. You must take files whose names
came from the camera (or were just made up on the spur of
the moment) and, while everything is still fresh in the mind of the
recorder, rename them to conform to the schema.
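The renaming step can be prepared as a plan before any file is touched. The sketch below is in Python and rests on assumptions: a three-digit sequence number, a letter suffix (a, b, c, ...) to tell apart several photographs of one object, and the hypothetical function name `plan_renames`. None of these is prescribed by the text above.

```python
from pathlib import Path

def plan_renames(camera_files, what, which, width=3):
    """Pair each original file name with a schema-conforming name.

    Nothing is renamed here; the result is a list of (old, new) pairs
    that the recorder can review first.
    """
    object_id = f"{what}_{which:0{width}d}"
    plan = []
    for index, original in enumerate(sorted(camera_files)):
        if index >= 26:
            raise ValueError("more files than single-letter suffixes")
        suffix = chr(ord("a") + index)            # a, b, c, ...
        extension = Path(original).suffix.lower()  # ".JPG" -> ".jpg"
        plan.append((original, f"{object_id}{suffix}{extension}"))
    return plan

plan = plan_renames(["DSC_0417.JPG", "DSC_0416.JPG"], what="wall", which=7)
for old, new in plan:
    print(old, "->", new)
```

Once the plan looks right, each pair could be applied with `Path(old).rename(new)`. Keeping the plan separate from the renaming lets the recorder confirm the mapping while the details are still fresh.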
** If the filenames are solid, making the data accessible is easy. A
researcher can figure it out from the names and the references to the
objects in the text, but making it "widely accessible" requires turning
it all into HTML or some other standardized format.
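As a rough illustration of how solid filenames make the HTML step easy, the Python sketch below turns a mapping of object identifiers to file names into a minimal index page. The function name `build_index` and the page layout are assumptions made for the example, not a description of what CART generates.

```python
from html import escape

def build_index(groups):
    """Return a minimal HTML page listing each object and its files."""
    lines = ["<html><body>", "<h1>Objects</h1>"]
    for object_id in sorted(groups):
        lines.append(f"<h2>{escape(object_id)}</h2>")
        lines.append("<ul>")
        for name in sorted(groups[object_id]):
            safe = escape(name, quote=True)  # safe in text and in attributes
            lines.append(f'<li><a href="{safe}">{safe}</a></li>')
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)

page = build_index({"what_001": ["what_001.txt", "what_001a.jpg"]})
print(page)
```

The output is plain HTML with relative links, so it can be read in any browser and still makes sense as text if no browser is available.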
Generated by: CART
(Tue Mar 08 12:38:43 2005)