

A Database for the Long Term

Author: Steve Nickerson



How do you record an element of mankind's patrimony, perhaps millennia old, for an audience both current and in the distant future, using tools unthought of a decade ago and sure to be obsolete a decade hence?

Such material needs to be accessible not only to as wide an audience as possible but also for a very long time, which is a much more difficult problem. Using electronic media, we probably have very little hope of our work being accessible to an audience even 100 years in the future, a far cry from the 14 centuries we are studying.

Still, we are not going to start carving anything in stone (life expectancy: millennia), and probably not much will be put on vellum (life expectancy: centuries) or even computer paper (life expectancy: decades). However, we do what we can, and that can be quite a bit better than a useful life that ends with the next release (or demise) of a database program or operating system.

This article will describe our attempts to build a database for the apse mosaics of the Basilica of Eufrasius in Porec, Croatia, suitable both for our ongoing research and as an archival reference, with the additional bonus that it provides easy access via the World Wide Web.


When people speak of a "database", they usually have in mind a computer program of that type, whereas the dictionary definition is as follows:

database
1. a comprehensive collection of related data organized for convenient access, generally in a computer.
2. See data bank.

data bank
1. a fund of information on a particular subject or group of related subjects, usually stored in and used via a computer system.

That is, it is the data, not the software that presents it, that is the "data base", and it is distinguished by being organized for convenient access.

When data is organized by a database program, access can be very convenient indeed, but generally only for those fluent in the program's structure and commands. Those less comfortable with the system often find it frustrating and either give up or learn to perform only a few specific queries. Of course, the material is completely inaccessible to those without the program (including those with a different version of the program, a different computer operating system, etc.).

While the advantages of a database program are generally well worth the learning curve for anyone who will be studying the material in detail, such a format is inappropriate as the final resting place for data concerning any resource of historical import.

The secret of universality is simplicity.

Suppose that future researchers find our database in an archive somewhere. If it is in a single file (or set of files) from a database program, there is very little hope of recreating the data without the program and a computer that can run it.

Suppose, on the other hand, that they find our source material: the image files, text files, CAD files, even simple database files. There will be a much better chance that they will be able to decipher the contents. There may be thousands of these files, but each is a relatively simple conversion problem for whatever systems they may be using at the time.

It would not hurt at all if there were a README file describing the material and its organization. Even the printed material would help somewhat, provided the paper still held together and the toner had not separated, though it will never be as complete as the source material. The point here is that you need to keep everything together.
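The file inventory part of such a README could be produced mechanically. The minimal sketch below does this in Python, which is used only for illustration; the original work used DOS commands and word processor macros. It lists every file under an archive folder as plain ASCII text, grouped by folder. The function names and the README.TXT layout are hypothetical. The prose description of the material and its organization still has to be written by hand.

```python
import os
from collections import defaultdict


def collect_paths(root):
    """Return every file under root as a sorted list of relative paths."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def inventory_lines(paths):
    """Group relative paths by folder and format them as README lines."""
    groups = defaultdict(list)
    for path in paths:
        folder, name = os.path.split(path)
        groups[folder or "."].append(name)  # "." marks the top level
    lines = []
    for folder in sorted(groups):
        names = sorted(groups[folder])
        lines.append(f"{folder}: {len(names)} file(s)")
        lines.extend("    " + name for name in names)
    return lines


def write_readme(root, filename="README.TXT"):
    """Write the inventory as plain ASCII text next to the files it lists."""
    lines = inventory_lines(p for p in collect_paths(root) if p != filename)
    with open(os.path.join(root, filename), "w", encoding="ascii", errors="replace") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines
```

Plain ASCII is chosen deliberately, in keeping with the article's argument that the simplest formats are the most likely to survive.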

If one further supposes that the names of all these files conformed to some logical structure, it begins to seem possible that some use might be made of the archive.

The creation of our database for the Eufrasian Mosaics has been structured with these considerations in mind and, as a result, has become almost entirely a question of File Naming.

It started with an exercise by the director of the campaign to give a name to each element of the mosaics.
[Figure: names\names-2.gif]

As well as defining the primary organization of our data structure, this effort, which took only a few hours of relatively abundant off-site time, dramatically sped up the data collection and archiving process (which used precious on-site time) in two ways. First, because most file names were defined in advance, there was none of the dithering about what to call this or that new piece of information. Second, with a simple sorted directory listing (dir *.* /on/b) we could quickly see what we had and what was missing, so there was little duplication of effort and little material was missed.
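The "what we had and what was missing" check can be sketched as a comparison between the pre-defined names and a directory listing. In this minimal Python sketch, `compare_listing` is a hypothetical helper. Names are compared case-insensitively, as DOS names are. The file names are illustrative, patterned on those shown later in this article; they are not the project's actual list.

```python
def compare_listing(expected, present):
    """Compare pre-defined file names with an actual directory listing.

    Names are compared case-insensitively, as DOS file names are.
    Returns (missing, unexpected) as sorted lists of lower-case names.
    """
    want = {name.lower() for name in expected}
    have = {name.lower() for name in present}
    return sorted(want - have), sorted(have - want)


# Illustrative names only.
expected = ["ba-01a.jpg", "ba-01b.jpg", "ba-01c.jpg", "ba-01d.jpg"]
present = ["BA-01A.JPG", "ba-01c.jpg", "ba-01d.jpg", "ba-01e.jpg"]
missing, unexpected = compare_listing(expected, present)
print("missing:", missing)        # missing: ['ba-01b.jpg']
print("unexpected:", unexpected)  # unexpected: ['ba-01e.jpg']
```

The "unexpected" list matters as much as the "missing" one, because it catches files saved under a misspelled name.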

We carried this a bit further, into quality control, by building some simple HTML pages to display our images, again using a DOS directory listing command (dir *.jpg /on/b > jpg.htm), which gave us a sorted listing of our images in a file, JPG.HTM, a fragment of which is shown here:

[Figure: names\names-9.gif]
This file was then modified with word processor macros into the form that follows (I had to change all the "<"s to "["s and the ">"s to "]"s to allow it to be seen in HTML):
[IMG SRC="file:///c|\euf\b\b-.jpg"][BR]c:\euf\b\b-.jpg[BR]
[IMG SRC="file:///c|\euf\b\b-n.jpg"][BR]c:\euf\b\b-n.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-.jpg"][BR]c:\euf\b\ba-.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-01a.jpg"][BR]c:\euf\b\ba-01a.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-01b.jpg"][BR]c:\euf\b\ba-01b.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-01c.jpg"][BR]c:\euf\b\ba-01c.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-01d.jpg"][BR]c:\euf\b\ba-01d.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-02a.jpg"][BR]c:\euf\b\ba-02a.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-02b.jpg"][BR]c:\euf\b\ba-02b.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-02c.jpg"][BR]c:\euf\b\ba-02c.jpg[BR]
[IMG SRC="file:///c|\euf\b\ba-02d.jpg"][BR]c:\euf\b\ba-02d.jpg[BR]
Once we had a line like this for every file we wanted, all it took was acquiring the image and saving it to the correct directory with the correct name to give us an HTML page referencing every image. We could call this page up to quickly see what we had and whether it was of adequate quality.
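The same quality-control page could also be built directly from a list of image paths, without `dir` and word processor macros. A minimal Python sketch follows. The helper names are hypothetical. The sketch writes `file:///c:/...` URLs, the form current browsers generally accept, rather than the older `file:///c|\...` form shown above.

```python
import html
from urllib.parse import quote


def file_uri(path):
    r"""Turn a Windows path such as c:\euf\b\b-.jpg into a file:/// URL."""
    return "file:///" + quote(path.replace("\\", "/"), safe="/:")


def qc_page(paths, title="Image check"):
    """Build one HTML page showing every image above its file name.

    Paths are sorted case-insensitively, like 'dir /on'. A missing file
    shows up as the browser's broken-image marker.
    """
    rows = [
        f'<IMG SRC="{file_uri(p)}"><BR>{html.escape(p)}<BR>'
        for p in sorted(paths, key=str.lower)
    ]
    return (
        f"<HTML><HEAD><TITLE>{html.escape(title)}</TITLE></HEAD>\n<BODY>\n"
        + "\n".join(rows)
        + "\n</BODY></HTML>\n"
    )
```

Writing the returned text to a file such as JPG.HTM and opening it in a browser gives the same at-a-glance check described above.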

As the number of images grew, we chopped these files up into groups of a more manageable size and made an index to allow us to move from one to another. It is from these simple beginnings that the CARTHTML publisher grew into the tool that created the pages you are now reading.
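Splitting the rows into pages of manageable size with an index could be sketched as follows. This assumes the one-line-per-image rows are already in a Python list. The page size of 50 and the naming scheme are assumptions for illustration and are not taken from the CARTHTML publisher. In this scheme, jpg.htm is the index and jpg-1.htm, jpg-2.htm, ... are the pages.

```python
def paginate(rows, per_page=50, stem="jpg"):
    """Split HTML rows into numbered pages plus an index page.

    Returns a dict of file name -> HTML text: stem.htm is the index and
    stem-1.htm, stem-2.htm, ... hold at most per_page rows each, with a
    link back to the index at the top.
    """
    index_name = f"{stem}.htm"
    pages, links = {}, []
    for start in range(0, len(rows), per_page):
        number = start // per_page + 1
        name = f"{stem}-{number}.htm"
        body = [f'<A HREF="{index_name}">Index</A><BR>'] + rows[start:start + per_page]
        pages[name] = "<HTML><BODY>\n" + "\n".join(body) + "\n</BODY></HTML>\n"
        links.append(f'<A HREF="{name}">Page {number}</A><BR>')
    pages[index_name] = "<HTML><BODY>\n" + "\n".join(links) + "\n</BODY></HTML>\n"
    return pages
```

For example, 120 rows at 50 per page yield three pages (50, 50 and 20 rows) plus the index.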

A year passes, during which I incorporate this concept into the software that I use to create CAD models of the structures I record. Then I am asked to provide a sample of my recording work for a CD-ROM to accompany the proceedings of a Computer Applications in Archaeology Symposium, to which I had presented something totally different. The Basilica was requested because they thought it would make a better showpiece than the archaeology.

For this exercise I simply took the HTML pages that had been automatically created by the recording software, annotated them, and added some help pages and a description of the methodology. The results can be seen at:

This publication (a new concept, that this CD might be a "publication") generated quite a bit of interest from people who would never have a use for the whole recording package, so I set about creating a stand-alone module containing only those functions that created these HTML pages.

Four months of intensive programming produced a package that I thought was starting to be quite useful. By establishing absolute standards for the file names, it did require some changes to both the logic and the practice of what I had been doing heretofore, but these changes posed no problem when recording a new project. I seldom revisited the old projects, as their pages still worked fine, but herein lies the dilemma posed by electronic archives:
When the "publication" (or the medium) needs revision, will the data still be accessible for the upgrade?


Another year passes, and I am asked to show something "flashy" at a symposium on Byzantine Mosaics: a good opportunity to test the new software against an old data set.

So I set about converting this earlier database, which had been created using logic very similar (though not identical) to the standards demanded by the new software. I have documented the process because I also needed to prepare a lecture on low-level data manipulation.

The first step was to take a fresh look at what I had to start with. This included more than 800 images in more than 50 subdirectories, as well as the recording material: a geometric database, CAD files and a small amount of textual information in a word-processing format.

To simplify this exercise, I added a new feature to the publisher: when there are no files matching the cover-page criteria, it creates a page of links to the individual files. These links only work on the originating computer, but the list looked like this (for comparison, the final list looks like this).

The next step (after making a full backup of the original) was to get rid of a bunch of the files created by the older system, specifically the old thumbnail images, which the old system distinguished by a dash as the third character.

This was done by gathering a directory list with the command:
dir \euf\??-.* /a-d/on/b/s > 1-delete.bat
which creates a directory list like:
P:\euf-1\b\b--.jpg
P:\euf-1\b\ba-.jpg
P:\euf-1\b\be-.jpg
which can be quickly changed with a global search and replace ("P:" into "del P:") into a batch file that looks like:
del P:\euf-1\b\b--.jpg
del P:\euf-1\b\ba-.jpg
del P:\euf-1\b\be-.jpg
One does have to look over the items and perhaps rescue a few files that were misnamed, but the whole exercise takes only a few minutes.
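The listing-plus-search-and-replace step can be sketched in Python as a function that takes full paths and a wildcard pattern and returns the `del` lines. This is a hypothetical helper, not part of the original toolkit. Python's `fnmatch` treats `?` as exactly one character, which is close to, but not identical with, the wildcard rules of the DOS and Windows command line. The sketch also quotes each path, which the original batch file did not need with 8.3 names.

```python
import fnmatch
import ntpath


def delete_commands(paths, pattern):
    r"""Return 'del' lines for files whose name (not folder) matches pattern.

    Roughly mimics 'dir \euf\??-.* /a-d/on/b/s' followed by the search and
    replace of "P:" with "del P:". Matching is case-insensitive; fnmatch's
    '?' is exactly one character and '*' is any run of characters.
    """
    wanted = pattern.lower()
    commands = []
    for path in sorted(paths, key=str.lower):
        name = ntpath.basename(path).lower()
        if fnmatch.fnmatchcase(name, wanted):
            commands.append(f'del "{path}"')
    return commands
```

Called with the pattern `*-.*` instead, the same function roughly corresponds to the second batch file described next. In either case, the output still deserves the manual look-over mentioned above before it is run.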

Then I created another batch file, dealing with every file whose name ended in a "-", with the command:
dir \euf\*-.* /a-d/on/b/s > 2-delete.bat
and another global search and replace. This file did pretty much the same thing, getting rid of redundant files that were created by or for the old system, which would either not be needed by the new one or be recreated by it.

Then one more, which collected everything in the root directory:
dir /on/b > 3-delren.bat
which allowed me to delete or rename the rest of the files created by the old system that would confuse the new one.

So far I had been working from the old HTML pages, as they were the cleanest source of the several hundred image files that make up the bulk of the data set. The new system will consider a bunch of file types that the old one would not, so I could now add these to the new set and would no longer have to convert them manually before running the publisher. The command:
dir \euf\*.* /on/b > 4-add.bat
run in the old source directory filled in these gaps (with a little manual editing).
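The gap-filling step amounts to comparing two listings and copying whatever the new set lacks. Below is a minimal Python sketch of that comparison. It uses a hypothetical helper, `copy_commands`, and hypothetical folder and file names. It produces the `copy` lines for review before they are run.

```python
import ntpath


def copy_commands(source_names, target_names, source_dir, target_dir):
    """Return 'copy' lines for files in the source listing but not the target.

    Names are compared case-insensitively, as DOS names are. The output is
    meant to be reviewed and edited by hand before it is run.
    """
    have = {name.lower() for name in target_names}
    commands = []
    for name in sorted(source_names, key=str.lower):
        if name.lower() not in have:
            src = ntpath.join(source_dir, name)
            dst = ntpath.join(target_dir, name)
            commands.append(f'copy "{src}" "{dst}"')
    return commands


# Hypothetical folders and names: the .dxf and .txt files were skipped by
# the old system, so they are missing from the new set.
for line in copy_commands(["ba-01a.jpg", "ba.dxf", "ba.txt"], ["BA-01A.JPG"],
                          r"C:\old\euf", r"P:\euf"):
    print(line)
```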

The next set of batch files had to do with some changes I wanted to make to the structure of the pages. The first
dir euf*.* /on/b > 5-rename.bat
was used mostly to separate the descriptions of the CAD measuring system from the configuration files and source drawings of the measurement system itself. I could have achieved the same results, as far as the HTML pages were concerned, by skipping the previous step, but then I would have had to put them back if I were ever asked to go back and finish the recording job.
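The renaming batch files in this section appear to follow one recipe: list the names, then turn each line into a `ren` command that swaps an old part of the name for a new one. A minimal Python sketch of that recipe for prefix changes follows. The helper and the prefixes in the example are hypothetical, since the article does not give the actual renaming rules. `ren` accepts a full path for the old name but only a bare file name for the new one, so only the name is rewritten.

```python
import ntpath


def rename_commands(paths, prefix_map):
    """Return 'ren' lines that replace an old name prefix with a new one.

    prefix_map maps old prefixes to new ones (compared case-insensitively);
    the longest matching prefix wins. Names with no matching prefix, or
    whose new name would be unchanged, produce no command.
    """
    olds = sorted(prefix_map, key=len, reverse=True)  # try longest first
    commands = []
    for path in sorted(paths, key=str.lower):
        name = ntpath.basename(path)
        for old in olds:
            if name.lower().startswith(old.lower()):
                new_name = prefix_map[old] + name[len(old):]
                if new_name != name:
                    # REN takes a full source path but only a bare new name.
                    commands.append(f'ren "{path}" "{new_name}"')
                break
    return commands


# Hypothetical prefixes: split configuration files from descriptions.
for line in rename_commands(
    [r"P:\euf\eufcfg1.txt", r"P:\euf\eufdesc.txt", r"P:\euf\ba-01a.jpg"],
    {"euf": "cad", "eufcfg": "cfg"},
):
    print(line)
```

Trying the longest prefix first keeps a general rule (here "euf") from swallowing a more specific one ("eufcfg").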

Then I needed to create some new separations to deal with naming inconsistencies that had started to surface. What had happened was that long, skinny stuff, like the jewelled bands and the inscription, had been subdivided lengthwise instead of in the grid pattern used for the more rectangular elements, and their file names needed to be altered to allow them to be grouped in the same way as the others. This was accomplished with:
dir e\*.* /on/b > 6-fix-e.bat
dir t\*.* /on/b > 6-fix-t.bat

However, once that was done, another problem came to light, one that had previously been solved for the files above.

I was running out of group codes.

The software creates its HTML pages by gathering files by name, based on a code supplied by the user, and we had defined a code starting with every letter of the alphabet for our basic elements. For instance, the files I was preparing for this paper needed names. I was using name-1.txt, name-2.gif, etc., which would conflict with one of the Intrados, but whatever I called them, they would conflict with one of the element names. My adjustment of the files above had added those groups to the problem by eliminating the delimiter they had previously had as a final character. What I needed was a delimiter at the end of every group.

So I created:
from a directory listing of the entire data set and a somewhat more complicated word processor macro that created two parallel lists of names, to which I then manually added the "-" at the appropriate location for each file. This was a lot of manual editing, taking perhaps 15 minutes, but you can learn from my mistake and think about delimiters from the start.
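The delimiter problem is easy to reproduce. In the minimal Python sketch below, `gather` collects a group by prefix, in the way the publisher gathers files by a user-supplied code. `add_delimiter` inserts a "-" after the longest known group code. The code "na" stands in for one of the Intrados codes, which the article does not list, so the codes and file names are hypothetical.

```python
def gather(names, code, delimiter=""):
    """Return the names belonging to group `code`, matched by prefix."""
    prefix = (code + delimiter).lower()
    return sorted(n for n in names if n.lower().startswith(prefix))


def add_delimiter(name, codes, delimiter="-"):
    """Insert a delimiter after the longest known group code in name.

    Names that already carry the delimiter, or match no code, are
    returned unchanged.
    """
    lowered = name.lower()
    for code in sorted(codes, key=len, reverse=True):
        if lowered.startswith(code.lower()):
            rest = name[len(code):]
            if rest.startswith(delimiter):
                return name
            return name[:len(code)] + delimiter + rest
    return name


names = ["na01.jpg", "na02.jpg", "name-1.txt"]
print(gather(names, "na"))        # ['na01.jpg', 'na02.jpg', 'name-1.txt']
fixed = [add_delimiter(n, ["na", "name"]) for n in names]
print(gather(fixed, "na", "-"))   # ['na-01.jpg', 'na-02.jpg']
```

Without the delimiter, the paper's own files are swept into the Intrados group. Note that `add_delimiter` is only as good as its code list. If "name" were missing from the list, name-1.txt would be mangled into na-me-1.txt. That is one reason a list of codes alone may not replace a manual check.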

And then I had to do something similar to the long skinny stuff I had "fixed" earlier, with:
dir t /on/b > 8-rename.bat
There were a couple more tweaks of the file names: separations that had to be made to allow the recording system to inhabit the same directory structure without conflict, and groupings whose names I simply no longer liked. They were created with commands like:
dir euf*.* /on/b/s > 9-rename.bat
and there have been a few changes since, things like:
which gathered some images in support of the page on the nails.

The whole exercise took perhaps half a day. It would have been better to get it right the first time, which would now be easy by using this software from the outset, but whatever system you settle on, the requirements will be different later, just as they were for me using data only two years old. Still, it was a relatively painless exercise compared to what it might have been, but only because the earlier database had some sort of organization at the level of the file name.

[Figure: names\names-9.gif] A screen capture of a fragment of the WordPerfect document that is saved as an HTML file capable of displaying images for quality control. Missing images show a broken-image icon and poor images look poor.



Files Referenced for name
  • p:\euf\names\NAME-0.ABS
  • p:\euf\names\NAME-1.TXT
  • p:\euf\names\NAME-2.TXT
  • p:\euf\names\NAME-3.TXT
  • p:\euf\names\NAMES-2.GIF
  • p:\euf\names\NAMES-3.GIF
  • p:\euf\names\names-4.gif
  • p:\euf\names\NAMES-5.GIF
  • p:\euf\names\Names-9.gif
  • p:\euf\names\NAMES-9.CAP
  • p:\euf\names\name-9.bib

    Generated Sat Jul 10 18:49:25 1999 by: CART Computer Aided Recording Tools