What's in a Creation Date?

There is a certain perception that often accompanies digital objects and, more broadly, computer systems as a whole. It shows up, for example, when heavily compressed MP3 files are described as "perfect" quality audio, or in similar myths about the infallibility of all things digital. These perceptions rest on incomplete or inaccurate assumptions about how software, operating systems, and file systems work. My favorite way of putting it is that computers are only as smart as the people who designed them: if to err is human, then the same goes for our electronic creations.

When we make the transition from paper to digital records, these assumptions tend to surface in unexpected places. While working on the Joyce collection, we ran headlong into one of them, made a note of it, and moved on. But I promised that I would take a closer look at the issue later... so here I go.

Anyone with a modicum of computer literacy is familiar with managing digital files by some means, be it the command line or a GUI, and knows that the computer's file system(s) record not only filenames but also various dates, such as the creation, modification, and access dates. At first blush, this seems like a godsend for archivists struggling to attach concrete attributes to virtual objects. Surely these dates mean what they say (the creation date is the date the file was created, and so on), and surely these attributes follow the digital object wherever it goes? Unfortunately, a little investigation casts some doubt on the subject.

I devised a simple set of experiments to confirm or refute the assumption that filesystem date metadata behaves the same everywhere and means what we assume it means. I selected the three major operating systems in use today, Windows 2000/NT, Macintosh OS X, and Linux, and ran a variation of the following sequence on each:

Create an arbitrary text file in an arbitrary location on a local hard drive (volume)
Modify the text of the file and save it to the same location
Move the file from one directory to another on the same local volume
Make a copy of the file in another directory on the same volume
Copy the file to a separate volume

After each step, I gathered date information from the filesystem (e.g., creation and modification dates) and generated an MD5 hash to check whether the contents of the file had stayed the same or changed. The details of the experiment for each operating system follow.
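The procedure above can be approximated in a short script. This is a minimal sketch, not the tooling used in the experiment (which relied on each system's own dialogs and hash utilities). It assumes Python's `os.stat` and `hashlib`; `snapshot` and `run_experiment` are hypothetical names; and step 5 is left out because it needs a second mounted volume. Keep in mind that `st_ctime` means status-change time on Unix but creation time on Windows, that `st_birthtime` exists only on some platforms, and that `shutil.copy` behaves like a plain Unix `cp` rather than like a Windows or Finder copy, so its copy row will not reproduce every table below.

```python
import hashlib
import os
import shutil
import tempfile


def snapshot(path):
    """Record the date metadata and MD5 hash of one file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return {
        "mtime": st.st_mtime,
        # Status-change time on Unix; creation time on Windows
        # (that meaning is deprecated on Windows since Python 3.12).
        "ctime": st.st_ctime,
        # A true creation ("birth") time is exposed only on some platforms.
        "birthtime": getattr(st, "st_birthtime", None),
        "md5": digest,
    }


def run_experiment(root):
    """Repeat steps 1-4 of the experiment inside one directory tree."""
    a, b, c = (os.path.join(root, d) for d in ("a", "b", "c"))
    for d in (a, b, c):
        os.makedirs(d, exist_ok=True)
    results = []

    path = os.path.join(a, "test.txt")
    with open(path, "w"):                      # 1. create an (empty) file
        pass
    results.append(("Created", snapshot(path)))

    with open(path, "w") as f:                 # 2. modify and save in place
        f.write("some new text\n")
    results.append(("Modified and saved", snapshot(path)))

    moved = os.path.join(b, "test.txt")        # 3. move within the same volume
    shutil.move(path, moved)
    results.append(("Moved", snapshot(moved)))

    copied = os.path.join(c, "test.txt")       # 4. copy within the same volume
    shutil.copy(moved, copied)
    results.append(("Copied", snapshot(copied)))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for action, snap in run_experiment(tmp):
            print(action, snap)
```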

Windows 2000/NT

| Action | Creation Date | Modified Date | MD5 Hash |
| --- | --- | --- | --- |
| Created | 07/27/2006 19:48:43 | 07/27/2006 19:48:43 | d41d8cd98f00b204e9800998ecf8427e |
| Modified and Saved | 07/27/2006 19:48:43 | 07/27/2006 19:50:05 | 0aa9bd7d122205a12e939f14d6946c14 |
| Moved from one directory to another | 07/27/2006 19:48:43 | 07/27/2006 19:50:05 | 0aa9bd7d122205a12e939f14d6946c14 |
| Copied to another directory on same NTFS volume | 07/27/2006 19:52:14 | 07/27/2006 19:50:05 | 0aa9bd7d122205a12e939f14d6946c14 |
| Copied to another NTFS volume | 07/27/2006 19:53:25 | 07/27/2006 19:50:05 | 0aa9bd7d122205a12e939f14d6946c14 |
Table 1: Windows 2000 (NTFS)

Table 1 above shows the experiment's results for a Windows computer using the NTFS file system. File dates were collected from the Windows file properties dialog, and MD5 hashes were generated with a freeware program called HashCalc. The first two steps went as predicted, with the modified file correctly showing a new modification date and MD5 hash. The third step shows that Windows considers a file moved within the same volume to be the same file before and after the move, which also makes sense.
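Any MD5 tool should agree with HashCalc on the same bytes. The value in the "Created" row, d41d8cd98f00b204e9800998ecf8427e, is the MD5 digest of zero bytes, so the newly created file was evidently empty. The sketch below (a hypothetical helper, not HashCalc's implementation) hashes a file in chunks so that large files need not fit in memory.

```python
import hashlib
import sys


def md5_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() keeps calling f.read until it returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Digest of empty input: matches the "Created" row of Table 1.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(md5_file(p), p)
```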

Upon making a new copy, however, common sense starts to break down. The modification date stays the same as before the copy, showing, as the MD5 hash confirms, that no changes have been made, but the creation date changes to the time of the copy operation. This simultaneously makes sense and is confusing: we now have a new copy of the file with its own creation date, yet its modification date precedes its creation date, which flies in the face of common sense. How can a file have been modified before it was created? And it does not end there. Copying the file to another volume changes the creation date yet again, reproducing the same creation/modification paradox.
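The paradox is easy to detect mechanically, which may be useful when screening large numbers of files. This sketch runs over the dates transcribed from Table 1 and flags every row whose modification date is earlier than its creation date; `modified_before_created` is a hypothetical name, and the dates are treated as naive local times exactly as the properties dialog displayed them.

```python
from datetime import datetime

# Creation and modification dates transcribed from Table 1 (Windows 2000, NTFS).
TABLE_1 = [
    ("Created", "07/27/2006 19:48:43", "07/27/2006 19:48:43"),
    ("Modified and Saved", "07/27/2006 19:48:43", "07/27/2006 19:50:05"),
    ("Moved from one directory to another", "07/27/2006 19:48:43", "07/27/2006 19:50:05"),
    ("Copied to another directory on same NTFS volume", "07/27/2006 19:52:14", "07/27/2006 19:50:05"),
    ("Copied to another NTFS volume", "07/27/2006 19:53:25", "07/27/2006 19:50:05"),
]

FMT = "%m/%d/%Y %H:%M:%S"


def modified_before_created(rows):
    """Return the actions whose modification date precedes the creation date."""
    flagged = []
    for action, created, modified in rows:
        if datetime.strptime(modified, FMT) < datetime.strptime(created, FMT):
            flagged.append(action)
    return flagged


print(modified_before_created(TABLE_1))
```

Only the two copy operations are flagged; the move leaves both dates intact.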

Macintosh OS X

| Action | Creation Date | Modified Date | MD5 Hash |
| --- | --- | --- | --- |
| Created | 07/27/2006 20:04:00 | 07/27/2006 20:04:00 | a53165315d1e86c5739d34e1243f5f4d |
| Modified and Saved | 07/27/2006 20:04:00 | 07/27/2006 20:07:00 | cb697c6c073f85c43e2dfb100f5b725e |
| Moved from one directory to another | 07/27/2006 20:04:00 | 07/27/2006 20:07:00 | cb697c6c073f85c43e2dfb100f5b725e |
| Copied to another directory on same HFS volume | 07/27/2006 20:04:00 | 07/27/2006 20:07:00 | cb697c6c073f85c43e2dfb100f5b725e |
| Copied to another HFS volume | 07/27/2006 20:04:00 | 07/27/2006 20:07:00 | cb697c6c073f85c43e2dfb100f5b725e |
| Copied to FAT (MS-DOS) volume (OS X view) | 12/31/1903 16:00:00 | 07/27/2006 20:18:00 | cb697c6c073f85c43e2dfb100f5b725e |
| Copied to FAT (MS-DOS) volume (Windows view) | 07/27/2006 20:18:59 | 07/27/2006 20:18:58 | cb697c6c073f85c43e2dfb100f5b725e |
| Copied back to HFS volume from FAT volume | 12/31/1903 16:00:00 | 07/27/2006 20:18:00 | cb697c6c073f85c43e2dfb100f5b725e |
Table 2: MacOS X (HFS)

Table 2 above shows the experiment's results for a Macintosh OS X computer using the HFS+ file system. File dates were gathered with the Finder's Get Info command, and MD5 hashes were computed with the built-in command-line program "md5" (note that the Get Info dialog does not show seconds). Here we see behavior that conforms to our assumptions: the creation date follows the file through all its movements on the machine, and the modification date changes only when an actual modification is made.

With a little extra knowledge of how Macintosh file systems work, however, one learns that a Macintosh file can actually consist of two parts, or forks: a data fork, which holds the content of the file, and a resource fork, which holds supplementary structured data. Dates and other attributes are kept separately in the volume's catalog. Many file systems support neither this two-fork arrangement nor the full set of HFS+ attributes, which can cause problems when Macintosh files are exchanged with other operating systems or sent over a network. To explore this, I added a few steps that transferred the file to a non-HFS+ volume (a FAT-formatted USB flash drive) and viewed the transferred file in both Macintosh and Windows environments. As the table shows, both dates were significantly affected. The transfer was treated as a modification, changing the modification date, and the creation date became skewed. OS X could not recognize the creation date and displayed the Macintosh epoch instead, while Windows reported the time of the copy operation as the creation date (though, for some odd reason, one second off from the modification date). When the file was copied from the flash drive back to the HFS+ volume, the dates stayed as the transfer to the flash drive had left them.
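Both oddities may have mundane explanations, though the experiment did not test them. FAT stores a file's last-write time in two-second units, while the creation time carries an extra field with finer resolution, which could account for a one-second gap between the two dates. And 12/31/1903 16:00:00 is exactly eight hours before midnight GMT on January 1, 1904, the zero point of HFS+ dates, so it is what a zero or missing timestamp looks like when displayed in a time zone eight hours behind GMT (an assumption about the machine's setting, which is not stated here). The sketch below shows both conversions; the function names are hypothetical, and truncating the odd second rather than rounding it up is an assumption.

```python
from datetime import datetime, timedelta, timezone

# HFS+ stores dates as unsigned seconds since midnight, January 1, 1904, GMT.
HFS_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def hfs_to_datetime(seconds, tz=timezone.utc):
    """Convert an HFS+ timestamp to a datetime shown in time zone tz."""
    return (HFS_EPOCH + timedelta(seconds=seconds)).astimezone(tz)


def dos_time(dt):
    """Pack a time into the 16-bit FAT/DOS format: 5 bits hours,
    6 bits minutes, 5 bits seconds / 2. Assumes odd seconds are truncated."""
    return (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)


def from_dos_time(value):
    """Unpack a 16-bit FAT/DOS time into (hours, minutes, seconds)."""
    return (value >> 11) & 0x1F, (value >> 5) & 0x3F, (value & 0x1F) * 2


# A zero HFS+ date, displayed eight hours behind GMT (assumed time zone).
print(hfs_to_datetime(0, timezone(timedelta(hours=-8))))
# A write at 20:18:59 survives a FAT round trip as 20:18:58.
print(from_dos_time(dos_time(datetime(2006, 7, 27, 20, 18, 59))))
```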

Linux

| Action | Change Date (ctime) | Modified Date | MD5 Hash |
| --- | --- | --- | --- |
| Created | 07/28/2006 00:12:28 | 07/28/2006 00:12:28 | d41d8cd98f00b204e9800998ecf8427e |
| Modified and Saved | 07/28/2006 00:13:43 | 07/28/2006 00:13:43 | 93ad68660a99d36a665a553672a8148d |
| Moved from one directory to another | 07/28/2006 00:14:38 | 07/28/2006 00:13:43 | 93ad68660a99d36a665a553672a8148d |
| Copied to another directory on same ext2 volume | 07/28/2006 00:15:51 | 07/28/2006 00:15:51 | 93ad68660a99d36a665a553672a8148d |
Table 3: Linux (ext2)

Table 3 above shows the results for a Red Hat Linux computer using the ext2 file system. File dates were gathered with the "stat" command and MD5 hashes computed with the "md5sum" command. One glaring difference from the previous two sets of results is immediately apparent: there is no creation date. In its place, I have shown the Changed date (the status change time, or ctime) as reported by stat. It was difficult to determine the reason for this omission, especially since some references incorrectly describe the Changed date as the creation date (e.g., Poirier, 2001), but I found an email discussion thread that helped clarify some of the reasons. In short, the creators of the ext2 filesystem, and of Linux in general, deemed the concept of a creation date too nebulous to model, so they left it out. The Windows experiment demonstrates some of the problems behind the concept of a digital creation date and lends some legitimacy to that decision, even if it seems a bit unsettling.
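The difference between the Changed date and a creation date can be seen directly. This sketch, assuming Python's `os.stat` on Linux (the function and constant names are hypothetical), backdates a file's modification time to a fixed moment in 2001, renames it as in the "moved" step, and reads the dates back: the modification time keeps its old value, while the ctime reflects the recent metadata changes rather than anything about the file's origin.

```python
import os
import tempfile

PAST = 1_000_000_000  # arbitrary past moment (September 2001), seconds since 1970


def stat_dates(path):
    """Return the timestamps that stat reports for a file."""
    st = os.stat(path)
    return {
        "atime": st.st_atime,   # last access (read)
        "mtime": st.st_mtime,   # last change to the file's content
        "ctime": st.st_ctime,   # on Unix: last change to the inode (content, name, mode...)
        # Not exposed by os.stat on Linux; present on macOS/BSD.
        "birthtime": getattr(st, "st_birthtime", None),
    }


def rename_demo(root):
    """Backdate a file's mtime, rename it within one volume, report its dates."""
    src = os.path.join(root, "old.txt")
    dst = os.path.join(root, "new.txt")
    with open(src, "w") as f:
        f.write("text\n")
    os.utime(src, (PAST, PAST))   # set atime and mtime into the past
    os.rename(src, dst)           # a "move" on the same volume
    return stat_dates(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(rename_demo(tmp))
```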

Continuing with the experiment anyway, we can see that the modification dates and hashes behave as they did under Windows when the file is modified and moved. Copying the file, however, altered the modification date, unlike on the other two operating systems. We also see that the Changed date is updated with every action, regardless of the effect on the content of the file: it records any change to the file's inode, including a rename. Seemingly content-neutral commands such as grep, which only read a file, update yet another field, the access date (atime), rather than the Changed date. Either way, the Changed date tracks recent activity around the file rather than its origin, and lends very little help to archival processing.
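The copy behavior depends on how the copy is made. A plain `cp` creates the destination as a new file, which receives a fresh modification date, while `cp -p` preserves the original's timestamps; the command used in the experiment is not recorded, so a plain `cp` is only a presumption. Python's `shutil.copy` and `shutil.copy2` draw the same distinction, as this sketch shows (`compare_copies` is a hypothetical name).

```python
import os
import shutil
import tempfile

PAST = 1_000_000_000  # arbitrary fixed mtime so the difference is obvious


def compare_copies(root):
    """Copy one file two ways and return each file's modification time."""
    src = os.path.join(root, "original.txt")
    with open(src, "w") as f:
        f.write("content\n")
    os.utime(src, (PAST, PAST))

    plain = os.path.join(root, "plain.txt")
    kept = os.path.join(root, "kept.txt")
    shutil.copy(src, plain)   # like plain `cp`: new file, new mtime
    shutil.copy2(src, kept)   # like `cp -p`: timestamps copied from src
    return {
        "original": os.stat(src).st_mtime,
        "plain": os.stat(plain).st_mtime,
        "kept": os.stat(kept).st_mtime,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(compare_copies(tmp))
```

For archival transfers, a timestamp-preserving copy keeps the modification date meaningful, although on Linux it still cannot carry a creation date that the filesystem never recorded.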

Analysis

Each of these experiments shows how the assumptions of the software makers dictate the behavior of what seem to be common-sense concepts, threatening the validity of the assumptions we make when using them. In the case of Windows, the assumption is that any copy operation creates a new file to be treated as a new object, but this leaves behind a paradoxical situation in which the modification date precedes creation. On the Macintosh, every copy of a file, so long as it is made on a compatible volume, can be traced back to the original object by its creation date; in essence, every copy of a Macintosh file is simply a new version, not a new object. Linux, on the other hand, avoids the Windows paradox by bringing the modification date forward with each new instance of the object.

On the surface, we may want to proclaim that filesystem metadata cannot be trusted and debate the merits of ignoring it completely. This is understandable, but perhaps a bit hasty. It might be better to consider filesystem metadata helpful to the extent that it has been properly maintained during the record's lifetime. Since creation and modification dates support authenticity, it seems fitting that our treatment of their apparent flaws should derive from similar concepts. In other words, the lessons of this experiment should guide not only the handling of digital objects in a repository setting, but also the assessment of the reliability of filesystem metadata as generated in the originating environment. If the recordkeeping systems that generated the digital objects, including any policies and documented procedures outside those systems, can be assessed, then the metadata accompanying the objects may be salvageable. Without such knowledge, though, it is wise to treat any and all filesystem metadata with prejudice.

Even with a thorough knowledge of the originating environment, can we trust dates and times as the filesystem reports them? A to-the-second time should certainly be taken with skepticism: time zone settings, differences between computers, and clock drift mean that exact times can reliably be compared only within the same system. Beyond that, even dates may prove fallible: unskilled users may neglect to set the system clock correctly or miss a daylight saving time change. Further, power outages or system failures can have detrimental effects on the system clock, and because file metadata is itself stored on disk (on Macintosh systems, partly in the volume catalog and partly in a file's resource fork), it may become corrupted just as ordinary data files can. All of this goes without mentioning date errors deliberately introduced by knowing users in order to deceive or conceal; forgery is always a risk.
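Some of these failures leave recognizable traces. A date sitting at a timestamp format's zero point, like the 1903 date in Table 2, usually means "no value" rather than a real event, and a date in the future points to a misset clock. The sketch below is a hypothetical screening heuristic, not an established tool: the list of epochs is illustrative, and the 14-hour tolerance is an assumption meant to cover a zero point displayed in any local time zone. A clean result proves nothing about a date's accuracy; it only removes the most obvious suspects.

```python
from datetime import datetime, timedelta, timezone

# Zero points of common timestamp formats (illustrative, not exhaustive).
EPOCHS = {
    "Windows FILETIME (1601)": datetime(1601, 1, 1, tzinfo=timezone.utc),
    "Mac HFS (1904)": datetime(1904, 1, 1, tzinfo=timezone.utc),
    "Unix (1970)": datetime(1970, 1, 1, tzinfo=timezone.utc),
    "DOS/FAT (1980)": datetime(1980, 1, 1, tzinfo=timezone.utc),
}

# Assumed tolerance: local display can shift an epoch by up to ~14 hours.
TZ_SLACK = timedelta(hours=14)


def date_warnings(when, now=None):
    """Return reasons to doubt a timezone-aware timestamp (hypothetical heuristic)."""
    now = now or datetime.now(timezone.utc)
    warnings = []
    for name, epoch in EPOCHS.items():
        if abs(when - epoch) <= TZ_SLACK:
            warnings.append("matches " + name + " epoch")
    if when > now:
        warnings.append("in the future")
    return warnings


# The OS X view of the FAT copy in Table 2, read as UTC-8 (an assumption).
pst = timezone(timedelta(hours=-8))
print(date_warnings(datetime(1903, 12, 31, 16, 0, tzinfo=pst)))
```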

These problems demonstrate that an archivist's skills in determining the authenticity and reliability of records do not fade away in a digital environment; rather, the means of performing these tasks change. Intuition about, and knowledge of, the assumptions underlying the technology is key, as is a thorough understanding of the origin of the records, the latter being a skill that archivists already possess. I hope this experiment helps to build the former.