Notes: Diaries, On-line Diaries, and the Future Loss to Archives by Catherine O'Sullivan

Published in American Archivist, Vol. 68, No. 1, Spring/Summer 2005, pp. 53-73.

One thing that has become clear in the short time since I was brought into the archival fold is that online (not merely electronic, mind you) information really puts old-guard archivists in a bind. Having been to one annual conference of the Society of American Archivists, I feel that this apprehension is palpable – almost like the tension amongst people who do not wish to discuss the rather large and smelly elephant in the room. And with good reason. Hypertextual resources (Web pages, databases, and the like) dissolve the boundaries between groups of information by removing the physical constraints that have for so long defined information access. For someone who has defined their career's work in terms of discrete information objects of a physical nature, this lack of physical structure is likely to induce a sort of intellectual nausea.

Not that all archivists are cranky old dead tree advocates, but in my opinion, the momentum towards addressing the issues inherent in the preservation of online information has yet to build up. The crux of the problem, and the solutions that are to be devised, lie in a common area -- technology. Having come from the Web design and applications development field, I still find these challenges formidable, but I am not intimidated by the technology itself. A greater understanding of the technology is needed in order to make progress in preserving the information products contained therein. (Much more is required than merely understanding the technology -- intellectual property is one of the big ones -- but I digress.)

It is toward this end that O'Sullivan seems to be working. Much has been made of the influence of blogs in media and other social and political venues. From the perspective of a long-time online information user, this hardly seems worthy of note; after all, BBS systems, USENET, and Web message boards have been around "forever." So what makes blogs so special?

O'Sullivan takes a literary approach to analyzing blogs, likening them to written diaries, which provides a clue as to why blogs are different. Her research into the styles and types of diaries over the centuries indicates a form of information that has changed from that of a highly formal, often religious imperative, to that of casual social observation of a very personal nature. Bringing blogs back into the picture, this is part of the hype that the ephemeral nature of email lists and BBSs was unable to capture – the highly connected, personal observations so prevalent in blogs and online journals. But that is not all. The main observation here is that blogs are the literary descendant of diaries, an assertion which is verified by the availability of non-technical tools for publishing them (like Drupal -- touché). Just as paper diaries only require knowledge of written language and simple technology (i.e., paper and pen or some analog thereof), blogs also require language knowledge (in some cases merely spoken, as is allowed by some experimental blog transcription services) and simple technology, which these days includes internet access and a computer with a browser – all freely available at your public library or on the cheap in numerous other ways. The diary metaphor is certain to provide much comfort to heretofore squeamish archivists -- it is a boundary, after all, that is still quite elusive, but can be grasped without too much knowledge of the underlying technology.

Having made the link between old media and new, O'Sullivan moves on to describing the magnitude of information wrapped up in blogs and some discussion of similarities and differences between the two. Using a brief explanation of the ephemeral nature of blogs, and thus the threat of a great loss of cultural information, she segues into the question of preservation. As can be predicted, the Internet Archive is brought up, and she quickly acknowledges the Wayback Machine's inherent (ironically) archival limitations. From this, she retreats to the comfortable territory of acquisition and appraisal, that is, leaving the problem up to individual, existing collections to decide which resources are worth preserving in order to keep a more complete record, including much of the context that surrounds, and lends authenticity to, the resource. Such tactics continue, however, to support the retention of only what is deemed at the time to be "important" -- which inevitably leads to preferential treatment for high-profile resources. I might as well state at this point my undying devotion to the idea of a balanced, complete cultural record, not merely of the rich and famous. Besides, as a wise woman once challenged me, since storage is inevitably going to be cheap as air, why not keep it all? Indeed, why not?

It is here that I diverge from the reductionist view, that is, narrowing our focus to one type of online information resource, and throw out this proposition: in order to gather a complete record of the global internet, we are going to have to throw open the gates of archival process and make it available to the masses. Do I expect archival theory to become familiar to the common person? Absolutely not. What I propose is that the systems we use, and the technology of archival storage, be made easier for the common person to grasp and, as with blogs, just as easy to use. These are no small tasks (the information retrieval and integrity aspects are immense in and of themselves), but they are what I have defined as a life's work for myself.

Perhaps this will be distasteful to the archival establishment, but this proposition might imply a break with formal diplomatics since we aren't talking about property records or a succession of command, but the reconstruction of a place and time with implicitly less dire consequences for imperfection serving considerably less skeptical needs. The lives of common people and organizations can contribute greatly to the understanding of the context in which greater political and social shifts occur. As our cultural record moves online, the imperative to preserve a wide swath of it increases as well. Technology alone will not provide all the answers, and O'Sullivan rightly brings together the new and the old in order to more fully engage the archival mindset in the problems at hand.




Many thanks for sharing your comments on my blog article. I would like to respond to three points that you make in your entry, each one relating to the capture of blogs. You write:

"....she retreats to the comfortable territory of acquisition and appraisal, that is, leaving the problem up to individual, existing collections to decide which resources are worth preserving in order to keep a more complete record...Such tactics continue, however, to support the retention of only what is deemed at the time to be “important” -- which inevitably leads to preferential treatment for high-profile resources."

Every archival repository has a collection development policy to which it adheres when building or maintaining its collection. It is basic archival practice. For instance, here at the ACLU, we only collect those records that are received or created by the ACLU during routine ACLU business. We do not collect the records of the NAACP or Human Rights Watch because they do not fit within our collection, not because we do not deem the work of those organizations important. If someone is building a Gay and Lesbian Archives, it would make sense that she attempt to capture the accounts of gay and lesbian bloggers, or blogs relating to gay and lesbian issues rather than the blogs of horticulturists or ornithologists, unless there is something about a particular horticulturist's or ornithologist's blog that contributes to a Gay and Lesbian collection.

You state: "I might as well state at this point my undying devotion to the idea of a balanced, complete cultural record, not merely of the rich and famous."

And thankfully we are seeing more and more archival collections that counterbalance the ever-abundant archival record of dead wealthy white men. For instance, there is now an archival collection relating to squatters living in New York City's Lower East Side. I hope there is a squatter out there keeping an online account of his experiences squatting in Lower East Side buildings; it would be a tremendous asset to that particular archival collection.

You then add: "Besides, as a wise woman once challenged me, since storage is inevitably going to be cheap as air, why not keep it all? Indeed, why not?"

It’s simple: unlike air, real estate, new technology, and workers are not free. Archives are not known for their sizable budgets. It is very expensive to house an archival collection, manage a reading room, and purchase the computer hardware and software needed to collect, maintain, and access digitally born records, not to mention hire the staff who will carry out the related work. It would be nice if it were free!


Greetings, Catherine

I apologize for the delay in publishing and responding to your comment. As it turns out, you have the distinction of being the first person to comment in my blog, and as such, have the distinction of having been subjected to my having to figure out how the comment settings work (or don't, in the case of notification).

And, as I'd hoped, the comment process has initiated a great deal of thought and reflection on my part over the last few days, which I hope to express as concisely yet clearly as possible.

Metaphorically speaking, I believe we are both scaling the same mountain, but differ in our approach. Having looked back over my post and your paper, I realize that my position with regards to preserving digital information was not fully articulated, and may yet remain so for a bit as I resolve some important questions – something that I will address shortly.

But first, let me express my tacit agreement with the approaches you mention. I do not for a second believe that traditional archival practice and collections maintenance are at stake. These approaches serve us well for preserving and providing valuable context to all types of media within the well-defined domains for which collections are established. It is in this context that the allocation of real estate, personnel, and other such resources makes perfect sense and where the economics of the "save everything" argument fails.

But I am concerned about how the massively interrelated set of information we call the Internet (and the blogs within) can be distilled on the basis of these limited resources and yet maintain an adequate representation of what exists (let me point out the obvious here in that I don't expect that every single blog or Web page is worth saving). Certainly this is achievable for the blogs of individuals or those dealing with narrowly focused topics, but the process of condensing, for example, the online component of national or international political discourse, or the wider perceptions of and reactions to events like 9/11/2001, using traditional appraisal by a limited number of organizations is bound to result in an incomplete record – any one organization will be overwhelmed by the task of assessing and reviewing the innumerable potential sources, and the efforts of many separate collections will inevitably omit points of view that are not represented within their established domains or have not yet been discovered. This leaves us with the hope that the original creators will somehow provide for the preservation of their own works. But unlike written diaries that can remain hidden for decades, I doubt that, in the future, our descendants will discover many neglected digital artifacts that could alter their understanding of the present.

Preserving digital records is a daunting task no matter what the approach, and I wholeheartedly agree with you that we should start now with the resources and methods we have. In fact, I just came across a blog post from the RLG describing a toolkit offered by the Internet Archive as a Web preservation tool that would seem to support your call to action. But there are some technologies that I feel may result in new paradigms for saving digital information. At the highest possible level of abstraction, I am talking about technologies that leverage the network itself – a resource with far more computing power and storage space than any number of separate organizations – as a preservation medium. This is not mutually exclusive of institutional repositories and can be viewed as a quality-of-service framework, with institutional repositories constituting a reliable backbone and the greater network providing a body of preserved information from which to draw. Though I have not yet achieved a full description of this framework, I will say that it is inspired by technologies such as FreeNet, CleverSafe, and LOCKSS, as well as the profusion of peer-to-peer applications. There are also larger research efforts such as the storage resource broker behind the PAT project and the nascent XRI identity infrastructure that support the idea.

The roadblocks I have run into are not so much about authenticity, reliability, context, or resources, but are centered on long-term preservation concerns. Institutions, despite having limited resources, have the structure and organization to handle migration and other preservation efforts. But, at this point I have not yet thought of a way that a decentralized infrastructure can ensure preservation of the information stored within. I do sincerely believe that we can save most of what is out there that is worth saving, but I concede that it is still a theory. So, it seems you have caught me unprepared – or, perhaps, I was shooting my mouth off too soon (I did say "life's work," after all).

Thank you for taking the time to comment – this is exactly the effect that I was hoping for when I established this blog. Your response forced me to analyze my ideas more closely, which has been quite helpful.