Neil Beagrie on Personal Digital Libraries and Collections

I finally got around to reading Neil Beagrie's D-Lib article, "Plenty of Room at the Bottom? Personal Digital Libraries and Collections" (June 2005), and I regret not having done so sooner (alas, I have a great deal left in my "to read" folder). This article touches on several major themes in my academic pursuits of the last few years, which I will briefly describe here.

What drew me to the archival field was my overarching concern about the potential loss of our external memory, in the sense of our information-bearing objects. Being firmly seated in the digital generation, my concern is mostly over digital materials. Having completed my information science degree, I find that, though I still worry about our institutions' digital preservation efforts, it is the enormous amount of personal digital information that people the world over possess that really worries me. Beagrie's article attacks this issue head-on, naming this body "personal digital collections" and enumerating not only the threat of loss, but the challenge these non-traditional collections pose to our "memory institutions."

Personal digital collections are subject to the same threats to persistence as the large institutional and academic projects: obsolete formats and media, access regimes such as passwords and DRM, and so on. Beagrie also enumerates missing data as a threat, with the parenthetical "email, webpages, etc." It seems that he means links to web pages, references to emails that have been deleted, and so on, but I wonder whether mere information mismanagement is also intended. A recent episode in my own personal digital information management should illustrate.

As part of my ongoing audio encoding project, I have been preserving some of my own audio works from the last decade. I have also been checking my music collection, including these personal works, against an online discography database, Discogs.com. Every release in the Discogs database represents a physical object (CD, LP, etc.) released by a specific entity (record label), and lists not only the track information, but catalog information, liner notes, and cover art. As you can probably guess, the music that I created and released was not widely known or distributed (I still have a day job), so naturally there were no previous entries in Discogs.com. In the process of updating the database with my defunct label's releases, I found to my horror that I had lost some of the original digital files containing artwork and layout for some of my releases! Granted, I have not always been preservation-minded, but I had always assumed that these files were migrated from computer to computer over the past decade. Certainly lapses of this sort pose a significant hazard to personal digital collections, and I'm sure that it qualifies as "missing data."

Interestingly enough, my Discogs example also touches on Beagrie's discussion of "information banks." Although Discogs does not store the actual information represented in its indexes (the music), it is easy to visualize how it could, were it not for the copyright regime so voraciously defended by the music industry. This worn argument aside, Discogs does implement a social networking component of the kind described in Beagrie's discussion of information sharing services such as blogs and sites like Flickr. By adding a social networking component, all of these sites, whether they publish unique user content or merely aggregate collected information (like Discogs), add a layer of informational value in the form of contributed information (e.g., blog comments) or linked information (e.g., relationships between artists in Discogs). But perhaps more importantly, the creation of these information banks, whatever their form, supports my assertion that digital preservation efforts must be aggregated at some level beyond a single (physical) entity's capabilities -- that only distributed efforts will ensure that digital assets are adequately preserved and accessed, let alone described and identified. This is as true for the National Archives as it is for Joe Q. Public's personal works.

As an aside, I could not help but notice that all of the talk about social networks and personal collections seemed to echo writings on digitally mediated identity by Danah Boyd. Beagrie's Venn diagram showing the definition of "public persona" invites comparison to Boyd's thesis work on faceted identity. I imagine that there is much to explore about the intersection of faceted identities or, for that matter, multiple personal public personas, and the consequences for the "Lifetime Personal Web-spaces" concept mentioned at the end of the article.

In closing, one quote in particular caught my attention, as it factors into my explorations of the "save everything" debate. Beagrie writes (crediting Michael Lesk): "The combination of cheap digital storage and very sophisticated retrieval tools is shifting the balance of costs: digitally it is becoming cheaper to collect and more expensive to select, and cheaper to search than to organize." In other words, the scarcity argument is shifting from "we don't have enough space" to "we don't have the time to organize what we have," but as Beagrie seems to say, that no longer matters so long as you do not expect traditional access mechanisms. Or, more succinctly (with a nod to Catherine Stollar for originally expressing it): "what we do... will change, but why we do it does not."