Metadata Quality Control

As I near the end of the first phase of my audio encoding project, I feel the need to share some of the metadata quality control observations that I have collected.

Although ripping my CDs to digital media has been time consuming, it has not been nearly as laborious as checking and correcting the metadata that was automatically gathered during the process. FreeDB has been very helpful as an automatic metadata gathering service, but as I reviewed the corpus of encoded audio, I found many disturbing errors: misspellings, typos, missing articles, missing fields, omission or misrepresentation of international characters, and, of course, the usual discrepancies in case handling, title formatting, and normalized forms.
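
Some of these checks lend themselves to automation. The following is only a minimal sketch, assuming each track's tags have already been read into a plain Python dictionary; the field names in `REQUIRED_FIELDS` and the function `check_track` are hypothetical, and the case test is a crude heuristic rather than a real title-formatting rule. It flags missing fields, stray whitespace, values that are not in Unicode NFC normalized form, and values written entirely in one case.

```python
import unicodedata

# Assumed field names; a real tag reader may use different keys.
REQUIRED_FIELDS = ("artist", "album", "title")


def check_track(tags):
    """Return a list of human-readable problems found in one track's tags."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = tags.get(field, "")
        if not value.strip():
            # Field is absent or contains only whitespace.
            problems.append(f"{field}: missing")
            continue
        if value != value.strip():
            problems.append(f"{field}: stray whitespace")
        if value != unicodedata.normalize("NFC", value):
            # For example, "o" followed by a combining diaeresis instead of "ö".
            problems.append(f"{field}: not in NFC normalized form")
        if value.isupper() or value.islower():
            # Heuristic only: all-caps or all-lowercase values deserve a look.
            problems.append(f"{field}: suspicious case")
    return problems
```

Running `check_track` over every encoded track and printing only the non-empty results would give a review list to work through by hand. Misspellings and wrong titles still need a human eye.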

What's in a Creation Date?

There is a certain perception that often accompanies digital objects and, more broadly, computer systems as a whole. This sort of perception manifests itself when, for example, we hear about how massively compressed digital MP3 files are considered to be "perfect" quality audio or in similar myths concerning the infallibility of all things digital. These perceptions are based on incomplete or inaccurate assumptions about how software, operating systems, or file systems function. My favorite way of stating this is that computers are only as smart as those who designed them – if to err is human, then the same goes for our electronic creations.

When making the transition from paper to digital records, these assumptions are likely to appear in unexpected places. While working on the Joyce collection, we ran headlong into one of these assumptions, made a note of it, then moved on. But I promised that I would look closer at the issue at a later time... so here I go.

Audio Encoding Project: On Genre Description

First, a status update on the project. At this point, I have lost track of exactly how many discs I have encoded. This is probably because the ripping environment has been working virtually flawlessly since I finished troubleshooting, but a rough estimate puts me at around 200-250 discs encoded. Now, to move on to an issue that has been in the back of my mind for a while: genre description.

Ruminations on Generating Project Metadata

Although I am still debugging the CD ripping problems I have been having, I have enough of a music corpus to begin thinking about second-stage metadata generation. Additionally, I already have a corpus of 1700+ digital photos that I can also begin thinking about describing.