Audio Encoding Project: On Genre Description

First, a status update on the project. At this point, I have lost track of exactly how many discs I have encoded, probably because the ripping environment has been working virtually flawlessly since I finished troubleshooting, but a rough estimate puts me at around 200-250 discs. Now, to move on to an issue that has been in the back of my mind for a while: genre description.

Specifically, I am finding myself increasingly annoyed by the lack of depth in genre description allowed in ID3-type metadata. To expound: most digital audio formats support some sort of embedded metadata, one of the most common being the ID3 tag block used by MP3s (FLAC files carry a comparable block of Vorbis comments). The ID3 specification allows for a single field to describe the genre of the object. Since the ID3 tag is embedded within each unitary object, this allows for record-level description. Unfortunately, I have been encoding a disc at a time, and the software I am using only allows descriptive metadata to be defined at the disc level, which is then copied into each file upon encoding. This is fine for fields such as album name, release year, or artist, but is quite frustrating for genre, where it tends to stereotype artists or releases as a whole. To make things worse, the centralized database of CD information, FreeDB, limits the genre field to a choice among only 11 categories -- and not a single one of them represents any type of electronic music. For a collection such as mine that is dominated by electronic and abstract styles, this limitation is unacceptable. Fortunately, I can override the FreeDB defaults for purposes of encoding.

Determining the most appropriate descriptive term for the genre of an object is a problem that is not at all new to descriptive cataloging. Any object can have different semantic uses and/or meanings depending upon the attitude and understanding of the describer and the user. Furthermore, using a pre-coordinate description precludes the notion that new understandings or uses for the content (hence, new genre descriptions) could become apparent at a later time. As a result, I have been selecting the most specific genre term that can still identify the music within a fairly broad tolerance, avoiding overly obscure or transient terms. For some works, the best term is obvious, but eclectic compositions, and compendiums whose songs or artists have very little in common, confound any attempt at detailed description without record-level control.

So, a combination of technological limitations and the theoretical limitations of description has conspired to limit the genre choices I may make while encoding. I can overcome the record-level constraint by going back through my encoded collection with an ID3 tagger, but I am still limited to a single term. This may suffice on some general level, but is highly unsatisfactory to me personally, and this is why…

Envision, if you will, a media player or other system that provides access to my corpus of encoded music. The system in question could provide access by artist, title or year with high recall. But imagine that I want to use this system to generate playlists on-the-fly based on various content semantics, the content being the music itself. The single-term genre field will yield high recall for many types of music, but recall suffers for types of music that cross genres or are equally applicable to several at a time. For example, much of my music could be termed “ambient,” implying slow to no beats or rhythm and a generally softer or quieter composition. Some ambient tracks, however, lend themselves well to more traditionally industrial genres, or downtempo/chill, or experimental – all of which are genres that can stand in their own right apart from ambient. If we were to visualize this graphically, imagine a Venn diagram with all of these genres overlapping with ambient (and in some cases, a bit with each other). A single term is unable to capture this depth; thus, the recall of automatically generated playlists is limited.
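To make the recall argument concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the track titles, the genre assignments, and the helper names (select_single, select_multi, recall) are hypothetical and do not correspond to any real tagger or player. It compares a playlist query against a single-term genre field with the same query against a set of genre terms.

```python
# Hypothetical records: "genre" mimics the single-term field,
# "genres" is an imagined multi-term alternative.
TRACKS = [
    {"title": "Track A", "genre": "ambient", "genres": {"ambient", "industrial"}},
    {"title": "Track B", "genre": "industrial", "genres": {"industrial"}},
    {"title": "Track C", "genre": "ambient", "genres": {"ambient", "downtempo"}},
    {"title": "Track D", "genre": "experimental", "genres": {"experimental", "ambient"}},
]


def select_single(tracks, wanted):
    """Return titles whose one genre term equals the wanted term."""
    return [t["title"] for t in tracks if t["genre"] == wanted]


def select_multi(tracks, wanted):
    """Return titles whose set of genre terms contains the wanted term."""
    return [t["title"] for t in tracks if wanted in t["genres"]]


def recall(retrieved, relevant):
    """Fraction of the relevant titles that were actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 1.0
    return len(relevant & set(retrieved)) / len(relevant)


# Assume a listener would accept tracks A, C and D as "ambient".
relevant_ambient = ["Track A", "Track C", "Track D"]
single = select_single(TRACKS, "ambient")
multi = select_multi(TRACKS, "ambient")
print(single, recall(single, relevant_ambient))
print(multi, recall(multi, relevant_ambient))
```

In this toy collection the single-term query misses Track D, which was filed under "experimental" even though it sits in the overlap with ambient, while the multi-term query retrieves all three.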

Additionally, the genre description does little to capture two other semantic aspects of song content: tempo and (what I will refer to as) energy. Any song, no matter what genre, can be classified according to its rhythmic speed with very little disagreement among users. This additional level of description would enhance playlist generation by preventing the sudden acceleration or deceleration between tracks that is so prevalent among streaming Internet radio stations. A smooth, consistent feel can be projected across a whole playlist or between groups of songs, or algorithms could be devised to create a change in tempo across the playlist in a myriad of creative ways. One may also see a similar role for key and time signature – all three of these could be determined automatically with great accuracy during the encoding process.
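As a rough illustration of the tempo idea, the sketch below orders a playlist by beats per minute and measures the largest tempo jump between neighbouring tracks. It assumes each track already carries a measured BPM value; the titles, the numbers, and the function names (largest_jump, order_by_tempo) are made up for the example, and sorting is only the simplest of the many possible orderings.

```python
# Hypothetical (title, bpm) pairs; a real system would read a measured tempo.
PLAYLIST = [
    ("Track A", 128),
    ("Track B", 70),
    ("Track C", 124),
    ("Track D", 74),
    ("Track E", 90),
]


def largest_jump(playlist):
    """Largest absolute BPM difference between consecutive tracks."""
    bpms = [bpm for _, bpm in playlist]
    return max((abs(b - a) for a, b in zip(bpms, bpms[1:])), default=0)


def order_by_tempo(playlist, descending=False):
    """Return a new playlist sorted by BPM, giving a gradual tempo ramp."""
    return sorted(playlist, key=lambda item: item[1], reverse=descending)


ramp = order_by_tempo(PLAYLIST)
print(largest_jump(PLAYLIST))  # jump size in the original running order
print(largest_jump(ramp))      # jump size after sorting by tempo
```

With these numbers the original order swings by as much as 58 BPM between neighbours, while the sorted order never moves by more than 34.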

Energy is a bit less definitive. What I mean by energy is a description that takes into account the emotional states that may be experienced by the listener. There is an inherent bias that is transferred by the describer, but I feel that, like the genre description, energy could be described consistently enough to be used in an advisory capacity. Genre has been used to encompass energy to some degree. For example, I have seen CDDB descriptions such as “dark techno” or “ambient industrial” that do the work of describing both the energy and technical style. Unfortunately, the result is that the genre term is devalued by ever more granular descriptions, which can become lost over time as collective definitions of genre morph and change. As with tempo, energy can be used to prevent abrupt transitions in automatically generated playlists. Unlike tempo, which uses a linear scale, energy is much less definite and will require a thesaurus to determine appropriate transitions and relationships.
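One way such a thesaurus could work is as a small graph of related terms, where the number of "related term" hops between two energy descriptions stands in for how abrupt a transition would feel. The sketch below is only that, a sketch: the vocabulary, the relationships between terms, and the names energy_distance and is_smooth are all invented for illustration.

```python
from collections import deque

# Hypothetical thesaurus: each energy term maps to its related terms.
RELATED = {
    "dark": {"brooding"},
    "brooding": {"dark", "melancholy"},
    "melancholy": {"brooding", "calm"},
    "calm": {"melancholy", "uplifting"},
    "uplifting": {"calm"},
}


def energy_distance(start, goal, related=RELATED):
    """Breadth-first search for the fewest related-term hops between terms.

    Returns None if either term is unknown or no path connects them.
    """
    if start not in related or goal not in related:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        term, steps = queue.popleft()
        if term == goal:
            return steps
        for neighbour in related[term]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None


def is_smooth(a, b, max_steps=1):
    """Treat a transition as smooth if the terms are at most max_steps apart."""
    distance = energy_distance(a, b)
    return distance is not None and distance <= max_steps


print(energy_distance("dark", "calm"))
print(is_smooth("dark", "brooding"), is_smooth("dark", "uplifting"))
```

A playlist generator could then refuse, or at least penalize, any pair of adjacent tracks whose energy terms are more than a step or two apart.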

Regardless of the depth of description that is possible, it is clear that a single descriptive genre term is not sufficient. A simple modification to the ID3 specification could allow multiple genre terms to be stored at the record level, thus improving recall for access systems.