An OAIS Ingest Metadata Specification

Problem Definition

For this exercise, we will prepare a digital object for submission to a digital archive for long-term preservation. The digital object in question is an HTML text with an in-line image and links to several other HTML texts. The objects must be readable, but the specific look and feel of the rendered text is not important. We are to generate a metadata set that will conform to ingest requirements for an Open Archival Information System (OAIS) according to a Submission Information Package (SIP) agreement. Additionally, we will consider the process for converting a SIP into an Archival Information Package (AIP) and extend the metadata set with additional elements for the conversion process, as needed.

OAIS Ingest and SIP

The OAIS model outlines a system-agnostic framework to ensure reliable, long-term preservation of information. The process begins with the submission of an information package from a producer of information (publisher, author, researcher, etc.) to the archive. The information is submitted as one or more SIP objects that conform to the archive's SIP agreement. For a digital archive, the information is sent electronically along with descriptive metadata. The components of an OAIS information package are shown in Figure 1 (CCSDS, 2002, §4).



Figure 1: OAIS information package (CCSDS, 2002, p. 4-31)


As shown above, the information package consists of Content Information (CI) and Preservation Description Information (PDI). In the current exercise, the content information consists of the HTML code and image file, or pointers to these resources, and the representation information necessary to decode the content of the digital objects. The Packaging Information binds the CI and PDI by some means, including universal identifiers, encoding (such as used for a CD-ROM), or some other system-specific method.

Descriptive information is derived from the PDI or declared prior to submission. PDI contains the metadata needed for conversion into an AIP and for subsequent storage within, and access from, the repository. Four functional areas are defined in the PDI, as summarized in Table 1.


Provenance

  Source
    • Description and pointer to original object(s)
    • Copyright/legal restrictions
    • Access restrictions
    • Authority to modify representation information or migrate
    • Agreements with external organizations

  History
    • Change history
    • Pointers to other versions
    • Custody since origination

Context

  Relationship to other objects
    • Description
    • Pointers to other objects

  Purpose/Reason for creation
    • Reason for creation
    • Reason for archiving

  Encoding environment
    • Software
    • Languages
    • Character set(s)

Reference

  Unique identifier(s)
    • URI, ID Number, etc.

  Bibliographic description
    • Creator(s), organization(s)
    • Date of creation
    • Title(s)
    • Etc.

Fixity

  Authenticity indicators
    • Checksum, CRC, MD5 hash
    • Digital signatures
    • Encryption

  Quality of service requirements
    • Specification of integrity preserving mechanisms
    • Error protection specifications

Table 1: Elements of PDI (CCSDS, 2002, pp. 4-27 – 4-29)
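The four areas in Table 1 can be pictured as a simple record with one slot per area. The following is only a minimal sketch: the class name PDI, the dictionary keys, and the example values are hypothetical and are not taken from the OAIS reference model.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PDI:
    """Preservation Description Information, grouped by the four OAIS areas."""
    provenance: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)
    reference: dict = field(default_factory=dict)
    fixity: dict = field(default_factory=dict)

    def missing_areas(self):
        """Return the names of the areas that carry no metadata yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical values for the HTML object described in this exercise.
pdi = PDI(
    provenance={"source": "http://example.org/original.html"},
    reference={"identifier": "example-0001", "title": "Example page"},
    fixity={"checksum_type": "MD5"},
)

print(pdi.missing_areas())  # ['context']
```

A producer could run a check of this kind before submission to see which areas still need to be filled in.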

A Model SIP

When approaching the problem of metadata creation, two basic approaches may be used. First, we may generate a new metadata set that conforms to the archive's requirements. Generating a new set allows the specific project requirements to be addressed in detail, but requires a significant amount of labor to produce the DTD or schema and the tools to use them. Furthermore, a new metadata set does not leverage existing standards and practices for interchange. The second, and preferred, approach is to use or extend an existing metadata specification.

A review of existing metadata specifications reveals that the Metadata Encoding and Transmission Standard (METS) provides a framework for many of the necessary elements for a SIP (Library of Congress, 2004a). The first two columns of Table 2 show a basic mapping of METS container elements to OAIS ingest information. Since METS does not provide detailed descriptive metadata elements, other metadata schemes must be used to complete the SIP. Source metadata schemes and sets for detailed metadata are shown in the third column of Table 2. External metadata sets can be linked from the METS container using <mdRef>, or embedded within it using one of the metadata types allowed in <mdWrap>.
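The difference between embedding with <mdWrap> and linking with <mdRef> can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under stated assumptions, not the profile from the appendix: the ID values, the title, the example.org URL, and the OTHERMDTYPE label are all hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Use readable prefixes when the tree is serialized.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Build a namespace-qualified tag or attribute name."""
    return f"{{{ns}}}{tag}"


mets = ET.Element(q(METS_NS, "mets"))

# Embedded metadata: a MODS record wrapped inside <dmdSec>.
dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
ET.SubElement(title_info, q(MODS_NS, "title")).text = "Example page"

# Linked metadata: a rights record that stays outside the METS file.
amd = ET.SubElement(mets, q(METS_NS, "amdSec"))
rights = ET.SubElement(amd, q(METS_NS, "rightsMD"), {"ID": "RIGHTS1"})
ET.SubElement(rights, q(METS_NS, "mdRef"), {
    "LOCTYPE": "URL",
    "MDTYPE": "OTHER",
    "OTHERMDTYPE": "RightsDeclarationMD",  # hypothetical label
    q(XLINK_NS, "href"): "http://example.org/rights.xml",  # hypothetical URL
})

print(ET.tostring(mets, encoding="unicode"))
```

Embedding keeps the package self-contained, while linking leaves the external record to be maintained separately; which is appropriate would depend on the archive's SIP agreement.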


OAIS Ingest Information, with the METS Element Location(s) and Source for Specific Metadata for each entry

Preservation Description Information (all elements nested under <mets> root)

  Provenance

    Description of original object(s)
      METS Element Location(s):
        <amdSec>
           <sourceMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <relatedItem type="original">

    Rights management information
      METS Element Location(s):
        <amdSec>
           <rightsMD>
      Source for Specific Metadata:
        Rights Declaration Extension Schema:
        <RightsDeclarationMD>

    Access restrictions
      METS Element Location(s):
        <amdSec>
           <rightsMD>
      Source for Specific Metadata:
        Rights Declaration Extension Schema:
        <RightsDeclarationMD>
           <Context>

    Agreements with creator(s)
      METS Element Location(s):
        <amdSec>
           <rightsMD>
      Source for Specific Metadata:
        Rights Declaration Extension Schema:
        <RightsDeclarationMD>
           <Context>

    Agreements with external organizations
      METS Element Location(s):
        <amdSec>
           <rightsMD>
      Source for Specific Metadata:
        Rights Declaration Extension Schema:
        <RightsDeclarationMD>
           <Context>

    Change history
      METS Element Location(s):
        <amdSec>
           <digiprovMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <originInfo>
           <dateModified>

    Pointers to other version(s)
      METS Element Location(s):
        <amdSec>
           <digiprovMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <relatedItem type="otherVersion">

    Custody since origination
      METS Element Location(s):
        <amdSec>
           <digiprovMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <note type="bibliographic history">

  Context

    Relationship to related objects
      METS Element Location(s):
        <amdSec>
           <digiprovMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <relatedItem type="XXX">

    Pointers to related objects
      METS Element Location(s):
        <amdSec>
           <digiprovMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <relatedItem xlink:href="XXX">
           <location>

    Reason for creation
      METS Element Location(s):
        <amdSec>
           <digiprovMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <note type="bibliographic history">

    Reason for archiving
      METS Element Location(s):
        <amdSec>
           <digiprovMD>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <note type="conservation history">

    Original encoding/technical environment
      METS Element Location(s):
        <amdSec>
           <techMD>
      Source for Specific Metadata:
        Schema for Technical Metadata for Text:
        <byte_order>, <charset>, <encoding>, <markup_language>

  Reference

    Unique identifier(s)
      METS Element Location(s):
        <dmdSec>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <identifier>

    Bibliographic description
      METS Element Location(s):
        <dmdSec>
      Source for Specific Metadata:
        Metadata Object Description Schema:
        <titleInfo>, <name>, <subject>, <language>, <typeOfResource>, <genre>, <abstract>, <originInfo>, <physicalDescription>

  Fixity

    Authenticity indicators
      METS Element Location(s):
        <fileSec>
           <fileGrp>
              <file CHECKSUM="XXX">
      Source for Specific Metadata:
        N/A

    Quality of service requirements
      METS Element Location(s):
        <amdSec>
           <techMD>
      Source for Specific Metadata:
        Schema for Technical Metadata for Text:
        <viewingRequirements>

Content Information (all elements nested under <mets> root)

  Data object(s)
    METS Element Location(s):
      <fileSec>
         <fileGrp>
            <file>
               <FLocat>
               <FContent>
                  <binData> | <xmlData>
    Source for Specific Metadata:
      N/A

Packaging Information (all elements nested under <mets> root)

  Relationships between data objects
    METS Element Location(s):
      <structMap>
         <div>
            <mptr> | <fptr>
      <structLink>
    Source for Specific Metadata:
      N/A

Table 2: Basic OAIS ingest and METS container mapping (derived from: Tingle, 2004; Library of Congress, 2003; Library of Congress, 2004b)
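The authenticity indicator in Table 2 is carried by the CHECKSUM attribute of <file>. The sketch below shows one way a producer might record an MD5 digest and how the archive could later recheck it. It assumes Python's standard hashlib and xml.etree.ElementTree modules; the helper names add_file and is_intact, the file ID, the URL, and the sample HTML bytes are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def add_file(file_grp, file_id, href, mimetype, data):
    """Append a <file> entry whose CHECKSUM is the MD5 digest of data."""
    file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {
        "ID": file_id,
        "MIMETYPE": mimetype,
        "CHECKSUM": hashlib.md5(data).hexdigest(),
        "CHECKSUMTYPE": "MD5",
    })
    # Point at the data object rather than embedding it.
    ET.SubElement(file_el, f"{{{METS_NS}}}FLocat", {
        "LOCTYPE": "URL",
        f"{{{XLINK_NS}}}href": href,
    })
    return file_el


def is_intact(file_el, data):
    """Recompute the digest and compare it with the recorded CHECKSUM."""
    return hashlib.md5(data).hexdigest() == file_el.get("CHECKSUM")


file_sec = ET.Element(f"{{{METS_NS}}}fileSec")
file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")

# Hypothetical content standing in for the submitted HTML text.
page = b"<html><body><img src='figure.png'/></body></html>"
entry = add_file(file_grp, "FILE1", "http://example.org/page.html", "text/html", page)

print(is_intact(entry, page))         # True
print(is_intact(entry, page + b" "))  # False
```

MD5 is used here only because Table 1 names it; an archive may prefer a stronger digest and record it in CHECKSUMTYPE accordingly.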

Augmenting the SIP to Facilitate Conversion to an AIP

Figure 2 illustrates the complete OAIS ingest process. Once all of the component SIPs have been received and checked for conformance to the archive's ingest specifications, the SIPs are enhanced and reformatted according to the archive's technical standards, as necessary. Workflow procedures may require an administrative audit of the submitted information package. Once these processes are complete, the AIP is created, descriptive metadata and “browse products” are extracted, then the AIP is sent to data management.



Figure 2: OAIS ingest process (CCSDS, 2002, p. 4-5)
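The conformance check at the start of the ingest process could be approximated by testing whether a submitted METS document contains the top-level sections used in Table 2. This is a rough sketch only: the list in REQUIRED_SECTIONS is an assumption drawn from that table rather than from any actual SIP agreement, and the function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Assumed requirement list, taken from the sections used in Table 2.
REQUIRED_SECTIONS = ("dmdSec", "amdSec", "fileSec", "structMap")


def missing_sections(mets_xml):
    """Return the required top-level METS sections absent from a SIP."""
    root = ET.fromstring(mets_xml)
    return [name for name in REQUIRED_SECTIONS
            if root.find(f"{{{METS_NS}}}{name}") is None]


# Hypothetical SIP that lacks administrative metadata.
sip = """<mets xmlns="http://www.loc.gov/METS/">
  <dmdSec ID="DMD1"/>
  <fileSec/>
  <structMap><div/></structMap>
</mets>"""

print(missing_sections(sip))  # ['amdSec']
```

A real check would also validate the document against the METS schema and the archive's profile; this sketch only covers the presence of sections.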



In the process of converting SIPs to an AIP, new metadata is created. To ease the conversion process, the SIP format should include element definitions for these items of information. The additional metadata for the SIP-to-AIP conversion is shown in Table 3, along with the METS container location and the prescribed metadata element source.


Additional Information, with the METS Element Location(s) and Source for Specific Metadata for each entry

  Ingest history
    METS Element Location(s):
      <amdSec>
         <digiprovMD>
    Source for Specific Metadata:
      Metadata Object Description Schema:
      <note type="acquisition">
      <note type="conservation history">

  Quality assurance (QA) results
    METS Element Location(s):
      <amdSec>
         <techMD>
    Source for Specific Metadata:
      Schema for Technical Metadata for Text:
      <processingNote>

  History of formatting and encoding changes made during conversion process
    METS Element Location(s):
      <amdSec>
         <digiprovMD>
    Source for Specific Metadata:
      Metadata Object Description Schema:
      <note type="conservation history">

  Administrative audit reports
    METS Element Location(s):
      <amdSec>
         <digiprovMD>
    Source for Specific Metadata:
      Metadata Object Description Schema:
      <note type="admin">

  Additional descriptive or classification data
    METS Element Location(s):
      <dmdSec>
    Source for Specific Metadata:
      Metadata Object Description Schema:
      <classification>, <subject>

  Relationship to archive objects or collections
    METS Element Location(s):
      <amdSec>
         <digiprovMD>
    Source for Specific Metadata:
      Metadata Object Description Schema:
      <relatedItem>

  Internal identifier(s)
    METS Element Location(s):
      <dmdSec>
    Source for Specific Metadata:
      Metadata Object Description Schema:
      <identifier>

Table 3: Additional metadata required for SIP to AIP conversion (derived from: Tingle, 2004; Library of Congress, 2003; Library of Congress, 2004b).
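Two of the Table 3 entries, the internal identifier and the ingest history note, might be added during conversion roughly as follows. This is a sketch under stated assumptions rather than a description of any archive's workflow: the function name record_ingest, the ID value, the identifier type "local", the note wording, and the sample SIP are hypothetical, and the SIP is assumed to already hold a MODS record in <dmdSec> and an <amdSec>.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mets": METS_NS, "mods": MODS_NS}


def record_ingest(mets_root, internal_id, ingest_date):
    """Add an internal identifier and an ingest-history note to a SIP."""
    dmd = mets_root.find("mets:dmdSec", NS)
    mods = dmd.find(".//mods:mods", NS) if dmd is not None else None
    amd = mets_root.find("mets:amdSec", NS)
    if mods is None or amd is None:
        raise ValueError("SIP lacks a MODS <dmdSec> or an <amdSec>")

    # Internal identifier(s): <dmdSec>, MODS <identifier>.
    ident = ET.SubElement(mods, f"{{{MODS_NS}}}identifier", {"type": "local"})
    ident.text = internal_id

    # Ingest history: <amdSec><digiprovMD>, MODS <note type="acquisition">.
    digiprov = ET.SubElement(amd, f"{{{METS_NS}}}digiprovMD",
                             {"ID": "DIGIPROV-INGEST"})
    wrap = ET.SubElement(digiprov, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    note_mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    note = ET.SubElement(note_mods, f"{{{MODS_NS}}}note", {"type": "acquisition"})
    note.text = f"Ingested into the archive on {ingest_date}"
    return mets_root


# Hypothetical SIP with an empty MODS record and an empty <amdSec>.
sip = ET.fromstring("""<mets xmlns="http://www.loc.gov/METS/">
  <dmdSec ID="DMD1"><mdWrap MDTYPE="MODS"><xmlData>
    <mods xmlns="http://www.loc.gov/mods/v3"/>
  </xmlData></mdWrap></dmdSec>
  <amdSec/>
</mets>""")

record_ingest(sip, "archive-000123", "2004-12-11")
print(sip.find(".//mods:identifier", NS).text)  # archive-000123
```

Because the SIP format already defines where these elements belong, the conversion step only appends to existing sections instead of restructuring the package.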

Example SIP Profile

The appendix contains a METS Profile Schema document describing the requirements enumerated in Tables 2 and 3. At the end of the profile is an <Appendix> element containing an example ingest package encoded according to the profile specification.

References

Consultative Committee for Space Data Systems (CCSDS) (2002). Reference model for an Open Archival Information System (OAIS). CCSDS 650.0-B-1 BLUE BOOK. Retrieved on 28 November, 2004, from http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf.

Library of Congress (2003). METS news and announcements: Draft rights declaration schema is ready for review. Retrieved on 11 December, 2004, from http://www.loc.gov/standards/mets/news080503.html.

Library of Congress (2004a). METS: An overview & tutorial. Retrieved on 7 December, 2004, from http://www.loc.gov/standards/mets/METSOverview.v2.html.

Library of Congress (2004b). Metadata Object Description Schema (MODS). Retrieved on 11 December, 2004, from http://www.loc.gov/standards/mods.

McDonough, J. (2003). Schema for Technical Metadata for Text. Retrieved on 11 December, 2004, from http://dlib.nyu.edu/METS/textmd.html.

Tingle, B. (2004). METS 1.3 schema documentation. Retrieved on 9 December, 2004, from http://ark.cdlib.org/mets/schema_documentation.

Attachment: mets_sip_example.xml (18.98 KB)