The Getty Previous
Home
Introduction
Setting the Stage
Metadata and the World Wide Web
Crosswalks: The Path to Universal Access?
Crosswalks
Glossary
Acronyms & URLs
Contributors
Printer Friendly PDFs



Introduction to Metadata


Setting the Stage

By Anne J. Gilliland

Introduction

Metadata, literally "data about data," is an increasingly ubiquitous term that is understood in different ways by the diverse professional communities that design, create, describe, preserve, and use information systems and resources. As these communities - and also repositories and information and communication technologies - come together to make the information age a reality, it is essential that we understand the critical roles that different types of metadata can play in the development of effective, authoritative, interoperable, scaleable, and preservable cultural heritage information and recordkeeping systems. Until the mid-1990s, "metadata" was a term most prevalently used by communities involved with the management and interoperability of geospatial data, and with data management and systems design and maintenance in general. For these communities, "metadata" referred to a suite of industry or disciplinary standards as well as additional internal and external documentation and other data necessary for the identification, representation, interoperability, technical management, performance, and use of data contained in an information system.

Perhaps a more useful "big picture" way of thinking about metadata is as "the sum total of what one can say about any information object1 at any level of aggregation." In this context, an information object is anything that can be addressed and manipulated by a human or a system as a discrete entity. The object may be comprised of a single item, or it may be an aggregate of many items. In general all information objects, regardless of the physical or intellectual form they take, have three features - content, context, and structure - all of which can be reflected through metadata.

  • Content relates to what the object contains or is about, and is intrinsic to an information object.
  • Context indicates the who, what, why, where, how aspects associated with the object's creation and is extrinsic to an information object.
  • Structure relates to the formal set of associations within or among individual information objects and can be intrinsic or extrinsic.

Cultural heritage and information professionals such as museum registrars, library catalogers, and archival processors are increasingly applying the term metadata to the value-added information that they create to arrange, describe, track and otherwise enhance access to information objects. Library metadata development has been first and foremost about providing intellectual and physical access to content. Library metadata includes indexes, abstracts, and catalog records created according to cataloging rules and structural and content standards such as MARC (MAchine-Readable Cataloging format), as well as authority forms such as LCSH (Library of Congress Subject Headings) or the AAT (Art & Architecture Thesaurus). Such bibliographic metadata has been cooperatively created since the 1960s and made available to repositories and users through automated systems such as bibliographic utilities, online public access catalogs (OPACs), and commercial online databases.

A large component of archival and museum activities has traditionally been focussed on context. Elucidating and preserving context is what assists with identifying and preserving the evidential value of records and artifacts in and over time; it is what facilitates the authentication of those objects, and it is what assists researchers with their analysis and interpretation. Archival and manuscript metadata includes accession records, finding aids, and catalog records. Archival descriptive standards that have been developed in the past two decades include the MARC Archival and Manuscript Control (AMC) format published by the Library of Congress in 1984 (now integrated into the MARC format for bibliographic description); the General International Standard Archival Description (ISAD (G)) published by the International Council on Archives in 1994; and Encoded Archival Description EAD), adopted as a standard by the Society of American Archivists (SAA) in 1999. While archival metadata has primarily existed in print form until recently, it is increasingly distributed on line through resources such as the RLIN AMC (the Research Libraries Information Network Archival and Mixed Collections file,2 Archives USA,3 and EAD-based archival information systems such as the Online Archive of California.4 Consensus and collaboration have been slower to build in the museum community, where the benefits of standardization of description such as shared cataloging and exchange of descriptive data are less readily apparent.

An emphasis on the structure of information objects in metadata development by these communities has perhaps been less overt. However, structure has always been important in information organization and representation, even prior to computerization. Documentary and publication forms have evolved into industry standards and societal norms and have become an almost transparent information management tool - when users access a birth certificate, they can predict what the content is likely to be. When they use a scholarly monograph, they intuitively understand that it will be organized with a table of contents, chapter headings, and an index. Archivists use the physical structure of their finding aids to provide visual cues to researchers about the structural relationships between different parts of a record series or manuscript collection. Archival description also very much exploits the hierarchical arrangement of records according to the bureaucratic hierarchies and business practices of the creators of those records.

The role of structure has been growing as computer processing capabilities become increasingly powerful and sophisticated. Information communities are aware that the more highly structured an information object is, the more that structure can be exploited for searching, manipulation, and interrelating with other information objects. Capturing, documenting, and enforcing that structure, however, require specific types of metadata.

In short, in an environment where a user can gain unmediated access to information objects over a network, metadata:

  • certifies the authenticity and degree of completeness of the content;
  • establishes and documents the context of the content;
  • identifies and exploits the structural relationships that exist between and within information objects;
  • provides a range of intellectual access points for an increasingly diverse range of users; and
  • provides some of the information an information professional might have provided in a physical reference or research setting.

There is more to metadata than description, however. A more inclusive conceptualization of metadata is needed as information professionals consider the range of their activities that may end up being incorporated into digital information systems. Repositories also create metadata relating to the administration, accessioning, preservation, and use of collections. Acquisition records, exhibition catalogs, and use data are all examples of these, even though they are still largely created in paper form. Today, integrated information systems such as virtual museums, digital libraries, and archival information systems include digital versions of actual collection content as well as descriptions of that content. Incorporating other types of metadata into such systems reaffirms their importance in administering collections and maintaining their intellectual integrity both in and over time. Paul Conway alludes to this metadata capability when he discusses the impact of digitization on preservation:

The digital world transforms traditional preservation concepts from protecting the physical integrity of the object to specifying the creation and maintenance of the object whose intellectual integrity is its primary characteristic.5

When applied outside the repository, the term metadata acquires an even broader scope. An Internet resource provider might use metadata to refer to information being encoded into HTML metatags for the purposes of making a Web site easier to find. Individuals digitizing images might think of metadata as the information they enter into the header field for the digital file to record information about the image, the imaging process, and image rights. A social science data archivist might use the term to refer to the systems and research documentation necessary to run and interpret a magnetic tape containing raw research data. An electronic records archivist might use the term to refer to all the contextual, processing, and use information needed to identify and document the scope, authenticity, and integrity of an active or archival record in an electronic recordkeeping system. Metadata is critical in personal information management and for ensuring effective information retrieval and accountability in recordkeeping - something that is becoming increasingly important with the rise of electronic commerce and digital government. In all of these diverse interpretations, metadata not only identifies and describes an information object; it also documents how that object behaves, its function and use, its relationship to other information objects, and how it should be managed.

As this discussion implies, theory and practices vary considerably owing to the differing professional and cultural missions of museums, archives, libraries, and other information and recordkeeping communities.6 Many additional highly detailed metadata standards are now emerging (such as EAD and the Australian Recordkeeping Metadata Schema (RKMS)) that attempt to articulate these communities' mission-specific differences as well as to facilitate mapping between common data elements. By contrast, the Dublin Core Metadata Element Set (DC) identifies a small, simple set of metadata elements that can be used by any community to describe and search across a wide variety of information resources on the World Wide Web. Such metadata standards are necessary in order to ensure that different kinds of descriptive metadata are able to interoperate with each other and with metadata from non-bibliographic systems of the kind that the data management communities and information creators are generating. Indeed, not since the latter part of the nineteenth century has there been such an exciting - but also potentially bewildering - array of organizational and descriptive schemas from which information professionals can choose.

Categorizing Metadata

All of these perspectives on metadata become important in the development of networked digital information systems, but they lead to a very broad conception of metadata. To understand this conception better, it is helpful to break it down into distinct categories - administrative, descriptive, preservation, use, and technical metadata - that reflect key aspects of metadata functionality. Table 1 defines each of these metadata categories and gives examples of common functions that each might perform in a digital information system.

Table 1. Different Types of Metadata and Their Functions

Type

Definition

Examples

Administrative7

Metadata used in managing and administering information resources

- Acquisition information

- Rights and reproduction tracking

- Documentation of legal access requirements

- Location information

- Selection criteria for digitization

- Version control and differentiation between similar information objects

- Audit trails created by recordkeeping systems

Descriptive

Metadata used to describe or identify information resources

- Cataloging records

- Finding aids

- Specialized indexes

- Hyperlinked relationships between resources

- Annotations by users

- Metadata for recordkeeping systems generated by records creators

Preservation

Metadata related to the preservation management of information resources

- Documentation of physical condition of resources

- Documentation of actions taken to preserve physical and digital versions of resources, e.g., data refreshing and migration

Technical

Metadata related to how a system functions or metadata behave

- Hardware and software documentation

- Digitization information, e.g., formats, compression ratios, scaling routines

- Tracking of system response times

- Authentication and security data, e.g., encryption keys, passwords

Use

Metadata related to the level and type of use of information resources

- Exhibit records

- Use and user tracking

- Content re-use and multi-versioning information

In addition to there being different types of metadata and metadata functions, metadata also exhibits many different characteristics. Table 2 indicates some of the key attributes of metadata, with examples.

Table 2. Attributes and Characteristics of Metadata

Attribute

Characteristics

Examples

Source of metadata

Internal metadata generated by the creating agent for an information object at the time when it is first created or digitized

- File names and header information

- Directory structures

- File format and compression scheme

External metadata relating to an information object that is created later, often by someone other than the original creator

- Registrarial and cataloging records

- Rights and other legal information

Method of metadata creation

Automatic metadata generated by a computer

- Keyword indexes

- User transaction logs

Manual metadata created by humans

- Descriptive surrogates such as catalog records and Dublin Core metadata

Nature of metadata

Lay metadata created by persons who are neither subject nor information specialists, often the original creator of the information object

- Metatags created for a personal Web page

- Personal filing systems

Expert metadata created by either subject or information specialists, often not the original creator of the information object

- Specialized subject headings

- MARC records

- Archival finding aids

Status

Static metadata that never change once they have been created

- Title, provenance, and date of creation of an information resource

Dynamic metadata that may change with use or manipulation of an information object

- Directory structure

- User transaction logs

- Image resolution

Long-term metadata necessary to ensure that the information object continues to be accessible and usable

- Technical format and processing information

- Rights information

- Preservation management documentation

Short-term metadata, mainly of a transactional nature

Structure

Structured metadata that conform to a predictable standardized or unstandardized structure

- MARC

- TEI and EAD

- local database formats

Unstructured metadata that do not conform to a predictable structure

- Unstructured note fields and annotations

Semantics

Controlled metadata that conform to a standardized vocabulary or authority form

- AAT

- ULAN

- AACR2

Uncontrolled metadata that do not conform to any standardized vocabulary or authority form

- Free-text notes

- HTML metatags

Level

Collection metadata relating to collections of information objects

- Collection-level record, e.g., MARC record or finding aid

- Specialized index

Item metadata relating to individual information objects, often contained within collections

- Transcribed image captions and dates

- Format information

Metadata creation and management have become a very complex mix of manual and automatic processes and layers created by many different functions and individuals at different points in the life of an information object. Figure 1 illustrates the different phases through which information objects typically move during their life in a digital environment.8 As they move through each phase, the objects acquire layers of metadata that can be associated with the objects in several ways. This metadata can be contained within the same envelope as the information object - for example, in the form of header information for an image file, e.g. Dublin Core, or through some form of bundling, for example, with the Universal Preservation Format (UPF). Metadata can also be attached to the information object through bi-directional pointers or hyperlinks, while the relationships between metadata and information objects, and between different aspects of metadata, can be documented by registering them with a metadata registry.9 However, in any instance where it is critical that metadata and content coexist, then it is recommended that the metadata become an integral part of the information object and not be stored elsewhere.

As systems designers increasingly respond to the need to incorporate and manage metadata in information systems and to address how to move them forward through time, many additional mechanisms for associating metadata with information objects are likely to become available. Metadata registries and schema recordkeeping systems are also more likely to develop as it becomes increasingly necessary to document schema evolution and to alert implementors to version changes.

Figure 1. The Life Cycle of Objects Contained in a Digital Information System

The Life Cycle of Objects Contained in a Digital Information System

Creation and multi-versioning: Objects enter a digital information system by being created digitally or by being converted into digital format. Multiple versions of the same object may be created for preservation, research, dissemination, or even product development purposes. Some administrative and descriptive metadata may be included by the creator.

Organization: Objects are automatically or manually organized into the structure of the digital information system and additional metadata for those objects may be created through registration, cataloging, and indexing processes.

Searching and retrieval: Stored and distributed objects are subject to search and retrieval by users. The computer system creates metadata that track retrieval algorithms, user transactions, and system effectiveness in storage and retrieval.

Utilization: Retrieved objects are utilized, reproduced, and modified. Metadata related to user annotations, rights tracking, and version control may be created.

Preservation and disposition: Information objects undergo processes such as refreshing, migration, and integrity checking to ensure their continued availability. Information objects that are inactive or no longer necessary may be discarded. Metadata may document both preservation and disposition activities.

Other Little-Known Facts about Metadata

  1. Metadata does not have to be digital. Cultural heritage and information professionals have been creating metadata for as long as they have been managing collections. Increasingly, such metadata are being incorporated into digital information systems.
  2. Metadata relates to more than the description of an object. While museum, archives, and library professionals may be most familiar with the term in association with description or cataloging, metadata can also indicate the context, management, processing, preservation and use of the resources being described.
  3. Metadata can come from a variety of sources. It can be supplied by a human (a creator, information professional, or user), created automatically by a computer, or inferred through a relationship to another resource such as a hyperlink.
  4. Metadata continue to accrue during the life of an information object or system. Metadata is created, modified, and sometimes even disposed of at many points during the life of a resource.
  5. One information object's metadata can simultaneously be another information object's data.

Why is metadata important?

As illustrated by the preceding discussion, metadata consists of complex constructs that can be expensive to create and maintain. How then can one justify the costs and efforts involved? The development of the World Wide Web and other networked digital information systems has provided information professionals with many opportunities, while at the same time requiring them to confront issues that they have not had occasion to explore previously. Judiciously crafted metadata element sets, wherever possible conforming to national and international standards, have become the tools that information professionals are using to exploit some of these opportunities, as well as to address some of the new issues:

Increased accessibility: Effectiveness of searching can be significantly enhanced through the existence of rich, consistent metadata. Metadata can also make it possible to search across multiple collections or to create virtual collections from materials that are distributed across several repositories, but only if the descriptive metadata are the same or can be mapped across each site. Digital information systems and emerging metadata standards developed by different professional communities but incorporating some common data elements, such as Encoded Archival Description (EAD), the Text Encoding Initiative (TEI), and the Dublin Core are making it easier for users to negotiate between descriptive surrogates of information objects and digital versions of the objects themselves, and to search at both the item and collection level within and across information systems.10

Retention of context: Museum, archival, and library repositories do not simply hold objects. They maintain collections of objects that have complex interrelationships among each other and associations with people, places, movements, and events. In the digital world it is not difficult for a single object from a collection to be digitized and then to become separated from both its own cataloging information and its relationship to the other objects in the same collection. Metadata plays a critical role in documenting and maintaining those relationships, as well as in indicating the authenticity, structural and procedural integrity, and degree of completeness of information objects. In an archive, for example, by documenting the content, context, and structure of an archival record, metadata in the form of an archival finding aid is what helps to distinguish that record from decontextualized information.

Expanding use: Digital information systems for museum and archival collections make it easier to disseminate digital versions of unique objects to users around the globe who, for reasons of geography, economics, or other barriers, might otherwise never have had an opportunity to view them. With new communities of users, however, come new challenges concerning how to make the materials most intellectually accessible to them. These new communities of users may have significantly different needs to those of the traditional users for whom many existing information services have been designed. For example, teachers and school children may want to search for and use information objects in quite different ways than scholarly researchers do. Metadata can document changing uses of systems and content, and that information can in turn feed back into systems development decisions. Well-structured metadata can also facilitate an almost infinite number of ways to search for information, present results, and even manipulate information objects without compromising the integrity of those information objects.

Multi-versioning: The existence of information and cultural objects in digital form has heightened interest in the ability to create multiple and variant versions of those objects. This process may be as simple as creating both a high-resolution copy for preservation or scholarly research purposes and a low-resolution thumbnail image that can be rapidly transferred over a network for quick reference purposes. Or it may involve creating variant or derivative forms to be used, for example, in publications, exhibitions, or schoolrooms. In either case, there must be metadata to link the multiple versions and capture what is the same and what is different about each version. The metadata must also be able to distinguish what is qualitatively different between variant digitized versions and the hard copy original or parent object.

Legal issues: Metadata allows repositories to track the many layers of rights and reproduction information that exist for information objects and their multiple versions. Metadata also documents other legal or donor requirements that have been imposed on objects - for example, privacy concerns or proprietary interests.

Preservation: If digital information objects that are currently being created are to have a chance of surviving migrations through successive generations of computer hardware and software, or removal to entirely new delivery systems, they will need to have metadata that enables them to exist independently of the system that is currently being used to store and retrieve them. Technical, descriptive, and preservation metadata that documents how a digital information object was created and maintained, how it behaves, and how it relates to other information objects will all be essential. It should be noted that for the information objects to remain accessible and intelligible over time, it will also be essential to preserve and migrate this metadata.

System improvement and economics: Benchmark technical data, much of which can be collected automatically by a computer, is necessary to evaluate and refine systems in order to make them more effective and efficient from a technical and economic standpoint. The data can also be used in planning for new systems.

Conclusion and Outstanding Questions

Metadata is like interest - it accrues over time. To stretch the metaphor further, wise investments generate the best return on intellectual capital. Carefully designed metadata results in the best information management in the short and long-term. If thorough, consistent metadata has been created, it is possible to conceive of it being used in an almost infinite number of new ways to meet the needs of non-traditional users, for multi-versioning, and for data mining. But the resources and intellectual and technical design issues involved in metadata development and management are far from trivial. For example, some key questions that must be resolved by information professionals as they develop digital information systems and objects include:

  • identifying which metadata schema or schemas should be applied in order to best meet the needs of the information creator, repository and users;
  • deciding which aspects of metadata are essential for what they wish to achieve, and how granular they need each type of metadata to be - in other words, how much is enough and how much is too much. There will likely always be important tradeoffs between the costs of developing and managing metadata to meet current needs, and creating sufficient metadata that can be capitalized upon for future, often unanticipated uses;
  • ensuring that the metadata schemas being applied are the most current versions.

What we do know is that the existence of many types of metadata will prove critical to the continued physical and intellectual accessibility and utility of digital information resources and the information objects that they contain. In this sense, metadata provides us with the Rosetta Stone that will make it possible to decode information objects and their transformation into knowledge in the cultural heritage information systems of the twenty-first century.










 
     

The J. Paul Getty Trust
The J. Paul Getty Trust
© J. Paul Getty Trust | Privacy Policy | Terms of Use