Introduction to Metadata: Introduction

Like metadata itself, the realm of online resources is constantly and rapidly growing and evolving. Much has changed in the digital information landscape since the first print edition of this book was published in 1998, followed by revised editions in 2000 and 2008. The time is right for an updated edition of this text, intended to give a general introduction to metadata and to explain some of the key tools, concepts, and issues associated with using metadata to build authoritative, reliable, and useful digital resources. In the last few years, phenomena such as linked open data have begun to play an important role in the Semantic Web; the standard for library cataloging, Anglo-American Cataloging Rules, has been largely replaced by the Resource Description and Access (RDA) standard; and the BIBFRAME linked open data standard is poised to become the successor to the venerable MARC format for encoding bibliographic metadata.

Metadata creation is—or should usually be—a collaborative effort, as is this publication. Anne Gilliland, the late Mary Woodley, and Maureen Whalen updated their chapters, and with the help of several colleagues, I updated Tony Gill’s chapter on metadata and the web. The fact that this publication is the result of several people working together is significant—and indicative of how we work today.

In the first chapter, Anne Gilliland provides an overview of metadata (its types, roles, and characteristics) and presents facts that dispel several common misconceptions about it. She also addresses recent trends in metadata creation, particularly the creation of metadata by users rather than by trained information professionals. Activities such as social tagging, social bookmarking, and the resulting forms of user-created metadata such as “folksonomies” are playing an increasingly important role in the realm of digital information.

Chapter 2 focuses on metadata as it relates to resources on the web. We explain how web search engines work and how they use metadata, data, links, and relevance ranking to help users find what they are seeking. We also discuss in detail the commercial search engine that, as of this writing, has dominated the web for several years: Google. A key concept in this chapter is the difference between the visible web and the hidden web, along with the implications of making resources reachable from commercial, publicly available search engines rather than only from systems that have one or more “barriers” to access. Such barriers arise because a system is fee based or password protected, because it requires a particular IP address, or simply because it is not technically exposed to commercial search engines. How library metadata behaves in the era of Google dominance is also addressed.

In the third chapter, Mary Woodley examines the methods, tools, standards, and protocols that can be used to publish and disseminate digital collections in a variety of online venues. She shows how “seamless searching” (integrated access to a variety of resources residing in different information systems and formulated according to a range of standard and nonstandard metadata schemes) is still far from a reality. Woodley contrasts “federation,” in which union catalogs of digital collections are built by aggregating metadata records from diverse contributors into a single database, with “metasearching,” the real-time searching of diverse resources that have not been aggregated but rather are searched in situ by means of one or more protocols. Each method requires specific skills and knowledge; particular procedures, protocols, and data standards; and the appropriate technical infrastructure. Creating union resources via physical aggregation of metadata records or via metadata harvesting is valuable, but we should keep in mind that it does not necessarily solve the hidden web problem described in chapter 2. If resources are publicly available but users cannot reach them from Google and instead have to find the specific search page for a particular union resource, we cannot say that we have provided unfettered access to that resource. Woodley also stresses the importance of data value standards (controlled vocabularies, thesauri, lists of terms and names, and folksonomies) for enhancing end-user access. She points out that mapping metadata elements alone is not sufficient to connect all users with what they seek; the data values, that is, the vocabularies used to populate those metadata elements, should also be mapped.

Maureen Whalen’s chapter, “Rights Metadata Made Simple,” argues that the research and capture of standards-based rights metadata should be essential activities of memory institutions and offers practical, realistic options for determining and recording core rights metadata. If institutions committed the effort and resources needed to follow Whalen’s advice, many of the obstacles to unfettered end-user access could be surmounted.

In the section on “Practical Principles for Metadata Creation and Maintenance,” we emphasize that institutions need to change old paradigms and procedures. Libraries, archives, museums, and other memory organizations need to make a lasting commitment to creating and continually updating the various types of core metadata relating to their collections and the digital surrogates of collection materials that we all seem to be in such a hurry to create and make available online.

Our updated glossary is not intended to be comprehensive; rather, its purpose is to explain the key concepts and tools discussed in this publication. The footnotes in each of the chapters provide additional references to publications and online resources relevant to the topic of metadata and digital libraries.

At the end of her chapter, Anne Gilliland compares metadata to an investment that, if wisely managed, can deliver a significant return on intellectual capital. I would venture to expand on her financial metaphor and say that metadata is one of our most important assets. Hardware and software come and go—sometimes becoming obsolete with alarming rapidity—but high-quality, standards-based, system-independent metadata can be used, reused, migrated, and disseminated in any number of ways, even in ways that we cannot anticipate at this moment (as in the case of linked open data, which is a relatively recent concept).

Digitization does not equal access. The mere act of creating digital copies of collection materials does not make those materials findable, understandable, or usable for our ever-expanding audience of online users. But digitization combined with the creation of carefully crafted metadata can significantly enhance end-user access, and our users are the primary reason we create digital resources.

In closing, I would like to dedicate this publication to my friend and colleague Mary Woodley, a consummate librarian and metadata expert. Mary’s revised chapter, which she completed during what would be the last months of her life, is a testament to her deep knowledge of metadata and controlled vocabularies, her love of libraries, and her vocation to connect users to the information they seek.