Vocabularies as knowledge bases
Vocabularies: Types and formats
The role of authority work
Vocabulary building
Vocabularies as knowledge bases
A vocabulary is a body of knowledge represented by language.
It answers the question: "How do we talk (or write)
about this particular subject area?" Glossaries, dictionaries,
thesauri, and word lists are all examples of vocabularies.
Most vocabularies focus on a special subject area (e.g., a
glossary of geographical terms) or audience (e.g., a dictionary
for the architecture and construction trades).
Structured vocabularies are collections of words and phrases
(called terminology) that are structured to show relationships
between terms and concepts. One of the tasks of a structured
vocabulary is to improve retrieval, whether in a card catalog
or a computerized database. For example, a vocabulary for
furniture would show that there is a relationship among the
three terms bookcase, book case, and book-case. In this example,
the relationship is quite simple: they are spelling variations
for the same concept, a piece of case furniture with shelves
for books. These vocabularies may be applied as "controlled
vocabularies," where a given term (such as the "descriptor"
or "preferred term") is used consistently to represent
a given concept.
Why do we need vocabularies? Because language is ever-changing,
nuanced, and complex. The very characteristics that make
language so wonderfully expressive can also cause ambiguity and
confusion in documentation and, ultimately, hamper access
to materials in databases. Here are a few examples of how
language can cause confusion:
- National and regional differences
A particular type of rectangular, gable-roofed barn is
called a Connecticut Barn in the United States. The same
type of barn is called an English Barn in Great Britain.
- Historical and contemporary names
The nation that is today called Iran was, before 1935, called
Persia.
- Indigenous vs. culturally inappropriate terms
Both KhoiKhoi and Hottentot have been used to refer
to a group of people in Southern Africa. In early 20th-century
Western texts, these people were called Hottentot. Today,
KhoiKhoi is preferred and Hottentot is now considered culturally
inappropriate.
- Linguistic differences
The Italian artist Tiziano is called Titian in English
and Titien in French.
Structured vocabularies are especially designed to identify
and make these connections among terms by managing synonyms
and disambiguating homographs, which improves results
for the database searcher. In this way, the terms in a vocabulary
serve as a knowledge base for the materials in the database.
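To make the idea of equivalence concrete, here is a minimal Python sketch, assuming a vocabulary reduced to a table of preferred terms and their variants. The choice of preferred terms is illustrative, and the names `EQUIVALENCES`, `preferred_term`, and `search` are hypothetical; the point is only that records indexed with different variants are retrieved together.

```python
# Minimal sketch: a controlled vocabulary reduced to its equivalence
# relationships. The preferred terms chosen here are illustrative only.
EQUIVALENCES = {
    # preferred term: [non-preferred variants]
    "bookcase": ["book case", "book-case"],
    "Titian": ["Tiziano", "Titien"],
    "Iran": ["Persia"],
}

# Index every form (preferred or variant), case-insensitively, to its preferred term.
LOOKUP = {}
for preferred, variants in EQUIVALENCES.items():
    for form in [preferred, *variants]:
        LOOKUP[form.casefold()] = preferred


def preferred_term(term):
    """Return the preferred term for `term`, or None if the vocabulary lacks it."""
    return LOOKUP.get(term.strip().casefold())


def search(records, query):
    """Return IDs of records indexed with any form of the query's concept."""
    target = preferred_term(query)
    if target is None:
        return []
    return [record_id for record_id, terms in records.items()
            if any(preferred_term(t) == target for t in terms)]


# Hypothetical records, each indexed with a different variant.
records = {"rec1": ["Tiziano"], "rec2": ["Titian"], "rec3": ["Persia"]}
print(search(records, "Titien"))  # ['rec1', 'rec2']
```

A working vocabulary usually records more than bare equivalence; a place-name vocabulary, for example, may note the period during which a historical name such as Persia was in use.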
Vocabularies are most effective when used together with other
standards, especially data structure and data content standards.
Read more about how vocabularies work in Chapter 5.
Read more about how standards work together in Chapter 3.

Vocabularies: Types and formats
A wide range of controlled vocabularies has been developed
to help describe and access art and material culture information.
Many of these vocabularies were created and are maintained
by research institutes, national and international cultural
organizations, and professional societies and associations.
They can be used individually or together, depending on the
type of material being described.
The following are examples of the types of terms found in controlled
vocabularies available for describing cultural heritage:
- personal names
in the Union List of Artist Names you will find "Georgia
O'Keeffe"
- geographic place names
in the Getty Thesaurus of Geographic Names you will find
"Botswana"
- corporate names
in the Library of Congress Name Authority File you will
find "Metropolitan Museum of Art (New York, N.Y.)"
- object names
in the Art & Architecture Thesaurus you will find "scroll
paintings"
- iconographic subjects and themes
in ICONCLASS you will find the "education of Cupid
by Venus and Mercury"
- genre terms
in the Thesaurus for Graphic Materials II: Genre and Physical
Characteristic Terms you will find "political cartoons"
- multi-lingual terms
in the Multilingual Egyptological Thesaurus you will find
the term "pottery" in English, along with its German
("Keramik") and French ("céramique") equivalents.
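Multilingual vocabularies typically link the terms in each language to a single concept rather than directly to each other. The Python sketch below assumes that simple model; the concept identifier `concept:pottery` and the function `equivalent` are hypothetical, and the structure is not that of the Multilingual Egyptological Thesaurus itself.

```python
# Minimal sketch: language-specific labels hung on one language-neutral concept.
# The concept ID is hypothetical; the labels come from the example above.
CONCEPTS = {
    "concept:pottery": {"en": "pottery", "de": "Keramik", "fr": "céramique"},
}

# Reverse index from (language code, case-folded label) to concept ID.
LABEL_INDEX = {
    (lang, label.casefold()): concept_id
    for concept_id, labels in CONCEPTS.items()
    for lang, label in labels.items()
}


def equivalent(label, source_lang, target_lang):
    """Return the target-language label for the concept named `label`, or None."""
    concept_id = LABEL_INDEX.get((source_lang, label.casefold()))
    if concept_id is None:
        return None
    return CONCEPTS[concept_id].get(target_lang)


print(equivalent("Keramik", "de", "fr"))  # céramique
```

Routing every translation through the concept means that adding a new language requires only one new label per concept, not a new pairing with every existing language.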
Controlled vocabularies also come in a variety of formats
to fit different practices, systems, and local needs, as listed
below:
- Subject Heading Lists are compilations of headings usually
displayed in alphabetical order. Headings are words, phrases,
or combinations of words and modifiers that combine separate
concepts into what is called a "string." The LCSH
is an example of a subject heading list.
The following example is excerpted from the Library of Congress Subject Headings (LCSH), 18th edition, 1995:
Portrait prints (May Subd. Geog.)
   UF Engraved portraits
   BT Prints
   --17th century (May Subd. Geog.)
   --18th century (May Subd. Geog.)
   --19th century (May Subd. Geog.)
Portrait prints, American (May Subd. Geog.)
   UF American portrait prints
Portrait prints, British (May Subd. Geog.)
   UF British portrait prints
Portrait prints, Chinese (May Subd. Geog.)
   UF Chinese portrait prints
Portrait prints, European (May Subd. Geog.)
   UF European portrait prints
Portrait prints, French (May Subd. Geog.)
   UF French portrait prints
Portrait prints, German (May Subd. Geog.)
   UF German portrait prints
Portrait sculpture (May Subd. Geog.)
   BT Sculpture
   NT Portraits, Group
   --18th century
   --19th century
   --20th century
   --South Dakota
Portrait sculpture, African (Not Subd. Geog.)
   UF African portrait sculpture
- A thesaurus is a compilation of terms representing single
concepts. A thesaurus displays relationships among terms
by creating what is called a "semantic network."
Thesauri are usually displayed as a hierarchy. Most thesauri
display three types of relationships among terms: hierarchical
(whole/part or genus/species), equivalence (synonyms), and
associative (related terms). Thesauri are referred to as
structured vocabularies, but in recent years that term has also
been used to describe any vocabulary with a structure,
even one not based on the thesaural relationships above.
Visit the Art & Architecture Thesaurus to view an example
of a thesaurus hierarchy display.
- Classifications organize a body of knowledge into
conceptual categories. Classification schemes like Revised
Nomenclature and ICONCLASS are intended to be
used as organizational schemes for collections. Sometimes
such classifications serve double duty
when catalogers extract the individual terms and use them
as data values in a field, outside the context of the
rest of the classification scheme. For example, a museum
may use the individual term "costume" from Revised
Nomenclature without adopting its ten-category classification
scheme to organize the museum's collection.
The following is a section from The Revised Nomenclature for Museum Cataloging: A Revised and
Expanded Version of Robert G. Chenhall's System for Classifying Man-Made Objects.
Walnut Creek, CA: AltaMira Press, 1995, p. 9:
Category 2: FURNISHINGS
   BEDDING
      BAG, SLEEPING
      BEDSPREAD
      BLANKET
      BOLSTER
      |
      |
      COMFORTER
      Counterpane ... use BEDSPREAD
      COVER, BOLSTER
      |
      |
      MATTRESS
      |
      |
      PILLOW
      PILLOWCASE
      |
      SHEET
- Term lists are most often created by individual
organizations and reflect the scope of the institution's collection.
Many of these local lists include terms from other controlled
vocabulary resources. Sometimes an organization will collaborate
with a similar collection to create a joint term list. Term
lists can be a rich resource for unique, regional, historical,
or very specific terminology. In some organizations the
term list also functions as the authority file.
Visit the SPIRO on-line visual database at the University
of California at Berkeley (http://www.mip.berkeley.edu/spiro/)
to view examples of term lists.
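The hierarchical and equivalence relationships that thesauri and classifications display can be modeled with parent links plus "use" references. This Python sketch borrows its terms from the Revised Nomenclature excerpt; the data model and the names `BROADER`, `USE`, `hierarchy_path`, and `narrower` are assumptions for illustration, not the format of any published vocabulary.

```python
# Minimal sketch: hierarchical (broader/narrower) and equivalence ("use")
# relationships. Terms come from the Revised Nomenclature excerpt; the
# data model itself is an assumption made for illustration.
BROADER = {
    "BEDDING": "FURNISHINGS",
    "BAG, SLEEPING": "BEDDING",
    "BEDSPREAD": "BEDDING",
    "BLANKET": "BEDDING",
    "BOLSTER": "BEDDING",
    "COMFORTER": "BEDDING",
    "COVER, BOLSTER": "BEDDING",
    "MATTRESS": "BEDDING",
    "PILLOW": "BEDDING",
    "PILLOWCASE": "BEDDING",
    "SHEET": "BEDDING",
}
USE = {"Counterpane": "BEDSPREAD"}  # non-preferred term -> preferred term


def hierarchy_path(term):
    """Return the chain of broader terms from the top level down to `term`."""
    term = USE.get(term, term)  # resolve a non-preferred term first
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return list(reversed(path))


def narrower(term):
    """Return the terms one level below `term`, in alphabetical order."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


print(hierarchy_path("Counterpane"))  # ['FURNISHINGS', 'BEDDING', 'BEDSPREAD']
```

Storing only the parent of each term keeps the hierarchy consistent by construction; narrower terms are derived on demand rather than maintained as a second, possibly conflicting, list.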

The role of authority work
Authority work, in which terms and names are verified and
validated, is a critical part of documentation practice. The
concept originated in the library cataloging domain in the
days of manual card catalogs and indexes, when strict consistency
was necessary to provide even minimal access. Today authority work has
extended to other information management communities and its
processes and procedures have benefited greatly from computerization.
The development and application of standard controlled vocabularies
is a significant outcome of authority work.
Authority work is defined by the following characteristics:
- Authority files are compilations of authorized terms
or headings used by a single organization or consortium
in cataloging, indexing, or documentation. Authority
files are strictly maintained as terms are applied and often
include associated information about the term or subject
heading. This associated information can include synonyms
(e.g., "see references"), related or associated
terms (e.g., "see also references"), and original
sources (e.g., a note that the term was found in a particular
dictionary).
The Library of Congress Name Authority File (NAF) record for the author Umberto Eco, for example, lists variant names (Eko, Umberto, etc.) and records the sources of the information in its Notes (Library of Congress Authorities, http://authorities.loc.gov).
- Authority control is a system of procedures that maintains
consistent information in database records. It includes
recording terms and validating them against the
authority files. The purpose of authority control
is to ensure that the database searcher can collocate like
material and relate it to other material in the database. In
the online environment, authority control makes searching
easier for users and improves search precision.
- An authority file is a controlled vocabulary, but not
all controlled vocabularies are authority files. This
is because the main purpose of an authority file is to regulate
usage in a particular database. In fact, you will find that
some authority files use multiple structured vocabularies
as sources for their files. For example, a historical society
may use both the AAT and LCSH as sources for terms in its
subject authority file. Most authority
files also include "local terms," originating
from the institution itself.
- Authority files are an integral part of most automated
information systems, but you will find differing levels of
implementation depending on the system. One of the most
useful implementations makes the authority file available
as a resource for catalogers and also integrates it into the
search interface to assist users as they query the database.
- Authority work procedures may be automated, but the
intellectual processes needed to create quality authority
files are still best accomplished by humans. This work
may include verification of the proposed term or name in
authoritative sources, such as dictionaries, monographs,
or (if relevant) historical sources; research of synonyms,
such as variant spellings; establishing relationships to
other terms/names in the authority file; and creation of
an authority record to be added to the database. Authority
work at the local level is often expensive and time-consuming,
and as data sharing becomes more prevalent, institutions are
exploring shared authority files.
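The validation step of authority control can be sketched in a few lines of Python. The sketch assumes an in-memory authority file; the class and function names are hypothetical, and the headings are simplified examples, not copies of actual NAF records. It only flags unmatched values: as noted above, verifying and establishing new headings remains human work.

```python
# Minimal sketch of authority control at data entry. Class and function
# names are hypothetical; the headings are simplified examples and are
# not copied from actual Library of Congress records.
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    heading: str                                   # authorized form
    see: list = field(default_factory=list)        # variant forms ("see references")
    see_also: list = field(default_factory=list)   # related headings ("see also references")
    sources: list = field(default_factory=list)    # where the heading was verified


class AuthorityFile:
    """An in-memory authority file that resolves any recorded form to its record."""

    def __init__(self, records):
        self._by_form = {}
        for record in records:
            for form in [record.heading, *record.see]:
                self._by_form[form.casefold()] = record

    def resolve(self, value):
        """Return the matching AuthorityRecord, or None if the value is unknown."""
        return self._by_form.get(value.strip().casefold())


def control_field(authority, values):
    """Swap each value for its authorized heading; collect unmatched values for review."""
    authorized, for_review = [], []
    for value in values:
        record = authority.resolve(value)
        if record is None:
            for_review.append(value)  # a cataloger must verify or establish a heading
        else:
            authorized.append(record.heading)
    return authorized, for_review


names = AuthorityFile([
    AuthorityRecord("Eco, Umberto", see=["Eko, Umberto"]),
    AuthorityRecord("Titian", see=["Tiziano", "Titien"]),
])
print(control_field(names, ["Eko, Umberto", "titien", "Eco, U."]))
# (['Eco, Umberto', 'Titian'], ['Eco, U.'])
```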

Vocabulary building
Vocabularies are available for many different subject areas
and audiences, but there are gaps in coverage, especially for
specialized areas. If you embark on building your own vocabulary,
you will find several good resources in the form of publications,
training workshops, and academic courses. Here are a few suggestions
to get you started:
Recommendations for building new vocabularies
- Build on existing work. Some established vocabularies,
like the AAT and LCSH, offer opportunities to enhance specific
areas. For example, staff at the Mystic Seaport Museum recently
researched and added terminology for vessel types to the
AAT.
- Incorporate maintenance into your plan. To remain
effective, a structured vocabulary needs to accommodate
changes in the language over time.
- Adhere to national and international standards produced
by NISO, ISO and other standards organizations.
- Find partners and collaborate with other like-minded groups.
For example, the MDA (Museum Documentation Association) supports
terminology working groups, such as the Ethnographic Terminology
Working Group, which pool resources to create vocabularies.
- Get training. Schools of library and information science,
cultural heritage management programs, and professional
workshops are all sources for training in thesaurus construction
and authority work.
- Follow established methodologies. For example, the J.
Paul Getty Trust has published guidelines for forming language
equivalents to enable multi-lingual vocabulary building.
Read more about vocabularies in the Readings section.
Go to a list of vocabulary resources and tools for cultural
heritage in the Tools section.
