The three primary relationships relevant to the vocabularies discussed in this book are equivalence, hierarchical, and associative relationships. Relationships in a controlled vocabulary should be reciprocal. Reciprocal relationships are known as asymmetric when the relationship is different in one direction than it is in the reverse direction—for example, broader term/narrower term (BT/NT). If the relationship is the same in both directions, it is symmetric—for example, related term/related term (RT/RT).
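The idea of reciprocity can be sketched in a few lines of Python. This is a minimal illustration, not how any particular vocabulary system stores its links; the `RECIPROCAL` table and the function names are assumptions made for the example, and the terms are borrowed from examples elsewhere in this chapter.

```python
from collections import defaultdict

# Type recorded on the other side of each link. BT/NT differ by direction
# (asymmetric); RT is its own reciprocal (symmetric).
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}


def is_symmetric(rel_type):
    """True when the relationship reads the same in both directions."""
    return RECIPROCAL[rel_type] == rel_type


def add_relationship(links, focus, rel_type, target):
    """Record the link on the focus term and its reciprocal on the target."""
    links[focus].add((rel_type, target))
    links[target].add((RECIPROCAL[rel_type], focus))


links = defaultdict(set)
add_relationship(links, "bronze", "BT", "metal")  # bronze has broader term metal
add_relationship(links, "military bases", "RT", "military camps")
```

Storing both directions at the moment a link is made is one way to guarantee that no relationship exists in only one record.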
3.1. Equivalence Relationships
Equivalence relationships are the relationships between synonymous terms or names for the same concept. A good controlled vocabulary should include terms representing different forms of speech and various languages where appropriate. Below are examples of terms in several languages that all refer to the same object type.
Ideally, all terms that share an equivalence relationship are either true synonyms or lexical variants of the preferred term or name or another term in the record.
Synonyms may include names or terms of different linguistic origin, dialectical variants, names in different languages, and scientific and common terms for the same concept. Synonyms are names or terms for which meanings and usage are identical or nearly identical in a wide range of contexts. True synonyms are relatively rare in natural language. In many cases, different terms or names may be interchangeable in some circumstances, but they should not necessarily be combined as synonyms in a single vocabulary record. Likewise, names for persons, places, events, and so on, may be used interchangeably in certain contexts, but their meanings may actually differ. Various factors must be considered when designating synonyms, including how nuance of meaning may differ and how usage may vary due to professional versus amateur contexts, historical versus current meanings, and neutral versus pejorative connotations. The creator of the vocabulary must determine whether or not the names or terms should be included in the same record or in separate records that are linked via associative relationships because they represent related concepts but are not identical in meaning and usage. In the examples below, each set of equivalent terms represents a single object type, style or culture, or person.
Jeanneret, Charles Édouard
Jeanneret-Gris, Charles Édouard
3.1.1.1. Lexical Variants
Although they are grouped with synonyms for practical purposes, lexical variants technically differ from synonyms in that synonyms are different terms for the same concept, while lexical variants are different word forms for the same expression. Lexical variants may result from spelling differences, grammatical variation, and abbreviations. Terms in inverted and natural order, plurals and singulars, and the use of punctuation may create lexical variants. In a controlled vocabulary, such terms should be linked via an equivalence relationship.
In the example below, the past participle embroidered is included in the record for the process embroidering (needleworking (process), <needleworking and needleworking techniques>,... Processes and Techniques).
Certain lexical variants could be flagged as alternate descriptors (AD), meaning that the AD and the descriptor (D) are equally preferred for indexing. For example, for objects, animals, and other concepts expressed as singular and plural nouns, the plural may be the descriptor, while the singular would be the alternate descriptor. In other cases, the past participle or an adjectival form may be an alternate descriptor.
baluster columns (D)
baluster column (AD)
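A record that groups equivalent terms under flags might be modeled as in the following sketch. The record identifiers are hypothetical, flagging embroidered as an alternate descriptor is an assumption made for illustration, and the index ignores homographs for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: int  # hypothetical identifier, not a real AAT ID
    terms: dict = field(default_factory=dict)  # term -> "D", "AD", or "UF"

    def descriptor(self):
        """Return the single term flagged as descriptor (D)."""
        return next(t for t, flag in self.terms.items() if flag == "D")


def build_index(records):
    """Map every term, whatever its flag, to the record that holds it."""
    index = {}
    for record in records:
        for term in record.terms:
            index[term] = record
    return index


records = [
    Record(1, {"baluster columns": "D", "baluster column": "AD"}),
    Record(2, {"embroidering": "D", "embroidered": "AD"}),  # AD flag assumed
]
index = build_index(records)
print(index["baluster column"].descriptor())  # baluster columns
```

Because every variant points to the same record, a search on the singular form retrieves material indexed under the plural descriptor.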
3.1.1.2. Historical Name Changes
Political and social changes can cause a proliferation of terms or names that refer to the same concept. For example, the term used to refer to the ethnic group of mixed Bushman-Hamite descent with some Bantu admixture, now found principally in South Africa and Namibia, was previously Hottentot. That term now has derogatory overtones, so the term KhoiKhoi is preferred. However, a vocabulary such as the AAT would still link both terms as equivalents so that retrieval is thorough.
Names of people and places also change through history: People change their names, as when a title is bestowed or a woman marries. Place names change for a variety of reasons, as when North Tarrytown, New York, changed its name to Sleepy Hollow in 1996, or when the nation formerly known as the Union of Burma changed its name to the Union of Myanmar in 1989.
The issues that surround such historical changes are many. It is not always clear when names are equivalents and when they instead refer to different entities. For example, Persia is a historical name for the modern nation of Iran prior to 1935, yet ancient Persia was not entirely coextensive with modern Iran. Likewise, modern Egypt is not the same nation as ancient Egypt, in terms of neither borders nor administration; therefore the names may be homographs, but not necessarily equivalents.
3.1.1.3. Differences in Language
Vocabularies may be monolingual or multilingual. Regional and linguistic differences in terminology are among the most common factors influencing variation among terms that refer to the same concept in monolingual vocabularies. Regional differences in terminology occur due to vernacular variations; for example, English barn, Connecticut barn, New England barn, and Yankee barn are all terms that refer to the same type of structure: a rectangular, gable-roofed barn that is divided on the interior into three roughly equal bays.
Multilingual vocabularies require the resolution of other issues in addition to those surrounding monolingual vocabularies. Cultural heritage communities around the world wish to share information, and users in many nations try to gain access to the same material on the Web. They need to retrieve the correct information on an object regardless of whether it has been indexed under pottery, keramik, or céramique. This is not always a simple prospect; forming equivalents is not just a matter of providing literal translations. For example, a nonexpert translator or a computer program might translate the English term toasting glasses from the AAT vessels hierarchy into Spanish as vasos para tostar, which would seem to have something to do with a toaster oven rather than honoring someone with a toast (toasting glasses are tall, thin wineglasses with a small conical bowl, a stemmed foot, and a very thin stem that can easily be snapped between the fingers).
The names of people and places may also vary in different languages. As illustrated in the example on the previous page, this sixteenth-century Italian sculptor, who was born in Flanders (now Belgium) but worked in Italy, is known by many variations on his name, including the French Jean de Bologne and the Italian names Giambologna and Giovanni da Bologna. The name of Mato Wanartaka, the Native American artist who painted the Battle of the Little Big Horn, is translated into English as Kicking Bear. All these name variations must be linked together within a single vocabulary record as equivalents. Additional variations occur when names are transliterated by different methods into the Roman alphabet; for example, the names Beijing, Peking, and Pei-Ching all refer to the same city in China.
Further issues surrounding multilingual vocabularies and the mapping of terms between languages are discussed in Chapter 5: Using Multiple Vocabularies.
Names and terms that are similar or identical except for the use of diacritics should typically be included as variant names. Expressing names and terms in the original character sets or alphabets other than the Roman alphabet introduces additional issues, as discussed in Chapter 9: Retrieval Using Controlled Vocabularies.
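One way to generate diacritic-free variants is sketched below using Python's standard `unicodedata` module. The function names are assumptions made for the example. The approach only removes combining marks, so letters that do not decompose (such as ø or ł) are left unchanged and would need separate handling.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks removed (é -> e, ā -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def with_plain_variants(names):
    """Add a diacritic-free variant for each name that differs from its plain form."""
    variants = list(names)
    for name in names:
        plain = strip_diacritics(name)
        if plain not in variants:
            variants.append(plain)
    return variants


print(with_plain_variants(["Jeanneret, Charles Édouard"]))
```

Whether such variants are stored in the record or generated at search time is a design decision for the vocabulary builder.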
3.1.2. Near Synonyms
Near synonyms are discussed under 2.3.4. Synonym Ring Lists; they may be found in other vocabularies as well. Although it is generally advisable to link only true synonyms and lexical variants as equivalents, in some vocabularies the equivalence relationship may also include near synonyms and generic postings in order to broaden retrieval or cut down on the labor involved in building a vocabulary, among other reasons.
Near synonyms, also known as quasi-synonyms, are terms with meanings that are regarded as different, but the terms are treated as equivalents in the controlled vocabulary to broaden retrieval. Near synonyms are words that have similar but not identical meaning, such as ice cream and gelato. Both are frozen desserts made from dairy products, but ice cream is usually made with cream, and gelato is usually made with milk and has less air incorporated than ice cream. In other cases, antonyms—for example, smoothness and roughness—may be linked via the equivalence relationship in a vocabulary.
The phrase generic posting refers to the practice of putting terms with broader and narrower contexts together in the same record. For example, if egg-oil tempera were linked as an equivalent to tempera, this would be a generic posting because egg-oil tempera is a type of tempera.
In a vocabulary striving for more precise relationships, these terms should be linked with appropriate hierarchical relationships or associative relationships rather than as equivalents.
3.1.3. Preferred Terms
When multiple terms refer to the same concept, one term is generally flagged as a preferred term and the others are variant terms. In thesaurus jargon, the preferred term is always called a descriptor, and other terms may be called alternate descriptors, or used for terms.
For each concept or record, builders of a controlled vocabulary should choose one term or name among the synonyms as the preferred term. Preferred terms should be selected to serve the needs of the majority of users, relying upon established and documented criteria. For the sake of predictability, these criteria should be applied consistently throughout the controlled vocabulary. If, for example, American spelling is preferred over British spelling in a particular controlled vocabulary, the preferred terms or names should always be in American English. If the vocabulary is intended for a general audience, the preferred term should be the name or term most often found in contemporary published sources in the language of the users. The criteria for establishing preferred terms should be documented and explained to end users.
In the examples on the following page, Georgia O'Keeffe and Mrs. Alfred Stieglitz are names that refer to the same artist; the former name is preferred because this is the name by which she is most commonly known. In another example, the terms still lifes and nature morte refer to the same concept; the former term is preferred in English. In a third example, Wien, Vienna, and Vindobona refer to the same city; Vienna is the preferred current name in English, while Wien is the current German name, and Vindobona is a historical name.
The vocabulary may flag terms or names that are preferred in various languages. Terms preferred in other languages are also descriptors; that is, one record may have multiple descriptors. Each language represented may have a descriptor. However, only one of the descriptors should be flagged as preferred for the entire record.
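The flagging rules just described could be checked with a small validation routine. The following is a sketch under stated assumptions: the field names and language codes are invented for the example, and the rule of at most one descriptor per language is an interpretation of the text rather than a quoted requirement.

```python
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    language: str                   # illustrative language code
    descriptor: bool = False        # preferred term for its language
    record_preferred: bool = False  # preferred term for the whole record


def validate(terms):
    """Return a list of problems; an empty list means the flags are consistent."""
    problems = []
    if sum(t.record_preferred for t in terms) != 1:
        problems.append("exactly one term must be preferred for the record")
    languages = [t.language for t in terms if t.descriptor]
    for lang in sorted(set(languages)):
        if languages.count(lang) > 1:
            problems.append(f"more than one descriptor for language {lang!r}")
    if any(t.record_preferred and not t.descriptor for t in terms):
        problems.append("the record-preferred term must also be a descriptor")
    return problems


vienna = [
    Term("Vienna", "en", descriptor=True, record_preferred=True),
    Term("Wien", "de", descriptor=True),
    Term("Vindobona", "la"),  # historical name; the language tag is an assumption
]
print(validate(vienna))  # []
```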
A homograph is a term that is spelled identically to another term but has a different meaning. For example, drums can have at least three meanings: components of columns, musical instruments classified as membranophones, or walls that support a dome. Words can be homographs whether or not they are pronounced alike. For example, bows, the forward-most ends of watercraft or airships, and bows, stringed projectile weapons designed to propel arrows, are spelled alike but pronounced differently. Homophones are terms that are pronounced the same but spelled differently, for example bows and boughs; controlled vocabularies generally need not concern themselves with labeling homophones.
Note that a controlled vocabulary is constructed differently from a dictionary. In a dictionary, homographs are listed under a single heading, with several definitions. For example, in a dictionary, drum would be listed as a noun, with several definitions under a single entry. In a controlled vocabulary, each homographic term is in a separate record.
Controlled vocabularies must distinguish between homographs. One way to do this is to add a qualifier. A qualifier consists of one or more words used with the terms to make the specific meaning of each unambiguous, as seen in the examples below.
drums (column components)
Qualifiers should be distinguished from the term itself in displays. Traditionally, parentheses are used to identify the qualifier. In order to make construction of and use of the vocabulary more versatile, it is useful to place the qualifier in a separate field in the database rather than in the same field as the term itself.
If a term is a homograph to another term in the vocabulary, at least one qualifier is necessary. However, it is best to add a qualifier for both terms for clarity. Homographs and their qualifiers may occur not only with descriptors but also with alternate descriptors and used for terms. In addition, if a term is a homograph for another common term in standard language, even if the second term is not in the vocabulary, it is useful to add a qualifier for clarity.
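Keeping the qualifier in its own field, as recommended above, might look like the following sketch. The class and function names are assumptions; the terms and qualifiers are taken from examples in this chapter. The check applies the stricter recommendation that every homograph carry a qualifier.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    qualifier: str = ""  # stored in its own field, not inside the term

    def display(self):
        """Show the qualifier in parentheses only when one exists."""
        return f"{self.text} ({self.qualifier})" if self.qualifier else self.text


def unqualified_homographs(terms):
    """Return terms that share a spelling with another term but lack a qualifier."""
    counts = Counter(t.text for t in terms)
    return [t for t in terms if counts[t.text] > 1 and not t.qualifier]


terms = [
    Term("trailers", "motion pictures"),
    Term("trailers", "vehicles"),
    Term("forging", "copying"),
    Term("forging"),  # qualifier deliberately left out to show the check
    Term("baluster columns"),
]
print(terms[0].display())  # trailers (motion pictures)
```

With the qualifier separate, the same data can be displayed with parentheses, sorted by term alone, or searched without the qualifier interfering.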
A qualifier is sometimes also called a gloss; however, in linguistic jargon a gloss actually has the more general meaning of any term or phrase providing meaning or explanation for difficult words or passages. In contrast, a qualifier is used only to disambiguate homographs, not to define the term or provide context (although it may do so coincidentally because these characteristics may be what distinguish a term from its homograph).
Qualifiers should be used only to disambiguate homographs, not to represent a compound concept, define a term, or establish a term's hierarchical context. Some controlled vocabularies create qualifiers for these other purposes, but this is considered bad practice. Other situations should be handled in the following ways. To represent a bound compound concept, construct a descriptor rather than using a qualifier (e.g., phonograph record, not record (phonograph)). If the concept is unbound, do not create a qualified term in the thesaurus; instead, allow end users to construct a multiple-word search phrase in retrieval. For example, neither cathedral (Baroque) nor the descriptor Baroque cathedral should be created in the thesaurus, because Baroque cathedral is an unbound concept; instead, Baroque AND cathedral should be used in retrieval. To define a term, use the scope note, not a qualifier. To establish context for the term in displays outside of homographic disambiguation, create a heading or label for the term rather than trying to do so with a qualifier (see Headings or Labels).
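Retrieval of an unbound concept by combining two descriptors can be sketched as a set intersection. The object identifiers and the descriptors Gothic and palace are hypothetical, introduced only to give the search something to exclude.

```python
from collections import defaultdict

# Hypothetical object records indexed with descriptors from the vocabulary.
indexed_objects = {
    "object-1": {"Baroque", "cathedral"},
    "object-2": {"Baroque", "palace"},
    "object-3": {"Gothic", "cathedral"},
}


def build_postings(objects):
    """Invert the index: descriptor -> set of object ids that carry it."""
    postings = defaultdict(set)
    for object_id, descriptors in objects.items():
        for descriptor in descriptors:
            postings[descriptor].add(object_id)
    return postings


def search_all(postings, *descriptors):
    """Boolean AND: ids of objects indexed with every descriptor given."""
    sets = [postings.get(d, set()) for d in descriptors]
    return set.intersection(*sets) if sets else set()


postings = build_postings(indexed_objects)
print(search_all(postings, "Baroque", "cathedral"))  # {'object-1'}
```

Neither Baroque cathedral nor cathedral (Baroque) needs to exist as a term for this search to succeed.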
3.1.4.1.1. How to Choose a Qualifier for a Term
The builders of controlled vocabularies should establish detailed rules for how to compose qualifiers. Qualifiers should be as brief as possible, ideally consisting of one or two words.
In most cases, a word or words from a broader context of the term should be used as the qualifier (e.g., stained glass (material), where stained glass is a hierarchical descendant of materials). Qualifiers for all homographs should clearly disambiguate the terms in displays. For example, stained glass (material) and stained glass (visual works) distinguish the material from the artworks made from the material.
If words taken from the broader context do not sufficiently disambiguate between homographs, use words that describe another significant distinguishing characteristic.
Qualifiers should be standardized as much as possible within a controlled vocabulary. For example, films and motion pictures should not both be used as qualifiers because films is a used for term for motion pictures. When possible, the qualifier should have the same grammatical form as the term, as with the nouns and gerunds in the examples below.
Term: trailers Qualifier: motion pictures
Term: trailers Qualifier: vehicles
Term: forging Qualifier: copying
Term: forging Qualifier: metal forming
3.1.4.2. Other Ways to Disambiguate Names
Qualifiers are used frequently in controlled vocabularies containing terminology for object types, generic concepts, and so on, as illustrated above. For other vocabularies, such as personal name and geographic name vocabularies, data from various fields may be concatenated with the name or term to disambiguate entries. For example, the name of a person could be displayed with biographical information to create a heading such as Johnson, John (English architect, 1754–1814), or the name of a place could be displayed with place type and broader contexts taken directly from the hierarchy, as in Springfield (inhabited place) (Tuolumne county, California, United States). Headings and labels may be used not only to disambiguate homographs but also to provide context for terms and names when displayed in any horizontal string (see Headings or Labels).
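Concatenating fields into a heading could be done as in this sketch. The function and parameter names are assumptions made for the example; the two headings it builds are the ones quoted in the paragraph above.

```python
def person_heading(name, description, birth_year, death_year):
    """Combine a name with a short biographical string and life dates."""
    return f"{name} ({description}, {birth_year}–{death_year})"


def place_heading(name, place_type, broader_contexts):
    """Combine a place name with its type and its broader contexts."""
    return f"{name} ({place_type}) ({', '.join(broader_contexts)})"


print(person_heading("Johnson, John", "English architect", 1754, 1814))
print(place_heading("Springfield", "inhabited place",
                    ["Tuolumne county", "California", "United States"]))
```

Because the heading is assembled at display time, the underlying name, dates, and hierarchy remain in separate fields and can each be searched or updated independently.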
3.2. Hierarchical Relationships
Hierarchical relationships are the broader and narrower (parent/child) relationships between logical records (where each record represents a concept). The hierarchical relationship is the primary feature that distinguishes a thesaurus or taxonomy from simple controlled lists and lists of synonym rings.
Hierarchical relationships are referred to by genealogical terms such as child, children, siblings, parent, grandparent, ancestors, descendants, etc. In the example on the following page, the Upper Egypt region is the parent of Qinā governorate; Karnak and Luxor are children of Qinā governorate and siblings of each other; and Africa is an ancestor of all these places. The display of hierarchical relationships is discussed in Chapter 7: Constructing a Vocabulary or Authority.
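The genealogical vocabulary maps naturally onto a child-to-parent table, as in the sketch below. The level "Egypt" is an assumed intermediate level added for illustration; the text states only that Africa is an ancestor of these places.

```python
# child -> parent. "Egypt" is an assumed intermediate level.
PARENT = {
    "Karnak": "Qinā governorate",
    "Luxor": "Qinā governorate",
    "Qinā governorate": "Upper Egypt region",
    "Upper Egypt region": "Egypt",
    "Egypt": "Africa",
}


def ancestors(name):
    """Walk upward from name, nearest ancestor first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def children(name):
    """Records whose parent is name, in alphabetical order."""
    return sorted(child for child, parent in PARENT.items() if parent == name)


def siblings(name):
    """Other children of the same parent."""
    parent = PARENT.get(name)
    if parent is None:
        return []
    return [child for child in children(parent) if child != name]


print(ancestors("Karnak"))
print(siblings("Karnak"))  # ['Luxor']
```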
There are several types of hierarchical relationships, including whole/part, genus/species, and instance relationships.
3.2.1. Whole/Part Relationships
Hierarchical relationships are generally either whole/part, also called a partitive relationship (e.g., Karnak is a part of Qinā governorate), or genus/species, also called a generic relationship (e.g., bronze is a type of metal).
Whole/part relationships are typically applied to geographic locations, parts of corporate bodies, parts of the body, and other types of concepts that are not readily placed into genus/species relationships. Each child should be a part of the parent and all the other ancestors above it.
3.2.2. Genus/Species Relationships
The genus/species, or generic relationship, is the most common relationship in thesauri and taxonomies because it is applicable to a wide range of topics. All children in a genus/species relationship should be a kind of, type of, or manifestation of the parent (compare instance relationships below). The placement of a child may be tested by the all/some argument. In the example of bronze above, all architectural bronze is bronze, but only some bronze is architectural bronze.
3.2.3. Instance Relationships
In addition to the whole/part and genus/species relationships, some vocabularies may utilize a third type of hierarchical relationship, the instance relationship. This is most commonly seen in vocabularies where proper names are organized by general categories of things or events, for example, if the proper names of mountains and rivers were organized under the general categories mountains and rivers.
3.2.4. Facets and Guide Terms
Facets provide the primary subdivisions of a hierarchy, typically located directly under the root or top of the hierarchy. Subfacets, also called hierarchies, may subdivide the facets. Guide terms (types of node labels) are additional levels that collocate similar sets or classes of records (illustrated in the example below with angled brackets). They should logically illustrate the principles of division among a set of sibling terms, as discussed in Chapter 7: Constructing a Vocabulary or Authority.
Some concepts logically belong to more than one broader context. To accommodate this situation, the data structure of a properly constructed thesaurus should allow polyhierarchical relationships, meaning that each record exists only once in the vocabulary but may be linked to multiple parents and can thus appear in multiple hierarchical views. Polyhierarchical relationships may exist in whole/part, genus/species, and instance relationship models. In the example below, Siena is part of the modern nation of Italy, but it was also part of the ancient confederation of Etruria.
The criteria for creating polyhierarchical relationships should be explicitly established. In the example below, the polyhierarchy is used to link a place to both its current and historical parents; the nonpreferred parent relationship is indicated with an N in brackets.
The established classification scheme of the hierarchy should be considered, and terms should be placed under multiple parents when they logically belong to those parents. For example, in the AAT, a backing hammer should be located under the guide term <bookbinding equipment>, but it also belongs under hammers (tools).
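A polyhierarchy can be sketched by letting each record keep a list of parents instead of a single one. In the example below, intermediate levels between Siena and its parents are omitted, and treating the historical parent Etruria as the nonpreferred one is an assumption made for illustration.

```python
# child -> list of (parent, preferred). Each record exists once but may
# have several parents. Which parent is preferred is assumed here.
PARENTS = {
    "Siena": [("Italy", True), ("Etruria", False)],
}


def paths_to_root(name):
    """Every chain from a top-level record down to name, one per hierarchical view."""
    parent_links = PARENTS.get(name, [])
    if not parent_links:
        return [[name]]
    paths = []
    for parent, _preferred in parent_links:
        for path in paths_to_root(parent):
            paths.append(path + [name])
    return paths


def parent_labels(name):
    """Parents for display, marking nonpreferred ones with [N]."""
    return [parent if preferred else f"{parent} [N]"
            for parent, preferred in PARENTS.get(name, [])]


print(paths_to_root("Siena"))  # [['Italy', 'Siena'], ['Etruria', 'Siena']]
print(parent_labels("Siena"))  # ['Italy', 'Etruria [N]']
```

The record for Siena is stored once; only the links multiply, so an edit to the record appears in every hierarchical view.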
3.3. Associative Relationships
Associative relationships exist between records that are conceptually close, but where the relationship is neither equivalent nor hierarchical. The most basic type of associative relationship is simply related to. In some vocabularies, more specific types of associative relationships may be designated.
3.3.1. Types of Associative Relationships
Associative relationships may be made between records in the same hierarchy or in different hierarchies. There may be relationships between overlapping siblings or other terms where the meanings are similar and the terms are occasionally (but not generally) used as synonyms.
In general, terms that are mutually exclusive do not require associative relationships, particularly if they cannot be confused with one another, whether or not they share the same parent. For example, it is not necessary to link baluster columns and spiral columns below because there is no reason why a user would confuse the two.
However, there should be associative relationships between terms that are intended to be used as separate concepts but may be confused by users. In the first example on the following page, Lorraine, the current administrative region, and Lorraine, the historical entity, share the same name and some of the same territory; thus an associative relationship helps distinguish between the two and at the same time links them for possible retrieval. In the second example, the term military bases is distinguished from military camps, with which it is sometimes confused. If it is necessary to mention the second concept in the scope note in order to distinguish the two, the records should be linked through associative relationships.
In addition to the relationships described above, antonyms may be treated as associative relationships. In fact, a vocabulary may require a substantial number of very specific additional associative relationships. These types of relationships vary from vocabulary to vocabulary, depending upon the nature of the terms and how they are intended for use in retrieval. For example, relationships between generic terms would differ from relationships between people, which could include familial and professional relationships. A vocabulary should list and define the types of associative relationships used. Partial lists of associative relationships for the Getty vocabularies appear above.
3.3.2. When to Make Associative Relationships
Only clear and direct associative relationships should be recorded. These direct relationships are typically current but occasionally may be historical. Given that associative relationships are more challenging to define than hierarchical relationships, care must be taken to consistently apply rules when assigning associative relationships in a vocabulary in order to prevent an excessive number of such relationships, which can have a negative effect when the thesaurus is used for retrieval.
Since associative relationships are often used not only for the reference of a user but also for retrieval, it is important to avoid making unnecessary links between related concepts. Relationships should be made only between records that are directly related, but where hierarchical and equivalent relationships are inappropriate. If a thesaurus is bound together by too many associative relationships between entities that are only loosely or indirectly related, the value of the relationships in retrieval is lost. Consider this question: if the end user is interested in retrieving Concept X, might he or she possibly also want to retrieve Concept Y? If not, there probably should not be an associative relationship between the two records.
Associative relationships may be displayed and described explicitly as in the example below or by using the generic notation RT, for related term, or the phrase see also.
see also collecting
Associative relationships are always reciprocal. For some relationships, the relationship type is the same on both sides of the link (e.g., related to); however, for others it is different depending upon which record is the focus. Vocabulary editors must be very careful to choose the correct relationship for the focus record (i.e., the record being edited when the relationship is made). It is important to consider what will make sense when displayed to a user. For example, in an associative relationship between artists, Katsushika Hokusai was the teacher of Katsushika Taito II; their relationship is teacher/student. In the record of a student, the relationship type linking to the teacher is student of, because the artist in the focus record is the student of the artist in the linked record. In the record for the linked artist, the reciprocal relationship type is teacher of.
If a vocabulary has relationships that are homographs, or if values may change over time, it is best to identify the relationships with unique numeric codes rather than simply by text values. When relationship types are homographs, the vocabulary editor must be careful to link to the correct code. As illustrated in the ULAN example on the following page, in linking an uncle to his niece, the vocabulary editor must be sure to link to uncle of #1533, which has the code for niece of #1534 as its reciprocal code. The editor should not link to the homograph uncle of #1532, because its reciprocal code is for nephew of.
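Selecting the right code among homographic relationship types might work as in the following sketch. The two codes for uncle of come from the example above; the table layout and function names are assumptions. Because the text does not give a code for nephew of, the reciprocal is stored here as a label rather than as a code.

```python
# code -> (label shown in the focus record, label shown in the linked record)
REL_TYPES = {
    1532: ("uncle of", "nephew of"),
    1533: ("uncle of", "niece of"),
}


def codes_for_label(label):
    """All codes that share a display label; more than one means homographs."""
    return sorted(code for code, (focus_label, _) in REL_TYPES.items()
                  if focus_label == label)


def find_code(label, reciprocal_label):
    """Pick the one code whose label and reciprocal both match."""
    matches = [code for code, pair in REL_TYPES.items()
               if pair == (label, reciprocal_label)]
    if len(matches) != 1:
        raise LookupError(f"no unique code for {label!r} / {reciprocal_label!r}")
    return matches[0]


print(codes_for_label("uncle of"))        # [1532, 1533]
print(find_code("uncle of", "niece of"))  # 1533
```

Requiring the reciprocal label as part of the lookup makes it harder for an editor to pick the wrong homograph by accident.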