Introduction to Art Image Access

The Language of Images: Enhancing Access to Images by Applying Metadata Schemas and Structured Vocabularies

Patricia Harpring
Managing Editor, Getty Vocabulary Program
Getty Research Institute

The appetite of end-users, hungry for images, is rarely sated. Images are notoriously difficult to retrieve with accuracy, as is evident to anyone who has searched for images on the World Wide Web. Retrieval of appropriate images depends on intelligent indexing, which one might call the "language" of retrieval; in turn, good indexing depends on proper methodology and suitable terminology. In this essay, I address the underpinnings of indexing by exploring the use of metadata schemas1 and controlled vocabularies to describe, catalogue, and index works of art and architecture, and images of them. I also discuss issues relating to data structure, cataloguing rules, vocabulary control, and retrieval strategies, which are central components of good subject access.

What Is "Subject"?

Categories for the Description of Works of Art (CDWA) characterizes "subject" very broadly as follows:

The subject matter of a work of art (sometimes referred to as its content) is the narrative, iconic, or non-objective meaning conveyed by an abstract or a figurative composition. It is what is depicted in and by a work of art. It also covers the function of an object or architecture that otherwise has no narrative content.

CDWA describes a metadata element set that can be used to describe or catalogue many types of objects and works of architecture in a single information system. In the interest of providing access across all catalogued objects by all of the critical fields (the "core" categories), CDWA advises that the Subject Matter category should always be indexed, even when the object seems to have no "subject" in the traditional sense. In other words, in CDWA all works of art and architecture have subject matter.

Even though the subject matter of a work of art may also be referred to in the Titles or Names category of CDWA, a thorough description and indexing of the subject content should be done separately in the Subject Matter category. A title does not always describe the subject of the work. More importantly, noting the subject of a work of art in a set of fields or metadata elements dedicated specifically to subject ensures that the subject is consistently recorded and indexed in the same place, using the same conventions for all objects in the database. The title of the photograph in figure 7, Chez Mondrian, Paris, does not convey a basic description of the subject of the photograph. Its subject could be described as "an interior space with a stairway, doorway, table, and a vase with flowers."

The subject matter of a work may be narrative, but other types of subjects may also be included. A narrative subject is one that comprises a story or sequence of events. Examples of narrative subjects are The Slaying of the Nemean Lion and The Capture of the Wild Boar of Mount Erymanthus, which are both episodes in the Labors of Herakles series. Subject matter that does not tell a story could be, for example, a painting or sculpture of a genre scene, such as a young woman bathing. For a portrait, the subject can be a named sitter; for a sketch, an elevation for the facade of a building; for a pot or other vessel, its geometric decoration or its function; for a mosque or synagogue, its function as a place of worship. Subject matter can also take the form of implied themes or attributes that come to light through interpretation. For example, a brass doorknob with an embossed lion's head can express meaning beyond the depiction of an animal; it may suggest the householder's strength or confer protection on the house.
Kertesz/Chez Mondrian, Paris

In a scholarly discussion of subject matter, various areas of subject analysis are often woven together into a seamless whole. It is useful, however, to consider them separately when indexing a work of art. One level of subject analysis could include an objective description of what is depicted; for example, in the Sodoma drawing in figure 8, the words "human male," "nude," "drapery" describe the image in general terms. An identification of the subject would be "resurrected Christ." The image could be further analyzed, noting that the iconography represents "salvation" and "rebirth."
Sodoma/The Resurrection

In CDWA, subject matter is analyzed according to a method based on the work of Erwin Panofsky.2 Panofsky identified three main levels of meaning in art: pre-iconographic description, iconographic identification, and iconographic interpretation or "iconology." Three sets of subcategories under the category Subject Matter in CDWA reflect this traditional art-historical approach to subject analysis, but in a somewhat simplified and more practical application of the principles, one better suited to indexing subject matter for purposes of retrieval. (Panofsky was writing decades before the advent of computer databases of art-historical information and the proliferation of resources on the World Wide Web.) The following three levels of subject analysis are defined in CDWA:

Subject Matter—Description. A description of the work in terms of the generic elements of the image or images depicted in, on, or by it.
Subject Matter—Identification. The name of the subject depicted in or on a work of art: its iconography. Iconography is the named mythological, fictional, religious, or historical narrative subject matter of a work of art, or its non-narrative content in the form of persons, places, or things.
Subject Matter—Interpretation. The meaning or theme represented by the subject matter or iconography of a work of art.

These three levels of subject analysis can be illustrated in Andrea Mantegna's Adoration of the Magi (pl. 4). A generic description of Mantegna's painting would point out the elements recognizable to any viewer, regardless of his or her level of expertise or knowledge: it depicts "a woman holding a baby, with a man located behind her, and three men located in front of her." Possible indexing terms to describe the scene could be "woman," "baby," "men," "vessels," "porcelain vessel," "coins," "metal vessel," "costumes," "turbans," "hats," "drapery," "fur," "brocade," "haloes." The next level of subject analysis is identification, which is often the only level of access cataloguing institutions routinely provide. The painting depicts a known iconographic subject that is recognizable to someone familiar with the tradition of Western art history: "Adoration of the Magi." The iconography is based on the story recounted in the New Testament (Matthew 2), with embellishments from other sources. The proper names of the protagonists are Balthasar, Melchior, Caspar, Mary, Jesus, and Joseph; these names should also be listed as part of the identifiable subject.

The third level of subject analysis is interpretation, where the symbolic meaning of the iconography is discussed. For example, the Magi represent the Three Ages of Man (Youth, Middle Age, Old Age), the Three Races of Man, and the Three Parts of the World (as known in the fifteenth century: Europe, Africa, Asia). The gifts of the Magi are symbolic of Christ's kingship (gold), divinity (frankincense), and death (myrrh, an embalming spice). The older Magus kneels and has removed his crown, representing the divine child's supremacy over earthly royalty. The journey of the Magi symbolizes conversion to Christianity. Details related to the subject, as depicted specifically in this painting, could include Mantegna's composition of figures and objects, all compressed within a shallow space in imitation of ancient Roman reliefs.
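
The three levels discussed above can be sketched as a simple record structure. The following Python is only an illustration under my own assumptions: the class name `SubjectMatter`, its field names, and the `matches` helper are hypothetical and are not defined by CDWA, and the interpretation terms are one possible way of reducing the symbolism described above to indexing terms.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SubjectMatter:
    """Indexing terms for one work, grouped by the three CDWA levels."""
    description: List[str] = field(default_factory=list)     # generic elements
    identification: List[str] = field(default_factory=list)  # named iconography
    interpretation: List[str] = field(default_factory=list)  # meaning or theme

    def all_terms(self) -> Set[str]:
        """Return every term from every level, lowercased for matching."""
        combined = self.description + self.identification + self.interpretation
        return {term.lower() for term in combined}


def matches(subject: SubjectMatter, query: str) -> bool:
    """True if the query equals any indexing term, at any level."""
    return query.lower() in subject.all_terms()


# Terms drawn from the discussion of Mantegna's Adoration of the Magi.
adoration = SubjectMatter(
    description=["woman", "baby", "men", "vessels", "coins", "turbans", "haloes"],
    identification=["Adoration of the Magi", "Balthasar", "Melchior", "Caspar",
                    "Mary", "Jesus", "Joseph"],
    interpretation=["Three Ages of Man", "kingship", "divinity", "death",
                    "conversion to Christianity"],
)

print(matches(adoration, "baby"))                   # generic element
print(matches(adoration, "adoration of the magi"))  # named iconography
print(matches(adoration, "kingship"))               # interpretation
```

The levels are kept apart for analysis but searched together, so a user who knows only that the picture shows a "baby" and a user who asks for the "Adoration of the Magi" reach the same record.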

Even when a work of art or architecture has no overt figurative or narrative content, as with abstract art, architecture, or decorative arts, subject matter should still be indexed in the appropriate metadata element or database field. In the case of a work of abstract art, John M. Miller's Prophecy (fig. 9), visual elements of the composition can be listed, including the following: "abstract," "lines," "space," "diagonal." The symbolic meaning, as stated by the artist, should also be included. In this case, the artist's work was inspired by a fifteenth-century prayer book.3 This aspect of the subject could be listed as follows: "Jean Fouquet," "Hours of Simon de Varie," "Madonna and child," "patron," "kneeling," "inward reflection," "moment in flux."

It may seem something of a stretch to designate subject matter for decorative arts and architecture, where no recognizable figure or symbolic interpretation is possible. For the sake of consistency, however, and always keeping end-user retrieval in mind, it is useful to note subject matter for these types of objects as well. The subject of a carpet, such as the one shown in figure 10, could be design elements and symbols of the patron for whom it was made, such as "flowers," "fruit," "acanthus leaf scrolls," "sunflower," "Sun King," "Louis XIV." The subject of a Renaissance drug jar, such as the one shown in figure 11, could be its function, as well as its decoration, which is intended to evoke the exotic East, even though the characters of the script are invented and nonsensical: "drugs," "medicines," "pharmacy," "storage," "Middle East," "China," "Islamic knot work," "Kufic script," "Chinese calligraphy," "alphabet." Indexing terms for describing the subject matter of the pair of globes in figure 12 could be "Earth," "heavens," "geography." The subject of a building, such as the J. Paul Getty Museum (fig. 13), could be the building's function and critical design elements: "art museum," "space," "square," "axes," "reflection," "shadow."
Savonnerie Manufactory/Carpet
Cylindrical Jar (Albarello)
Nollet/Pair of Terrestrial and Celestial Globes
Meier/Museum Courtyard

Since information about art is often uncertain or ambiguous, there may be multiple interpretations for the subject of a particular work. Given that interpretations of subjects can change over time and that more than one interpretation may exist at one time, the history of the interpretation of the work should also be noted. For example, the sitter in Jacopo Pontormo's Portrait of a Halberdier (fig. 14) is sometimes identified as the Florentine duke Cosimo de' Medici, but he is more often considered to be the young nobleman Francesco Guardi. An "unbiased," objective description would identify the sitter simply as a "halberdier" or "soldier." The subject matter of this painting should be accessible by any of these subject designations. It is important to have a data structure that allows for this kind of variety and flexibility.
Pontormo/Portrait of a Halberdier (Francesco Guardi?)

Structure to Allow Subject Access

Among the key decisions that must be made to provide subject access to images is selection of the appropriate format or metadata schema. Indeed, a suitable data structure is essential for creating good end-user access to images. The data structure must include all necessary fields; it must allow repeating fields as appropriate; and it must include links or otherwise accommodate the particular relationships that are inherent between museum objects and works of architecture (or their visual surrogates) and the subjects depicted in them.

The data structure for subject access must be contained within an overall workable data structure for the objects being described or catalogued. To successfully create a versatile, useful information system on art and architecture, several critical issues must be addressed. The institution or cataloguing project must decide what is being catalogued: museum objects, groups of objects, buildings, or visual documents (surrogate images) of those objects or buildings. Other decisions are critical to the format and structure of the system: Which metadata elements or fields are critical? Are there additional optional fields that are desirable but not necessary for retrieval? Which fields should be repeating? Which fields should be populated with controlled vocabulary terms? Should there be linked authorities?
Nikodemos/Panathenaic Prize Amphora with Lid

CDWA specifies fields for various attributes of an object record, including a set of fields for subject identification in the category Subject Matter.4 This set of fields is repeatable, and includes a field for a free-text description of the subject, as well as fields for indexing terms. For the fourth-century b.c.e. Greek amphora shown in figure 15, the free-text description of the subject might be the following: "Side A: Athena Promachos; Side B: Nike crowning the victor, with the judge on the right and the defeated opponent on the left." The important elements of the subject are then indexed with controlled vocabulary terms to provide reliable retrieval; for example, the indexing terms for this object might be "human male," "human female," "nudes," "Greek mythology," "Athena Promachos," "Nike," "judge," "competition," "game," "games," "athlete," "prize," "festival," "victory." Ideally, all three levels of subject matter (description, identification, and interpretation) should be analyzed and indexed for access, although the terms should be stored in the same table for end-user retrieval.5 A sample descriptive record for the amphora, formulated according to CDWA guidelines, is shown below (core categories are indicated with asterisks).

Display versus Indexing

For an information system to be effective, information for display and information intended for search and retrieval must be distinguished. A field for display is all that the end-user sees. Information critical for research must, however, also be properly indexed in fields to allow adequate retrieval. The field for description or display can provide a clear, coherent text that identifies or explains the subject. As I have already pointed out, art information can often be ambiguous or even seemingly contradictory. In the display field, uncertainty and ambiguity can be expressed in a way that is intelligible to end-users; words such as "probably" and "possibly" may be used. For example, the subject for one Dosso Dossi painting (see pl. 3) could be described in a display field as follows: "Mythological scene, uncertain subject; probably represents 'love' and 'lust,' personified with central figures that are possibly Pan, Echo, Terra, and an unidentified goddess." The indexing fields would use controlled vocabulary to ensure reliable, consistent access to the same information. All terms representing all possible interpretations should be included for access; for the Dossi painting, the terms could include "Greek mythology," "love," "lust," "cupids," "landscape," "nude," "human female," "flowers," "Pan," "satyr," "nymph," "Echo," "Terra," "elderly female," "armor," "goddess."
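
The distinction between display and indexing can be made concrete with a small sketch. The record layout (`display`, `index_terms`) and the `search` function below are hypothetical names chosen for illustration; the text and terms are those given above for the Dosso Dossi painting.

```python
from typing import Dict, List

# One record: free text for the end-user to read, controlled terms for retrieval.
dossi: Dict[str, object] = {
    "display": ("Mythological scene, uncertain subject; probably represents "
                "'love' and 'lust,' personified with central figures that are "
                "possibly Pan, Echo, Terra, and an unidentified goddess."),
    "index_terms": ["Greek mythology", "love", "lust", "cupids", "landscape",
                    "nude", "human female", "flowers", "Pan", "satyr", "nymph",
                    "Echo", "Terra", "elderly female", "armor", "goddess"],
}


def search(records: List[Dict[str, object]], term: str) -> List[Dict[str, object]]:
    """Match only against controlled indexing terms, never the display text."""
    wanted = term.lower()
    return [r for r in records
            if wanted in (t.lower() for t in r["index_terms"])]


print(len(search([dossi], "Echo")))      # indexed as a possible subject
print(len(search([dossi], "possibly")))  # appears only in the display text
```

Hedging words such as "probably" and "possibly" live only in the display text, where a reader can weigh them; every candidate interpretation is nonetheless retrievable through the indexing terms.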

Specificity versus Inclusivity

In the Dosso Dossi painting, the indexing terms include all likely interpretations of the subject matter. This is the approach taken by a knowledgeable cataloguer who can be specific in listing the possible subjects. A different approach must be used when the cataloguer does not know the subject due to lack of information—that is, if the information is possibly "knowable," but simply "not known" because the particular cataloguer does not have the time or means to do the research. In such cases, it is advisable to list terms that are broad and accurate rather than to be specific at the risk of being inaccurate. If the cataloguer is not familiar with the scholarly literature addressing the likely purpose of the maiolica jar shown in figure 11, the cataloguer is better off calling it a "vessel" or even a "container" rather than guessing that it may be a "drug jar." For the eighteenth-century French woodcarving shown in figure 16, the cataloguer should not try to surmise the allegorical meaning of the work if he or she does not have research or documentation to support the supposition. In such a case, the cataloguer could resort to performing only the first level (description) of subject analysis, naming the objects clearly seen in the piece: "flowers," "medallion," "bird," "nest." Only if there is credible supporting evidence should indexing terms relating to the allegory—for example, "Constitution of 1791," "French Revolution," "French monarchy," "death," "National Assembly," "failure," "ending"—be added.
Parent/Carved Relief

Repeating Fields

Repeating fields refers to a data structure in which there are multiple occurrences of a given field, so that multiple terms or data values may be recorded efficiently. CDWA suggests which fields or metadata elements should be repeating. Obviously, the field for Subject Matter should be repeatable. Repeating fields can store indexing terms for all three levels of subject analysis; although these aspects of the subject are analyzed separately, retrieval is more efficient if they are stored together. Multiple interpretations of the subject can also be indexed and recorded in this set of fields.
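
A repeating field is commonly realized in a relational system as a separate table with one row per term. The sketch below uses Python's standard `sqlite3` module; the table and column names (`work`, `subject_term`, `level`) are my own assumptions and are not prescribed by CDWA. It stores terms from different levels, and from competing interpretations, in one table for the Pontormo portrait discussed earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work (work_id INTEGER PRIMARY KEY, title TEXT)")
# One row per indexing term: the "repeating field" for Subject Matter.
conn.execute("""
    CREATE TABLE subject_term (
        work_id INTEGER REFERENCES work(work_id),
        term    TEXT NOT NULL,
        level   TEXT NOT NULL  -- description, identification, or interpretation
    )
""")

conn.execute("INSERT INTO work VALUES (1, 'Portrait of a Halberdier')")
conn.executemany(
    "INSERT INTO subject_term VALUES (?, ?, ?)",
    [
        (1, "halberdier", "description"),
        (1, "soldier", "description"),
        (1, "Francesco Guardi", "identification"),   # more common identification
        (1, "Cosimo de' Medici", "identification"),  # alternative identification
    ],
)


def works_for(term):
    """Return titles of works indexed with the term, at any level."""
    rows = conn.execute(
        "SELECT DISTINCT w.title FROM work w "
        "JOIN subject_term s ON s.work_id = w.work_id "
        "WHERE lower(s.term) = lower(?)",
        (term,),
    ).fetchall()
    return [row[0] for row in rows]


print(works_for("soldier"))
print(works_for("Cosimo de' Medici"))
```

Because every term sits in the same table, a single query serves all three levels of analysis and every recorded interpretation; the `level` column remains available when a search needs to be restricted to one of them.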


CDWA describes a set of relational tables that includes information about the object along with links to tables that hold information about the subject in a Subject Identification Authority. There are also links to other authorities. In this context, an "authority" is a separate file in which important information indirectly related to the objects being described can be recorded. A "link" may be made between the appropriate field in the object record and the relevant authority record. The relationship of authorities to object records in an information system is presented in the following entity-relationship diagram:

entity-relationship diagram

An authority for subjects provides an efficient way to record preferred and variant names, broader concepts, and related information regarding subjects. The information need be entered only once in the authority record rather than in each object record related to that subject. For some subject information, authorities may be efficiently constructed by using previously compiled data.6 The fields in the CDWA's Subject Identification Authority are Subject Type, Preferred Subject Name, Variant Subject Names, Dates, Earliest Date, Latest Date, Indexing Terms, Related Subjects, Relationship Type, Name of Related Subject, Remarks, and Citations.

The Subject Identification Authority7 contains fields for the preferred, or most commonly known, name of the subject, as well as variant names by which the subject may also be known; variant names in multiple languages could also be included. Many subjects may be known by multiple names, all of which are useful to include as access points for search and retrieval. Using such a controlled vocabulary or classification system ensures that synonyms are available for end-user access. For example, "Three Kings" and "Three Wise Men" are variant names for the "Magi"; "stag beetle" and "pinching bug" are synonyms for an insect of the family "Lucanidae." Because the cataloguer or indexer has no way of knowing which form or forms end-users will choose in searching, as many variant forms as possible (or reasonable) should be included. The following sample subject authority record offers several name variants for the preferred name "Herakles": "Hercules," "Heracles," "Ercole," "Hercule," "Hércules." Using an authority or controlled vocabulary ensures that all these synonyms can be used in search and retrieval.
  • Subject Type: mythological character, Greek and Roman
  • Subject Name: Herakles
  • Variant Subject Names: Hercules, Heracles, Ercole, Hercule, Hércules
  • Display Dates: story developed in Argos, but was taken over at early date by Thebes; literary sources are late, though earlier texts may be surmised.
  • Earliest: –1000   Latest: 9999 (date ranges for searching)
  • Indexing terms: Greek hero, king, strength, fortitude, perseverance, labors, labours, Nemean lion, Argos, Thebes
  • Related Subjects: Labors of Herakles, Zeus, Alcmene, Hera
  • Remarks: Probably based on actual historical figure, a king of ancient Argos. The legendary figure was the son of Zeus and Alcmene, granddaughter of Perseus. Often a victim of jealous Hera. Episodes in his story include the Labors of Herakles. In art and literature Herakles is depicted as an enormously strong, muscular man, generally of moderate height. His characteristics include being a huge eater and drinker, very amorous, generally kind, but with occasional outbursts of brutal rage. He is often depicted with characteristic weapons, a bow or a club; he may wear or hold the skin of a lion. In Italy he may be portrayed as a god of merchants and traders, related to his legendary good luck and ability to be rescued from danger.
  • Citations: Grant, Michael, and John Hazel, Gods and Mortals in Classical Mythology (Springfield, Mass.: G & C Merriam, 1973); Encyclopedia Britannica Online, "Heracles" (Accessed 06/02/2001)

Other fields are also useful in providing access. In the sample subject authority record for Herakles, a note (corresponding to the Remarks category in CDWA) describes the iconography associated with Herakles and some of the ways in which this figure may appear in works of art. Terms that allow researchers to find all similar subjects must be indexed as well; such indexing provides access to the record (and thus to objects linked to it). In the sample record, examples for Herakles could appear in the "indexing terms" field: "Greek hero," "king," "strength," "fortitude," "perseverance," "Labors," "Labours," "Nemean lion," "Argos," "Thebes." They include places, events, and characters related to the iconography of Herakles, as well as abstract attributes symbolized by the Greek hero (for example, "strength" and "fortitude"). The subject authority can also contain a date field, noting the time frame when the subject may have been developed or when it was first documented. In addition, links to other subject authority records may be useful; the record for Herakles is linked to the records of other protagonists related to the iconography of this mythological figure, namely "Hera" and the "Nemean lion." There can also be a field for listing sources for more information about the subject.
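
How variant names in an authority record support retrieval can be sketched as follows. This is a minimal illustration, not a description of any existing system: the dictionary layout, the function names, and the object identifiers (`object-001`, `object-002`) are hypothetical, while the preferred and variant names come from the sample record for Herakles.

```python
import unicodedata
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Strip accents and case so that 'Hércules' and 'hercules' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold()


# Authority file: the information is entered once, under the preferred name.
AUTHORITY: Dict[str, Dict[str, List[str]]] = {
    "Herakles": {
        "variants": ["Hercules", "Heracles", "Ercole", "Hercule", "Hércules"],
        "related": ["Labors of Herakles", "Zeus", "Alcmene", "Hera"],
    },
}

# Object records link to the authority by preferred name (hypothetical IDs).
LINKED_OBJECTS: Dict[str, List[str]] = {"Herakles": ["object-001", "object-002"]}

# Every preferred and variant name becomes an access point.
ACCESS_POINTS: Dict[str, str] = {}
for preferred, record in AUTHORITY.items():
    for name in [preferred] + record["variants"]:
        ACCESS_POINTS[normalize(name)] = preferred


def preferred_name(query: str) -> Optional[str]:
    """Resolve any known name form to the preferred subject name."""
    return ACCESS_POINTS.get(normalize(query))


def find_objects(query: str) -> List[str]:
    """Return objects linked to the subject, whichever name form was typed."""
    return LINKED_OBJECTS.get(preferred_name(query) or "", [])


print(preferred_name("Ercole"))
print(find_objects("hercules"))
```

The end-user never needs to know which form the cataloguer chose; the variant forms are recorded once in the authority and apply to every object linked to it.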

Hierarchical Relationships

Layne stresses in her essay in this volume the power that vocabularies and classification systems with syndetic structures can have for indexing and retrieval. Thus it may be desirable to design an information system that allows for hierarchical relationships for subjects. One way to maintain distinctions among related iconographic themes efficiently is to create a data structure that makes it possible to link records. For example, the episodes of the Labors of Herakles could be linked hierarchically to the general record for Herakles and to even broader concepts such as classical mythology or Greek heroic legends,8 as shown in the following example from the ICONCLASS system.
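
Independently of ICONCLASS, the general idea of linking subject records hierarchically can be sketched in a few lines of Python. The structure below is an assumption made for illustration only: the `BROADER` mapping, the function names, and the particular ordering of the broader concepts are mine, and the labels are plain subject names rather than ICONCLASS notations.

```python
from typing import Dict, List

# Each subject record points to one broader record (hypothetical arrangement).
BROADER: Dict[str, str] = {
    "The Slaying of the Nemean Lion": "Labors of Herakles",
    "The Capture of the Wild Boar of Mount Erymanthus": "Labors of Herakles",
    "Labors of Herakles": "Herakles",
    "Herakles": "Greek heroic legends",
    "Greek heroic legends": "classical mythology",
}


def ancestors(subject: str) -> List[str]:
    """Walk upward from a subject, returning each broader concept in turn."""
    chain = []
    while subject in BROADER:
        subject = BROADER[subject]
        chain.append(subject)
    return chain


def narrower(subject: str) -> List[str]:
    """Collect every subject below the given one, at any depth."""
    found = []
    for child, parent in BROADER.items():
        if parent == subject:
            found.append(child)
            found.extend(narrower(child))
    return found


print(ancestors("The Slaying of the Nemean Lion"))
print(narrower("Herakles"))
```

With links of this kind, a search on "Herakles" can be expanded to the individual episodes of the Labors, and an object indexed only under one episode can still be found by a user who asks for the broader theme.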


Published controlled vocabularies that have gained a degree of acceptance in the visual resources and art-historical communities can be used to record terms for subject matter. If an authority for subject identification is being created for a particular collection or body of material, such controlled vocabularies can be used to "populate" the authority file.

No single authority can provide adequate subject access for most collections. Typically, institutions will have to create an authority for local use, one compiled, whenever possible, from existing controlled vocabularies. A number of vocabularies are currently available for "populating" local authority files. The ICONCLASS system has proven to be a powerful tool for recording and providing access to iconographic themes, particularly for Western art.9 This system, developed in the Netherlands and now in use in many countries and institutions, contains textual descriptions of subject matter in art, organized by alphanumeric codes that can be arranged in hierarchies. The Art & Architecture Thesaurus (AAT) is a source of terms for describing architectural subjects or objects (for example, "onion dome," "cathedral," "columns"). The Library of Congress's Thesaurus for Graphic Materials (TGM), like the AAT, is useful for populating authority files for object type or medium, but it can also provide terms for subject authorities. The Getty Thesaurus of Geographic Names (TGN) can provide the names of places depicted in or symbolized by art objects, as can the Library of Congress Subject Headings (LCSH). The Union List of Artist Names (ULAN) and the Library of Congress Name Authority File (LCNAF) can provide preferred and variant names for portraits or self-portraits of artists, as well as for the creators of works of art and architecture.

Other useful vocabularies or term lists could be added to local authorities. Subjects that would be useful for many image collections might include non-Western iconography, Latin names of plants and animals, proper names of people who are not artists (for which the LCNAF would be a good source), events, actions, and abstract concepts (for example, emotions).

Conclusion: The Ultimate Goal Is Retrieval

Obviously, the reason for designing appropriate data structures and devoting considerable time and labor to indexing subjects in visual works is to provide good search and retrieval for the images being catalogued or indexed. Therefore, it is crucial to consider current and future retrieval needs of the particular institution and of its various types of users before beginning a cataloguing or indexing project. It is important to keep in mind that the system designed for cataloguing is unlikely to be the same system that will be used for retrieval by the public, so the data created in the editorial or cataloguing system must be exported or "published" to a second system. A certain level of retrieval is required even within a cataloguing system, however, so that cataloguers and their supervisors can check and organize their work. I think it is safe to say that if data is well organized and catalogued according to recognized standards and using the appropriate vocabularies, "re-purposing" it for various projects and migrating it to new systems in the future (which is inevitable) can be relatively routine tasks. People and institutions that are designing information systems should be aware that data can be compliant with multiple standards at the same time. Consulting a metadata standards crosswalk can aid in designing appropriate data structures and cataloguing rules so that data can be re-purposed and published in a variety of ways but recorded only once.10

In providing retrieval, it is important to remember that subjects are typically requested in combination with a variety of other elements, including the date or date span of the creation of a work, an artist's name, an artist's nationality, the medium or material of a work of art, and the type of object.11 Furthermore, multiple subjects may be requested at once. Finally, end-users can range from the general public to art historians and other experts. Information systems should allow versatile retrieval for various audiences with different needs and levels of experience.

If Subject Matter and other core metadata elements are well indexed, versatile retrieval is possible. If a search is done on the iconographical theme "Adoration of the Magi," the results are those in figure 17. The search could then be narrowed by adding another criterion: for example, narrowing the results to only manuscript illuminations of this event (via the Object/Work-Type metadata element) would retrieve the last three images in the top row. If the objects have also been indexed by individual characters and elements of the scene and by broad themes, users could ask numerous questions. If a user asked to see all images of "Mary and Jesus," the images in the first and second rows would be among the results, including scenes of "Madonna and Child," the "Coronation of the Virgin," the "Pietà," and the "Crucifixion." If a user asked to see images of "mother and child," the last row would be added to the results.

Sample search results
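
The kind of combined query described above, in which a subject is requested together with another element such as the type of object, can be sketched as a simple filter. The four records, their identifiers, and the `search` function are hypothetical and do not reproduce the images in figure 17; they only illustrate how well-indexed elements can be combined.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical records: each has one work type and a set of subject terms.
RECORDS: List[Dict[str, object]] = [
    {"id": "A", "work_type": "painting",
     "subjects": {"Adoration of the Magi", "Mary", "Jesus", "mother and child"}},
    {"id": "B", "work_type": "manuscript illumination",
     "subjects": {"Adoration of the Magi", "Mary", "Jesus", "mother and child"}},
    {"id": "C", "work_type": "painting",
     "subjects": {"Pietà", "Mary", "Jesus"}},
    {"id": "D", "work_type": "photograph",
     "subjects": {"mother and child"}},
]


def search(records: List[Dict[str, object]],
           subjects: Iterable[str] = (),
           work_type: Optional[str] = None) -> List[str]:
    """Return IDs of records that carry all requested subjects and,
    if a work type is given, also match that work type."""
    wanted = set(subjects)
    return [
        r["id"] for r in records
        if wanted <= r["subjects"]
        and (work_type is None or r["work_type"] == work_type)
    ]


print(search(RECORDS, ["Adoration of the Magi"]))
print(search(RECORDS, ["Adoration of the Magi"], "manuscript illumination"))
print(search(RECORDS, ["Mary", "Jesus"]))
print(search(RECORDS, ["mother and child"]))
```

Each added criterion narrows the result set, while a broader theme such as "mother and child" widens it to records that carry no religious iconography at all.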

As Colum Hourihane points out in the next essay, subject matter is one of the two main criteria end-users employ in searching for images of works of art. Careful consideration and application of standards and controlled vocabularies are critical to success in providing good end-user access to artworks via their subject matter.