Introduction to Metadata: Glossary

algorithm A formula or procedure for solving a problem or carrying out a task. An algorithm is a set of steps in a very specific order, such as a mathematical formula or the instructions in a computer program. See also computer program.
Anglo-American Cataloguing Rules (AACR) A data content standard for describing bibliographic materials. http://www.aacr2.org/
application A software program designed to accomplish a task for an end user (e.g., word processing or project management), as distinguished from the operating system program that runs the computer itself.
application profile A set of metadata elements, policies, and guidelines defined for a particular application or community. The elements may be from one or more element sets, thus allowing a given application to meet its functional requirements by using metadata from several element sets, including locally defined elements.
application programming interface (API) A set of standardized requests defined by one computer program that allows another program to make requests and receive responses.
ASCII (American Standard Code for Information Interchange) A seven-bit character code defining 128 characters used for information interchange, data processing, and communications systems.
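The seven-bit limit can be illustrated with a short Python sketch. The helper name `is_ascii` is an assumption made for this example; it simply checks that every character's code point falls below 128.

```python
# Every ASCII character has a code point in the range 0-127,
# which is exactly what fits in seven bits.
code = ord("A")             # numeric code assigned to the letter A
bits = format(code, "07b")  # the same code written as seven binary digits

def is_ascii(text):
    """Return True if every character in text is within the ASCII range."""
    return all(ord(ch) < 128 for ch in text)

plain = is_ascii("Munchen")     # unaccented letters only
accented = is_ascii("München")  # the umlaut falls outside ASCII
```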
asymmetric relationship In the context of a thesaurus, a reciprocal relationship that is different in one direction than it is in the reverse—for example, BT/NT (for broader term/narrower term).
authentication A human or machine process that verifies that an individual, computer, or information object is who or what it purports to be.
authority file A file, typically electronic, that serves as a source of standardized forms of names, terms, titles, etc. Authority files should include references or links from variant forms to preferred forms. For example, in the Library of Congress Name Authority File, “Schiavone, Andrea” is the preferred name form for a Dalmatian artist active in Italy during the sixteenth century, while “Medulić, Andrija,” “Lo Schiavone,” and several other forms are listed as variant names. Authority files regulate usage but also provide additional access points, thus increasing both the precision and recall of many searches.
authority heading A preferred, authorized heading used in a vocabulary, particularly in a bibliographic authority file that typically includes a string of names or terms, with additional information as necessary to allow disambiguation between identical headings (e.g., United States—History—Civil War, 1861–1865—Battlefields and United States—History—Civil War, 1861–1865—Campaigns). The types of authority headings used by the Library of Congress are the following: subject, name, title, name/title, and keyword.
automatic indexing In the context of online retrieval, indexing by the analysis of text or other content using computer algorithms. The focus is on automatic, behind-the-scenes methods involving little or no input from individual searchers, with the exception of relevance feedback.
back-end database A database that contains and manages data for an information system, distinct from the presentation or interface components of that system.
batch load In the context of populating or contributing to databases, moving or manipulating a group of records as a single unit for the purpose of data processing, typically accomplished by the computer without user interaction, in contrast to entering records manually, one at a time. See also load and processing.
BIBFRAME (Bibliographic Framework) A data model for bibliographic description designed to replace the MARC standards and to use the principles of linked data to make bibliographic data more useful within the library community as well as in the broader universe of information. http://www.loc.gov/bibframe/
Boolean operators Logical operators used as modifiers to refine the relationship between terms in a search. The four most commonly used Boolean operators are AND, OR, NOT, and ADJ (adjacent). They may be used with parentheses and other punctuation to form logical groupings of criteria in queries—e.g., (Castillo OR Rancho) AND Diego.
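One way to picture how such a query is evaluated is with set operations: AND as intersection, OR as union, and NOT as difference. The sketch below is only an illustration; the sample records, the `matching` helper, and the whitespace tokenization are assumptions made for the example and do not describe any particular retrieval system.

```python
# Hypothetical sample records (record ID -> text), for illustration only.
records = {
    1: "Castillo de San Diego",
    2: "Rancho San Diego",
    3: "Castillo de San Marcos",
    4: "Diego Rivera mural",
}

def matching(term):
    """Return the set of record IDs whose text contains the term as a word."""
    term = term.lower()
    return {rid for rid, text in records.items() if term in text.lower().split()}

# (Castillo OR Rancho) AND Diego
hits = (matching("Castillo") | matching("Rancho")) & matching("Diego")

# Diego NOT Castillo
diego_not_castillo = matching("Diego") - matching("Castillo")
```

The parentheses matter: without them, Python's operator precedence (like that of many query languages) would apply the AND before the OR and return a different set.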
browsing The process whereby a user of a system or web site visually scans and maneuvers through navigation lists, results lists, hierarchical displays, or other content in order to make a selection, as contrasted to the user entering a search term in a search box. See also searching.
cataloger In the context of this book, the person who enters information in records for works. See also end user.
cataloging In the context of this book, the process of describing and indexing a work or image, particularly in a collections management system or other automated system. Cataloging involves the use of prescribed categories of information and rules—e.g., the rules described in AACR2, RDA, CCO, and CDWA.
Cataloging Cultural Objects (CCO) A data content standard for describing works of art, architecture, and material culture. http://cco.vrafoundation.org/
CDWA (Categories for the Description of Works of Art) Lite An XML schema for core records for art, architecture, and material culture designed to work with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH); the elements are based on a subset of the full element set of Categories for the Description of Works of Art. http://www.getty.edu/research/publications/electronic_publications/cdwa/cdwalite.html
CGI script A computer program, most frequently written in C, Perl, or a shell script, that uses the Common Gateway Interface (CGI) standard and provides an interface between a user or an external computer application and a web server. CGI scripts are most commonly used to develop forms that allow users to submit information to a web server.
CIDOC CRM (CIDOC Conceptual Reference Model) An object-oriented model for the publication and interchange of cultural heritage information. http://www.cidoc-crm.org
classification In the context of this book, the process of arranging works or other content objects systematically in groups or categories of shared similarity according to established criteria and using terms to identify the classes.
client An application or piece of hardware that retrieves and/or renders resources or resource manifestations. Often used to denote a computer or other kinds of devices connected to a network equipped with software that enables users to access resources available on another computer connected to the same network, called a server.
clustering In the context of automated data, clustering usually refers to the process of grouping or classifying items or data through automatic or algorithmic means rather than by incorporating human judgment.
collection management system A type of database system that allows an institution to manage various aspects of its collections, including description (artist, title, measurements, media, style, subject, etc.) as well as administrative information regarding acquisitions, loans, and conservation information.
computer program Also called a program. A specific set of instructions for ordered operations that result in the completion of a task by the computer; a computer program consists of computer code. While the program is technically a type of data, computer programs are generally considered as separate from the data to which they refer (e.g., data would be the terms, scope notes, etc., in a vocabulary record). An interactive program acts when prompted by an action or information supplied by a user; a batch program automatically runs at a certain time or under certain conditions and then stops after the task is completed. A program is written in a programming language. See also processing.
computer system See system
conceptual data model An abstract model or representation of data for a particular domain, business enterprise, or field of study, independent of any specific software or information system. Usually expressed in terms of entities and relationships. See also logical data model.
content object In the context of a database, any entity that contains data. A content object can itself be made up of content objects. For example, a journal is a content object made up of individual journal articles, which are themselves content objects. See also information object.
controlled vocabulary An organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching. A controlled vocabulary typically includes preferred and variant terms and has a limited scope or describes a specific domain.
core elements In the context of this book, the set of metadata elements representing the fundamental or most important information required for a minimal record. See also required fields.
cross-database searching See federated searching
crosswalk Also called field mapping. A chart or table (visual or virtual) that represents the semantic or technical mapping of fields or data elements from one data standard to fields or data elements in another standard that has a similar function or meaning. Crosswalks make it possible to convert data between databases that use different metadata schemes and enable heterogeneous databases to be searched simultaneously with a single query as if they were a single database (semantic interoperability). See also metadata mapping.
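In its simplest form, a crosswalk can be represented as a lookup table from source field names to target element names. The following sketch assumes a hypothetical local schema (`object_title`, `artist_name`, `creation_date`, `accession_no`) mapped to three Dublin Core element names; the record values are invented sample data, and real crosswalks usually also handle one-to-many mappings and value conversion.

```python
# Hypothetical crosswalk: local field name -> Dublin Core element name.
CROSSWALK = {
    "object_title": "title",
    "artist_name": "creator",
    "creation_date": "date",
}

def apply_crosswalk(record, crosswalk):
    """Return a new record keyed by target element names.

    Fields with no mapping in the crosswalk are dropped.
    """
    return {crosswalk[field]: value
            for field, value in record.items()
            if field in crosswalk}

# Invented sample record in the hypothetical local schema.
local_record = {
    "object_title": "Sample landscape",
    "artist_name": "Example, Artist",
    "creation_date": "1889",
    "accession_no": "X.0001",  # no equivalent in this crosswalk
}

dc_record = apply_crosswalk(local_record, CROSSWALK)
```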
data In common usage in computer science, this term is used as a singular noun to refer to information that exists in a form that may be used by a computer, excluding the program code. In other uses, datum is the singular and data is the plural, referring to facts or numbers in a general sense.
database A structured set of data held in computer storage, especially one that incorporates software to make it accessible in a variety of ways. A database is used to store, query, and retrieve information. It typically comprises a logical collection of interrelated information that is managed as a unit, stored in machine-readable form, and organized and structured as records that are presented in a standardized format in order to allow rapid search and retrieval by a computer. See also system.
database field Also called a data field. A placeholder for a unit of information in a database that forms one of the searchable items in that database. A database field is a portion of a structured machine-readable record containing a particular category of information (e.g., term and scope note would be fields included in a vocabulary record).
database index Also called a data index. A particular type of data structure that improves the speed of operations in a table by allowing the quick location of particular records based on key column values. Indexes are essential for good database performance. The concept is distinguished from human indexing (application of keywords and other data values to a descriptive record) and automatic indexing.
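As a rough illustration, the sketch below creates an index on a key column using Python's built-in `sqlite3` module. The table name `terms`, the index name `idx_terms_term`, and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database with a small, hypothetical vocabulary table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE terms (id INTEGER PRIMARY KEY, term TEXT, scope_note TEXT)"
)
conn.executemany(
    "INSERT INTO terms (term, scope_note) VALUES (?, ?)",
    [("amphora", "sample note"), ("krater", "sample note"), ("kylix", "sample note")],
)

# The index lets the database locate rows by term without scanning the table.
conn.execute("CREATE INDEX idx_terms_term ON terms (term)")

match = conn.execute(
    "SELECT id, term FROM terms WHERE term = ?", ("krater",)
).fetchone()

# Ask SQLite how it intends to run the query; the wording of the plan
# varies between SQLite versions.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, term FROM terms WHERE term = ?", ("krater",)
).fetchall()
```

On a table this small the speed difference is negligible; the benefit of an index grows with the number of rows.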
database record See record
data content standard Rules that determine the vocabulary, syntax, or format of content entered into data fields or metadata elements—e.g., RDA, ISO 8601 (rules for recording date and time), DACS, CCO.
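For instance, ISO 8601 calendar dates take the form YYYY-MM-DD. The sketch below uses Python's standard `datetime` module to record a date in that format and to reject a value that does not conform; the function name `parse_iso_date` is an assumption made for the example.

```python
from datetime import date

def parse_iso_date(text):
    """Return a date if text is an ISO 8601 calendar date, otherwise None."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None

recorded = date(2016, 7, 4).isoformat()    # write a date in ISO 8601 form
valid = parse_iso_date("2016-07-04")       # conforms to the standard
invalid = parse_iso_date("July 4, 2016")   # free-text date is rejected
```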
data preprocessing See preprocessing
data processing See processing
data provider In Open Archives Initiative nomenclature, an organization that exposes metadata records in one or more repositories (specially configured servers) for harvesting by service providers.
data structure A given organization of data, particularly data elements, logical relationships between metadata elements, and storage allocations for the data.
data table / database table Sets of related data elements that are organized in a grid or matrix comprising rows and columns in a database.
data values The terms, words, or numbers used to populate fields in a record.
deep web See hidden web
default values Values that are assumed or supplied automatically (for example, by a computer system) if a value is not specified.
Describing Archives: A Content Standard (DACS) A data content standard for describing archival collections. http://files.archivists.org/pubs/DACS2E-2013_v0315.pdf
diacritics Also called diacritical marks. Signs or accent marks found over, under, or through alphabetic letters in many languages (e.g., the umlaut in German, München), used to indicate emphasis or pronunciation, often to distinguish different sounds or values of the same letter or character without the diacritical mark.
digital asset management system A type of system for organizing digital media assets, such as digital images or video clips, for storage and retrieval. Digital asset management systems sometimes incorporate a descriptive data cataloging component, but they tend to focus on managing workflow for creating digital assets and for managing asset rights, requests, and permissions.
digital signatures A form of electronic authentication of a digital document. Digital signatures are created and verified using public key cryptography and serve to tie the document being signed to the signer.
digital surrogate A digital “copy” of an original work or item (e.g., a JPEG or TIFF image of a painting or sculpture, or a PDF file of an article or book). In Open Archives Initiative nomenclature, digital surrogates are often referred to as “resources.”
document In the context of search and retrieval, the combination of a defined, primarily self-contained, machine-readable text and the format in which it is expressed.
domain name The address that identifies an Internet or other network site. On the Internet, domain names act as mnemonic aliases for IP addresses, a hierarchical numeric addressing system that enables Internet hosts to be uniquely identified. The hierarchical nature of the Domain Name System means that the authority for issuing subdomain names is delegated down the hierarchy; for example, once the Getty Trust has registered the domain name “getty.edu,” it is responsible for any subdomain names such as “www.getty.edu,” “shiva.getty.edu,” etc.
Dublin Core Metadata Element Set A set of fifteen metadata elements optimized for resource discovery on the web that can be assigned to information resources. Also often used as a “lowest common denominator” in metadata mapping. http://dublincore.org/documents/dces/
dynamically generated Refers to a web page, metadata record, or other information object that is generated on demand, typically from content stored in a database and usually either in response to a user’s input or from dynamic data sources that are refreshed periodically. The expression “on the fly” is often used in relation to dynamically generated content.
Encoded Archival Description (EAD) A data structure standard for encoding archival finding aids in SGML or XML according to the EAD document type definition (DTD) or XML schema that makes it possible for the semantic contents of a finding aid to be machine processed. http://www.loc.gov/ead/
encryption An encoding mechanism used to prevent unauthorized users from reading digital information and also for user and document authentication. Only designated users or recipients have the capability to decode encrypted materials.
end user In the context of systems design, the term refers to any client for whom a database system is designed and operated; from that perspective, it could include the editors or catalogers for whom an editorial or cataloging system has been designed.
entity relationship model A type of conceptual data model that represents structured data in terms of entities and relationships. An entity relationship diagram can be used to visually represent information objects and their relationships. Because the constructs used in the entity relationship model can easily be transformed into relational tables, this type of model is often used in database design.
Exif (Exchangeable Image File Format) A specification for an image file format for digital cameras that provides the ability to attach image metadata to JPEG, TIFF, and RIFF images. As of this writing, Exif is not maintained by any industry or standards organization but is widely used by camera manufacturers. http://www.cipa.jp/std/documents/e/DC-008-2012_E.pdf
false hit In search and retrieval, an entry in a list of results that does not comply with the user’s intended results. Also called a false drop.
federated searching Also called cross-database searching, metasearching, and parallel searching. Performing queries simultaneously across resources residing in different domains and created by different communities. Federated searching may involve searching across multiple databases, different platforms, and varying protocols, thus requiring the application of interoperability between resources and vocabularies.
field mapping See crosswalk
finding aid A descriptive tool widely used in archives. Finding aids typically take the form of hierarchical narrative descriptions of cohesive groups of archival records or collections of manuscript materials. Finding aids traditionally were paper documents; Encoded Archival Description (EAD) is a structured way of expressing finding aids as machine-readable data.
FOAF (Friend of a Friend) A machine-readable ontology that models data for persons, their activities, and their relationships to other people and objects. http://www.foaf-project.org/
folksonomy An assemblage of concepts, represented by terms and names (called “tags”), that result from social tagging. A folksonomy differs from a taxonomy in that it is not structured hierarchically. The authors of the folksonomy are typically the casual users of the content rather than professional indexers following standard protocols and using standardized controlled vocabularies.
FRBRoo A joint initiative of the International Federation of Library Associations and Institutions (IFLA) and the International Council of Museums–International Documentation Committee (ICOM-CIDOC) to create an object-oriented ontology that both captures the semantics of bibliographic information and harmonizes those concepts in common with the CIDOC CRM, thus facilitating information interchange between the museum and library communities. http://cidoc.ics.forth.gr/frbr_inro.html
free-text field A field that may contain data entered without any system-defined structure. It may be used to express ambiguity, uncertainty, and nuance in a note.
FTP (File Transfer Protocol) A TCP/IP protocol that allows data files to be copied directly from one computer to another over the Internet.
Functional Requirements for Bibliographic Records (FRBR) A set of requirements and a conceptual entity relationship model developed by the International Federation of Library Associations and Institutions to support bibliographic access and control. http://www.ifla.org/publications/functional-requirements-for-bibliographic-records
Google Sitemap Metadata about the content of a web site that helps the Googlebot web crawler index a site more efficiently and comprehensively.
granular, granularity The level of detail at which an information object or resource is viewed or described.
harvester In Open Archives Initiative nomenclature, a computer system that sends Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) requests to data providers’ repositories and harvests metadata records from them.
heading Also called a label. A string of words comprising a term combined with other information that serves to modify, disambiguate, amplify, or create a context for the main term in displays. See also authority heading.
hidden web Also called deep web and invisible web. The sum of the web pages that are not accessible to web crawlers, usually because they are either dynamically generated by a user querying a database or are password protected or subscription based.
hostname An identifier for a specific machine on the Internet. The hostname identifies not only the machine, but also its subnet and domain—for example, “www.getty.edu.” See also domain name.
HTML (HyperText Markup Language) An SGML-derived markup language used to create documents for World Wide Web applications. HTML has evolved to emphasize design and appearance rather than the representation of document structure and metadata elements.
HTTP (HyperText Transfer Protocol) The standard protocol that enables users with web browsers to access HTML documents and related media.
hyperlink An abbreviated reference to a “hypertext link,” a method of creating nonlinear pathways between related digital documents or to link to related objects such as image or audio files.
information object A digital item or group of items referred to as a unit, regardless of type or format, that a computer can address or manipulate as a single discrete object. See also content object.
International Organization for Standardization (ISO) A worldwide voluntary network of national standards institutes from approximately 160 countries. The standards bodies work in partnership with international organizations, governments, industries, businesses, and consumer representatives to reach consensus, set standards, and promote their use with the goal of facilitating trade and meeting the broader needs of society.
Internet A global collection of computer networks that exchange information by the TCP/IP suite of networking protocols.
Internet directory A thematically organized list of descriptive links to Internet sites, often created by humans who have classified sites by their content. Best of the Web (http://botw.org) provides such directories.
interoperability The ability of different information systems to work together, particularly in the correct interpretation of data semantics and functionality. See also semantic interoperability.
invisible web See hidden web
item In the context of cataloging art, an individual object or work.
jargon A characteristic terminology of a particular group or discipline that is typically not understood by a more general audience.
keyword Any significant word or phrase in the title, subject headings, or text associated with an information object.
keyword in context (KWIC) A type of automatic indexing in which each word in a text, title, subject heading, string of words, or term becomes an entry word in the index, with the exception of words in stop lists. Variations include KWOCs (keyword out of context) and KWACs (keyword alongside context).
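A minimal sketch of KWIC index generation follows. The stop list, the function name `kwic`, and the tuple layout of each entry are assumptions made for the example; production indexers typically also normalize punctuation and align the keyword in a fixed display column.

```python
# Assumed stop list; real systems use longer, language-specific lists.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

def kwic(title, stop_words=STOP_WORDS):
    """Return sorted KWIC entries for a title.

    Each entry is (sort key, left context, keyword, right context).
    Words in the stop list do not become entry words.
    """
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() in stop_words:
            continue
        left = " ".join(words[:i])
        right = " ".join(words[i + 1:])
        entries.append((word.lower(), left, word, right))
    return sorted(entries)

entries = kwic("Introduction to Metadata")
keywords = [entry[0] for entry in entries]
```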
keyword index An index based on individual keywords found in a controlled vocabulary, text, or other content object.
language model A type of automatic indexing based on term weighting and relevance prediction that attempts to predict probable query search terms based on term frequencies within documents and the inverse document frequency of terms across the target data. It is similar to the probabilistic model.
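The two quantities named above, term frequency and inverse document frequency, can be combined in the classic TF-IDF weight. The sketch below shows that simpler weighting scheme rather than a full language model; the sample documents, the function name `tf_idf`, and the use of a natural logarithm are assumptions made for the example.

```python
import math
from collections import Counter

# Invented sample documents (document ID -> text).
docs = {
    "d1": "metadata standards for museums",
    "d2": "metadata for libraries and archives",
    "d3": "museum conservation records",
}

def tf_idf(term, doc_id, docs):
    """Weight a term in one document by its frequency there and its rarity overall."""
    tokens = docs[doc_id].split()
    tf = Counter(tokens)[term] / len(tokens)  # term frequency in this document
    df = sum(1 for text in docs.values() if term in text.split())  # documents containing the term
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)  # inverse document frequency
    return tf * idf

common = tf_idf("metadata", "d1", docs)  # appears in two of three documents
rare = tf_idf("museums", "d1", docs)     # appears in only one document
```

Both terms occur once in `d1`, but the rarer term receives the higher weight because it distinguishes that document from the others.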
legacy system An information system that has been developed and modified over a period of time and has become outdated, difficult, and costly to maintain but that holds information that is important and involves processes that are deeply ingrained in an organization. Legacy systems are usually replaced eventually by new hardware and software configurations.
LIDO (Lightweight Information Describing Objects) A simple XML schema for describing and interchanging core information about museum objects. http://network.icom.museum/cidoc/working-groups/lido/
linked data Data that is semantically linked by following a set of best practices for publishing and interlinking structured data that uses RDF syntaxes and HTTP URIs.
linked open data (LOD) Linked data that is made available for use, reuse, and redistribution on the visible web.
link resolver Software that uses the OpenURL standard to automatically redirect a user’s request to the most appropriate copy of a networked digital object. Typically, link resolvers are used by libraries to direct their patrons from bibliographic records or abstracts to licensed subscription-based resources such as full-text electronic versions of articles, books, etc. http://www.niso.org/apps/group_public/project/details.php?project_id=115
load The process of moving or transferring files or software from one disk, computer, or server to another. To upload means to transfer from a local computer to a remote computer; to download means to transfer from a remote computer to a local one.
logical data model A data model that includes all entities and the relationships among them based on the structures identified in a conceptual data model and that specifies all attributes for each entity. The data is described in as much detail as possible, without regard to how it will be physically implemented in a specific database.
mapping A set of correspondences between terms, fields, or element names used for translating data from one standard or vocabulary into another, or as a means of combining terms or data for search and retrieval. See also crosswalk.
MARC (Machine-Readable Cataloging) format A set of standardized data structures for describing bibliographic materials that facilitates cooperative cataloging and data exchange in bibliographic information systems. http://www.loc.gov/marc/
markup language A formal way of annotating a document or collection of digital data using embedded encoding tags to indicate the structure of the document or data file and the contents of its data elements. It also provides a computer with information about how to process and display marked-up documents. HTML, XML, and SGML are examples of standardized markup languages.
memory institution A generic term used to describe an institution that has a responsibility to collect, care for, and provide access to the human record—for example, museums, libraries, and archives.
Metadata Encoding and Transmission Standard (METS) A standard for encoding descriptive, administrative, and structural metadata relating to objects in a digital library, expressed in XML. METS enables the “packaging” of complex digital objects that include a range of metadata as well as related digital surrogates. http://www.loc.gov/standards/mets/
metadata mapping A formal identification of equivalent or nearly equivalent metadata elements or groups of metadata elements within different metadata schemas, carried out in order to facilitate semantic interoperability. See also mapping and crosswalk.
metadata mining The automated extraction of metadata from electronic documents.
Metadata Object Description Schema (MODS) An XML schema for bibliographic records, developed and maintained by the Library of Congress. http://www.loc.gov/standards/mods/
metasearch Searching of diverse databases on diverse platforms with diverse metadata in real time via one or more protocols. The National Information Standards Organization MetaSearch Initiative defines metasearch as “search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at once.” Metasearch enables users to enter search criteria once and access several search engines simultaneously. With metasearch, fresh records are always available because searching is in real time, in a distributed environment.
meta tag An HTML tag that enables metadata to be embedded invisibly on web pages (e.g., description, keywords).
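To show how embedded meta tags can be read back out of a page, the sketch below uses Python's standard `html.parser` module. The class name `MetaTagParser` and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            # Only keep tags that carry both a name and a content attribute.
            if attributes.get("name") and attributes.get("content") is not None:
                self.meta[attributes["name"].lower()] = attributes["content"]

# Invented sample page.
page = """<html><head>
<title>Sample page</title>
<meta name="description" content="A glossary of metadata terms">
<meta name="keywords" content="metadata, glossary, cataloging">
</head><body></body></html>"""

parser = MetaTagParser()
parser.feed(page)
```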
meta tag spamming The deliberate misuse of meta tags in order to attract traffic to a site (i.e., by boosting its ranking in search results).
namespace The set of unique names used to identify objects within a well-defined domain, particularly relevant for XML, LOD, and DNS applications.
National Information Standards Organization (NISO) A nonprofit association that is accredited by the American National Standards Institute and identifies, develops, maintains, and publishes technical standards to manage information.
nesting The way in which subelements may be contained within larger elements, resulting in multiple levels of metadata.
object-oriented programming A programming model organized around objects rather than actions and data rather than logic, where an object is a location that has a value and is referenced by an identifier.
Online Public Access Catalog (OPAC) A computerized inventory of a library’s holdings.
ontology In the context of this book, an ontology is a formal, machine-readable specification of a conceptual model in which concepts, properties, relationships, functions, constraints, and axioms are all explicitly defined. While an ontology is not technically a controlled vocabulary, it uses one or more controlled vocabularies for a defined domain. Identifying an existing ontology, or developing an appropriate ontology, is the first step in expressing data as linked open data (LOD).
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) A protocol used to harvest or collect metadata records from data providers. See also data provider, harvester, and service provider. http://www.openarchives.org/pmh/
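OAI-PMH requests are ordinary HTTP requests whose parameters name a verb and its arguments. The sketch below only builds a `ListRecords` request URL and does not contact any server; the base URL is a hypothetical repository address, and the function name `list_records_url` is an assumption made for the example.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "https://repository.example.org/oai"

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL for an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return base_url + "?" + urlencode(params)

url = list_records_url(BASE_URL)
```

The `oai_dc` prefix requests records in unqualified Dublin Core, the one metadata format every OAI-PMH repository is required to support.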
operating system A software program that runs a computer, as distinguished from an application, which is installed into an operating system in order to enable users to perform specific tasks.
PageRank™ A proprietary link-analysis algorithm developed by Google founders Larry Page and Sergey Brin to assign a numerical score to each document in a set of hypertext documents based on the number of referring links. The algorithm also takes into account the rank of the referring page; thus a link from a high-ranking page counts more than a link from a low-ranking page.
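The idea that a link from a high-ranking page counts for more can be shown with a simplified power-iteration sketch. This is not Google's implementation: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page sample graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to; every
    linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page divides its current rank among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Invented sample graph: A, B, and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

In this graph A, B, and D each have at most one incoming link, yet A ends up well ahead of B and D because its single link comes from the highest-ranked page, C.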
paradigmatic relationship Also called a semantic relationship. A relationship between terms or concepts that is permanent and based on a known definition.
parsing In processing data, a process by which data is broken or filtered into smaller, more distinct units.
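As a small example, a subject heading string can be parsed into its main heading and subdivisions. The sketch assumes the heading uses a double hyphen as the subdivision delimiter, a common plain-text rendering; the function name `parse_heading` and the output layout are also assumptions.

```python
def parse_heading(heading, delimiter="--"):
    """Split a heading string into a main heading and its subdivisions."""
    parts = [part.strip() for part in heading.split(delimiter)]
    return {"main": parts[0], "subdivisions": parts[1:]}

heading = "United States--History--Civil War, 1861-1865--Battlefields"
parsed = parse_heading(heading)
```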
precision In the context of this book, a measure of search effectiveness expressed as the ratio of relevant records or documents retrieved from a database to the total number retrieved in response to the query. A high-precision search means that most of the results retrieved will be relevant; however, a high-precision search will not necessarily retrieve all relevant results. Recall and precision tend to be inversely related (when one goes up, the other usually goes down).
preprocessing Also called data preprocessing. Preliminary processing or transformation of data in order to facilitate further processing, parsing, etc.
probabilistic model An automatic relevance and weighting method in which terms in a text or other content object are modeled as random variables so that term frequency and distribution are used to predict the probability of relevance. See also language model.
procedure A relatively independent portion of computer code within a larger computer program that performs a specific task in a series of steps. Also called a subprogram or subroutine.
processing Also called data processing. The manipulation or transformation of data through a series of operations. In batch processing, the operations are grouped together in batches and performed automatically; in interactive processing, the operations are prompted by input from a human programmer or user. See also computer program.
program See computer program
programming language A formal language defined by syntactic and semantic rules and used to write instructions that can be translated into machine language and then executed by a computer (e.g., PL/SQL, C++, C#, Java, Perl, Ruby, Python, BASIC).
protocol A specification—often a standard—that describes how computers communicate with each other (e.g., the TCP/IP suite of communication protocols or Open Archives Initiative Protocol for Metadata Harvesting [OAI-PMH]).
query In the context of retrieval, a command to look in a database and find records or other information that meet a specified set of criteria. The most precise queries are those that return the fewest false hits.
query expansion Reformulating a query in order to return a broader or more comprehensive set of results (e.g., adding synonyms to a search term).
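A minimal sketch of synonym-based expansion is shown below. The synonym table, its contents, and the function name `expand_query` are assumptions made for the example; in practice the synonyms would usually come from the variant terms in a controlled vocabulary.

```python
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
    "vase": ["urn", "amphora"],
    "painting": ["picture"],
}

def expand_query(terms, synonyms=SYNONYMS):
    """Return the original terms plus any known synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term.lower(), []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

expanded = expand_query(["vase", "Greek"])
```

The expanded terms would typically be combined with OR, which broadens the result set at some cost in precision.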
recall A measure of a search system’s effectiveness in terms of retrieving all results that are possibly relevant, expressed as the ratio of the number of relevant records or documents retrieved over all the relevant records or documents. A high-recall search retrieves a comprehensive set of relevant results; however, it also increases the likelihood that marginally relevant content objects will also be retrieved. Recall and precision tend to be inversely related (when one goes up, the other usually goes down).
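Both recall and precision (defined elsewhere in this glossary) are simple ratios, and the trade-off between them can be worked through on a small example. In the sketch below, the record IDs, the two result sets, and the function name `precision_recall` are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one set of search results."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3, 4, 5}           # all relevant records in the database
narrow = {1, 2}                      # a narrow search: few results, all relevant
broad = {1, 2, 3, 4, 6, 7, 8, 9}     # a broad search: more hits, more false hits

narrow_scores = precision_recall(narrow, relevant)
broad_scores = precision_recall(broad, relevant)
```

Here the narrow search scores a precision of 1.0 but a recall of only 0.4, while the broad search raises recall to 0.8 at the cost of precision falling to 0.5.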
record In the context of this book, a coherent, discrete group of populated fields or metadata elements. Also called a logical record.
relational database A database built on the relational model, which organizes data into one or more tables of rows and columns with a unique key for each row. A row in one table can be linked to a row in another table by storing the unique key of the row to which it should be linked.
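The linking of rows by stored keys can be sketched with Python's built-in sqlite3 module. The table names, column names, and work titles below are hypothetical; only the artist name is taken from this glossary.

```python
import sqlite3

# In-memory database, discarded when the connection closes.
conn = sqlite3.connect(":memory:")

# Each table has a unique key for every row.
conn.execute("CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE work ("
    " work_id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " artist_id INTEGER REFERENCES artist(artist_id))"
)

conn.execute("INSERT INTO artist VALUES (1, 'Schiavone, Andrea')")
# Each work row stores the key of the artist row it is linked to.
conn.executemany(
    "INSERT INTO work VALUES (?, ?, ?)",
    [(10, "Hypothetical work A", 1), (11, "Hypothetical work B", 1)],
)

# An SQL query follows the stored key to bring the linked rows together.
rows = conn.execute(
    "SELECT work.title, artist.name"
    " FROM work JOIN artist ON work.artist_id = artist.artist_id"
    " ORDER BY work.work_id"
).fetchall()
print(rows)
```

Because the artist's name is stored once and referenced by key, correcting it in the artist table corrects it for every linked work.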
relationship In the context of this book, a link between two types of data, records, files, or any two entities of the same or different types in a system or network.
relevance ranking Ranking and sorting of query results, typically estimated by an algorithm that calculates the number and weight of occurrences of the search term in the targeted data. Relevance ranking frequently does not correspond to the actual relevance of the information retrieved in a search for the user’s information-seeking needs.
required fields Data fields or metadata elements that are required to meet a standard or the requirements of a system’s operations.
Resource Description and Access (RDA) The cataloging standard for libraries that, as of this writing, has begun to replace AACR2. http://www.rdatoolkit.org/
Resource Description Framework (RDF) A standard model for data interchange on the web that extends the linking structure of the web to use URIs to name relationships between things. RDF enables structured and semistructured data to be exposed and shared across different applications. http://www.w3.org/RDF/
resource discovery The process of searching for specific information objects on the web.
retrieval In the context of this book, the activity of using a search or other method to find records or other data in a database. See also query.
robot See web crawler
schema Also called scheme. The organization, structure, and rules for encoding information that supports specific communities of users. The plural forms of the word schema are “schemas” and “schemata.” See also XML schema.
schema registry An authoritative source of names, semantics, and syntaxes for one or more schemas.
screen scraping A technique in which display data (usually unstructured) is automatically retrieved and extracted, for example from a web page.
search engine A computer program that allows users to search electronic resources. In the context of the World Wide Web, the term usually refers to a program that searches a large index of web pages generated by an automated web crawler. See also web search engine.
searching Operations or algorithms intended to determine if one or more data items meet defined criteria or possess a specified property.
semantic interoperability The ability of different agents, services, and applications to communicate data while ensuring accuracy and preserving the meaning of the data.
semantic linking A method of linking terms in a database according to the meaning of and relationships between terms.
Semantic Web An evolving, collaborative effort led by the World Wide Web Consortium (W3C) whose goal is to provide a common framework that will allow data to be shared and reused across various applications and enterprise and community boundaries. It derives from W3C director and inventor of the World Wide Web Tim Berners-Lee’s vision of the web as a universal medium for data, information, and knowledge exchange.
server An application that supplies resources or resource manifestations. Often used to refer to a networked computer that acts as a source of data and/or applications used by multiple client computers or devices. See also client.
service provider In Open Archives Initiative nomenclature, an institution or organization that harvests metadata from data providers and uses the aggregated metadata as a basis for building value-added services.
Simple Knowledge Organization System (SKOS) An endeavor of the World Wide Web Consortium that develops specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading lists, and taxonomies within the framework of the Semantic Web. http://www.w3.org/2004/02/skos/
social bookmarking The decentralized practice and method by which individuals and groups create, classify, store, discover, and share web bookmarks or “favorites” in an online “social” environment.
social tagging The decentralized practice and method by which individuals and groups create, manage, and share terms, names, etc.—called “tags”—to annotate and categorize digital resources in an online “social” environment. A folksonomy is the result of social tagging. Also referred to as collaborative tagging, social classification, social indexing, mob indexing, folk categorization. See also tagging.
sorting The automated process of organizing a results list, data elements in a record, or other data in a particular sequence based on established criteria or attributes of the data—for example, alphabetically, by parent string, or by an associated date. There may be primary sort criteria and secondary sort criteria (e.g., an algorithm can be formulated to first sort place names in a results list alphabetically and then to sort by the parent string).
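The primary and secondary sort criteria described above can be sketched as a sort on a two-part key. The place records and field names here are assumptions made for the example.

```python
# Hypothetical place records: a name plus its parent string.
places = [
    {"name": "Springfield", "parent": "Missouri, United States"},
    {"name": "Athens", "parent": "Greece"},
    {"name": "Springfield", "parent": "Illinois, United States"},
    {"name": "Athens", "parent": "Georgia, United States"},
]

# Primary criterion: place name; secondary criterion: parent string.
# casefold() makes the comparison case-insensitive.
sorted_places = sorted(
    places,
    key=lambda p: (p["name"].casefold(), p["parent"].casefold()),
)

for place in sorted_places:
    print(place["name"], "|", place["parent"])
```

The secondary criterion only comes into play when two records tie on the primary one, as the two places named Athens do here.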
spamming Used in reference to meta tags, the abuse of metadata that web page creators include in the HTML header area of their pages in order to increase the number of visitors to a web site. Keyword spamming entails repeating keywords multiple times in order to appear at the top of search engine result listings or listing keywords that are irrelevant to the site in order to attract visitors under false pretenses.
specifications In the context of designing an information system, the formal, detailed description of user and technical requirements, including specific descriptions of procedures, functions, screens, reports, materials, other features, and hardware. See also user requirements.
spider See web crawler
SQL (Structured Query Language) A special-purpose command language used with relational databases to perform queries and other tasks.
SRU/SRW (Search and Retrieve via URL/Search and Retrieve Web Service) Companion protocols for web search queries utilizing CQL (Contextual Query Language, formerly known as Common Query Language). http://www.loc.gov/standards/sru/
stop list In the context of search and retrieval, words in a vocabulary or target data that are ignored in searching or matching because they occur too frequently or are otherwise of little value in retrieval for a given domain. Common stop lists for a text contain articles, conjunctions, and prepositions, although these words are typically not included in a stop list for a vocabulary.
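A minimal sketch of how a stop list might be applied when preparing text for matching follows. The list itself is a small hypothetical sample of articles, conjunctions, and prepositions, not a standard list.

```python
# Hypothetical stop list; real lists are longer and domain-specific.
STOP_LIST = {"a", "an", "and", "the", "of", "in", "or", "to"}


def index_terms(text):
    """Split text into lowercase words and drop those on the stop list."""
    words = (w.strip(".,;:!?\"'()").casefold() for w in text.split())
    return [w for w in words if w and w not in STOP_LIST]


print(index_terms("The History of Architecture in the Renaissance"))
```

Only the remaining words would be indexed or compared, so a search for "history of architecture" and one for "history architecture" would match the same records.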
surrogate See digital surrogate
system Also called a computer system. A number of interrelated hardware and software components that work together to store and convert data into information by using electronic processing. See also database.
tagging In the context of the web, the act of associating terms (called “tags”) with an information object (e.g., a web page, an image, a streaming video clip), thus describing the item and enabling keyword-based classification and retrieval. Tags—a form of user-generated metadata—from communities of users can be aggregated and analyzed, providing useful information about the collection of objects with which the tags have been associated. See also social tagging.
taxonomy An orderly classification that explicitly expresses the relationships, usually hierarchical (e.g., genus/species, whole/part, class/instance), between and among the things being classified. A taxonomy can be used as a controlled vocabulary. See also folksonomy.
TCP/IP (Transmission Control Protocol/Internet Protocol) The suite of network protocols, standardized through the Internet Engineering Task Force (IETF), that enables information systems to communicate with other information systems on the Internet regardless of their computer platforms.
Text Encoding Initiative (TEI) An international cooperative effort to develop guidelines for standard encoding schemes—i.e., the TEI and TEI Lite document type definitions (DTDs)—for literary and linguistic texts. http://www.tei-c.org/
transliteration The process of rendering the letters or characters of one alphabet or writing system into the corresponding letters or characters of another alphabet or writing system, generally based on phonetic equivalencies. While a common noun will often be translated, a proper name in a non-Roman alphabet is more often transliterated. There are often multiple standards for transliterating from one writing system to another, thus producing multiple variant names.
truncation In searching and matching, the action of cutting off characters in a search term in order to find all terms with a certain common string of characters; this typically involves the user employing a wildcard symbol to search for a string of characters no matter what other characters follow (or precede) that string (e.g., searching for arch* will retrieve arch, arches, architrave, architecture, architectural history, etc.).
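Truncated searching can be sketched with the wildcard matching in Python's standard fnmatch module. The vocabulary below extends the example terms in the entry with two invented ones to show left truncation; note that fnmatch also gives special meaning to "?" and square brackets, which this sketch does not use.

```python
from fnmatch import fnmatchcase


def truncated_search(pattern, terms):
    """Return the terms matching a pattern in which * stands for any characters."""
    pattern = pattern.casefold()
    return [t for t in terms if fnmatchcase(t.casefold(), pattern)]


vocabulary = [
    "arch",
    "arches",
    "architrave",
    "architecture",
    "architectural history",
    "monarch",
    "march",
]

# Right truncation: any characters may follow the string.
print(truncated_search("arch*", vocabulary))
# Left truncation: any characters may precede the string.
print(truncated_search("*arch", vocabulary))
```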
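Unicode A character-encoding standard for representing letters, characters, and diacritical marks in most of the world’s modern scripts. Originally designed as a sixteen-bit scheme, it now defines more than a million possible code points, which are stored using encoding forms such as UTF-8 and UTF-16. http://unicode.org/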
unique identifier A number or other string that is associated with a record or piece of data, exists only once in a database, and is used to uniquely identify and disambiguate that record or piece of data from all others in the database.
URI (Uniform Resource Identifier) A short string that uniquely identifies a resource such as an HTML document, an image, a downloadable file, or a service. URLs and URNs are types of URIs.
URL (Uniform Resource Locator) A type of URI consisting of an Internet address that tells users how and where to locate a specific file on the World Wide Web. A URL includes not only the name of a file, but also the name of the host computer, the directory path to get to that file, and the protocol needed in order to use it (e.g., http://www.getty.edu/research/publications/electronic_publications/index.html specifies that the hypertext transfer protocol “http” should be used to retrieve the document “index.html” from the host “www.getty.edu” in the directory “research/publications/electronic_publications”).
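The parts of the example URL can be separated with Python's standard urllib.parse module, as in this short sketch. The variable names are chosen only for the example.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.getty.edu/research/publications/electronic_publications/index.html"

parts = urlsplit(url)
protocol = parts.scheme   # the protocol used to retrieve the file
host = parts.netloc       # the host computer
# Split the path into the directory and the file name.
directory, filename = posixpath.split(parts.path)

print(protocol)
print(host)
print(directory)
print(filename)
```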
URN (Uniform Resource Name) A type of URI consisting of a unique, location-independent identifier of a file available on the Internet. The file remains accessible by its URN regardless of changes that might occur in its host and directory path. For example, urn:issn:0167-6423 is the URN for the journal Science of Computer Programming.
user interface The portion of the design and functionality of a cataloging, editorial, search and retrieval, or other system or web site with which end users interact, including the arrangement of displays, menus, clickable text or images, pagination, etc. A user interface that is easy for users to utilize is called user friendly.
user requirements In system design, the initial formal explanation of functionalities, displays, and reports expressed from the point of view of the user’s needs and expectations. See also specifications.
Virtual International Authority File (VIAF) A federated resource that provides integrated access to millions of records from authority files compiled by libraries and other memory institutions from around the world. http://www.viaf.org
visible web The subset of the World Wide Web that is visible to web browsers and indexable by search engines’ web crawlers. In order to be accessible to web crawlers, the pages must be accessible simply by following links (i.e., not generated dynamically in response to user input) and not protected by a password.
VRA Core 4.0 An XML schema developed by the Visual Resources Association (VRA) and supported by the Library of Congress. VRA Core is used for describing works of art and architecture and their visual surrogates. http://www.loc.gov/standards/vracore/schemas.html
web browser A software application that enables users to view and interact with information and media files on the web. Mozilla Firefox, Google Chrome, and Apple’s Safari are examples of web browsers.
web crawler A software program that systematically traverses the web, either for the purpose of generating a searchable index of web content or to gather statistics. Also called a robot or spider.
web search engine / Internet search engine A software program that collects data taken from the content of files available on the web and puts them in an index or database that web users can search in a variety of ways. The search results provide links back to the pages matching the user’s search in their original location.
web server A computer that is able to respond to HTTP requests from clients known as web browsers and return the appropriate HTTP responses—most typically serving an HTML page.
website A collection of related electronic pages (web pages), generally formatted in HTML and found at a single address where the server computer is identified by a given host name.
wiki A collaborative website that contains pages that any authorized user can edit. Wikis typically retain all former versions of each page, allowing the revision history of a page to be tracked and for unwanted revisions to be reversed.
Wikipedia A free, collaborative, volunteer-driven, web-based encyclopedia that utilizes wiki software to allow anyone to edit articles. http://en.wikipedia.org/wiki/
World Wide Web A vast, distributed wide-area client-server architecture for retrieving hypermedia documents over the Internet.
World Wide Web Consortium (W3C) The main international standards organization for the World Wide Web.
XML (Extensible Markup Language) A relatively simple, flexible markup language used for publication and exchange of a wide variety of data on the web.
XML schema A machine-readable definition of the structure, elements, and attributes allowed in a valid instance of a conforming XML document. XML schemas are expressed using the XML Schema Definition language, a World Wide Web Consortium (W3C) standard. http://www.w3.org/TR/xmlschema-0
XMP (Extensible Metadata Platform) A markup language, based on the Resource Description Framework (RDF), for recording and embedding metadata about digital assets. Developed by Adobe Systems and supported across the company’s range of software products and file formats. http://www.adobe.com/products/xmp.html
Z39.50 A client/server-based protocol for searching and retrieving information from remote databases.