Metadata, Directories & Search Engines Profile: Dublin Core

Dublin Core, AGLS and other metadata sets

This page looks at Dublin Core, one of a number of metadata sets favoured by academic and library communities. It also looks at AGLS - a DC-based Australian standard - and other metadata sets such as LOM.

It covers -

introduction
why Dublin, why the Core?
Unqualified and Qualified DC
DC tools
AGLS and AGIFT
ANZMETA
LOM

introduction

As the preceding page of this profile suggested, there is a bewildering and increasing variety of metadata sets - reflecting user needs, institutional histories and personal ambitions.

One of the major sets is Dublin Core, a metadata schema that's been recognised in a number of national standards (eg those of the US and Australia) and has been adopted - or adapted - by major cultural institutions and government agencies in some countries. One example is the Australian Government Locator Service (AGLS) metadata set, a DC application that is mandatory for the websites of federal government agencies.

DC has not achieved wide acceptance within the broader web community. Estimates of the proportion of websites using DC range from well under 1% to a maximum of 3%; where DC is used, it is often applied only to a site's top-level pages rather than to every digital object on the site. DC is unlikely to become the global standard for most content on the internet and intranets. It is, however, significant as the 'lingua franca' for the exchange of data and for cross-database searches of cultural material online. It may also serve as a building block for construction of the semantic web.

As we noted on the preceding page of this profile, DC and AGLS are not currently recognised by most search engines.

Why Dublin, why the Core?

In essence, the Dublin Core (DC) Metadata Element Set is a suite of semantic definitions of 15 descriptive elements, specifically intended to support electronic resource discovery. The elements represent a broad interdisciplinary consensus about the core set of data elements likely to be generally useful in supporting online resource discovery. DC does not impose a controlled vocabulary. Instead, it specifies that descriptive information about the content or other attributes of the entity being described (eg its author, language and date of creation) can appear in particular fields (the 'elements') in a particular format.

The generality of the specification means that the metadata about the entity is platform-independent, in principle facilitating the exchange of metadata between different hardware and networks, and searching across discrete databases. Different methods can be used to record or transfer the metadata, including HTML, XML, RDF and relational databases. DC is independent of, but complements, the Resource Description Framework (RDF), discussed on the following page of this profile.
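A minimal sketch of why a shared element set helps cross-database searching: two catalogues with different local field names are each mapped (a "crosswalk") onto DC element names, after which one query works across both. The local schemas, field names, records and the `to_dc`/`search` helpers are all hypothetical, invented for illustration; real crosswalks are usually far more detailed.

```python
# Hypothetical local schemas from two different catalogues; field names are
# illustrative only, not taken from any real system.
LIBRARY_TO_DC = {"main_title": "title", "author": "creator", "pub_year": "date"}
MUSEUM_TO_DC = {"object_name": "title", "maker": "creator", "made": "date"}


def to_dc(record: dict, crosswalk: dict) -> dict:
    """Map a local record onto DC element names; unmapped fields are dropped."""
    dc = {}
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is not None:
            dc.setdefault(element, []).append(value)
    return dc


def search(records: list, element: str, term: str) -> list:
    """Case-insensitive substring search over one DC element across all records."""
    term = term.lower()
    return [r for r in records
            if any(term in v.lower() for v in r.get(element, []))]


library = [{"main_title": "Gold rush diaries", "author": "Smith, J.",
            "pub_year": "1998", "shelfmark": "994.5 SMI"}]
museum = [{"object_name": "Gold rush cradle", "maker": "Unknown", "made": "1855"}]

combined = ([to_dc(r, LIBRARY_TO_DC) for r in library]
            + [to_dc(r, MUSEUM_TO_DC) for r in museum])
for hit in search(combined, "title", "gold rush"):
    print(hit["title"][0], "|", hit["date"][0])
# Gold rush diaries | 1998
# Gold rush cradle | 1855
```

Note that the library's `shelfmark` field has no DC counterpart in this crosswalk and is simply lost: the price of a common denominator.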

An introduction is provided by Carl Lagoze's D-Lib paper on Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description.

The development of DC has been driven by the library and archives communities. The name derives from the initial DC workshop, held in Dublin, Ohio under the auspices of the Online Computer Library Center (OCLC). That organisation is a US nonprofit serving the networking needs of libraries in the US and some 70 other countries.

Dublin Core complements RDF by -

defining 15 named elements - the Dublin Core Metadata Element Set (DCMES) - that identify characteristics of an information resource such as a web page and that are considered to be widely useful for resource discovery
specifying that an information resource may be described by any number of instances of each of those elements (every element is optional and repeatable)
defining the range of resource types that may have DC descriptions - given by the allowed values for the DCMES Type element.
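The three points above can be sketched as a simple record structure: a mapping from element name to a list of values (so any element may be omitted or repeated), checked against the 15 DCMES names and, for Type, against the DCMI Type Vocabulary. Treating the DCMI Type Vocabulary as the only acceptable source of Type values is an assumption made for this sketch; DC recommends, rather than requires, a controlled vocabulary. The `check_record` name and the sample records are hypothetical.

```python
# The 15 DCMES element names, in the lower-case form used in the
# http://purl.org/dc/elements/1.1/ namespace.
DCMES = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

# Terms of the DCMI Type Vocabulary, used here as the Type vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes.

    Every element is optional and repeatable, so each value is a list.
    """
    problems = []
    for element, values in record.items():
        if element not in DCMES:
            problems.append(f"unknown element: {element}")
        elif element == "type":
            for v in values:
                if v not in DCMI_TYPES:
                    problems.append(f"type not in DCMI Type Vocabulary: {v}")
    return problems


page = {
    "title": ["Dublin Core profile"],
    "creator": ["Example Author"],
    "subject": ["metadata", "Dublin Core", "AGLS"],  # a repeated element
    "type": ["Text"],
}
print(check_record(page))                                      # []
print(check_record({"author": ["X"], "type": ["Webpage"]}))
```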

The DCMES was approved by the American National Standards Institute as ANSI/NISO Z39.85-2001 in 2001.

Unqualified and Qualified DC

'Simple' or 'unqualified' Dublin Core is a term often used to describe Dublin Core metadata that uses no qualifiers. In unqualified DC the elements are expressed as attribute-value pairs using only the 15 elements from the DC Metadata Element Set without further information about encoding schemes, enumerated lists of values or other processing clues.
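Because unqualified DC is just attribute-value pairs, it maps naturally onto HTML `<meta>` elements. The sketch below follows the convention described in RFC 2731 (a `schema.DC` link element plus `DC.`-prefixed meta names), assuming the DCMES 1.1 namespace URI and lower-case element names; the `dc_meta_tags` function is hypothetical.

```python
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(record: dict) -> str:
    """Render unqualified DC as HTML <meta> elements, one per value."""
    lines = [f'<link rel="schema.DC" href="{DC_NS}">']
    for element, values in record.items():
        for value in values:
            # Escape &, <, > and quotes so the value is safe inside content="".
            lines.append(
                f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
            )
    return "\n".join(lines)


print(dc_meta_tags({
    "title": ["Dublin Core, AGLS & other metadata sets"],
    "language": ["en"],
}))
# <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
# <meta name="DC.title" content="Dublin Core, AGLS &amp; other metadata sets">
# <meta name="DC.language" content="en">
```

No scheme or refinement information appears in the output: that is precisely what makes it 'unqualified'.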

'Qualified' DC employs additional information to increase the specificity of the metadata by refining the meaning, specifying controlled vocabularies or encoding schemes or indicating that a metadata value is a compound/structured value. That supplementary description might include -

values from a controlled vocabulary (eg Dewey Decimal Classification or Library of Congress Subject Headings)
values expressed using a special notation (eg the ISO 8601 format for dates and times)
use of a particular natural language (in the case of values written in text-strings)
use of an element in a specialised way, with narrower semantics than its broad definition implies (eg a value in the Date element might be the date on which a resource was modified)

To encompass these requirements, the qualified Dublin Core set also includes -

value qualifiers - an identifier for the vocabulary, encoding or language of the value.
element qualifiers - refining the meaning of the element. An element qualifier effectively creates a new element, one with a more specialised meaning than its parent element.
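The two kinds of qualifier can be modelled as extra fields on each statement. A consequence of element qualifiers being narrower versions of their parent element is that a client which does not understand a qualifier can fall back to the parent element and the bare value (DCMI calls this "dumbing down"). The `Statement` type and `dumb_down` function below are a hypothetical sketch, not a DCMI API; "modified" and "alternative" are real DC refinements of Date and Title, and W3CDTF and LCSH are real encoding schemes.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """One qualified DC statement; field names are illustrative only."""
    element: str               # one of the 15 DCMES elements, eg "date"
    refinement: Optional[str]  # element qualifier, eg "modified"
    scheme: Optional[str]      # value qualifier, eg "W3CDTF" or "LCSH"
    value: str


def dumb_down(statements: list) -> dict:
    """Reduce qualified statements to unqualified DC.

    An element qualifier only narrows its parent element, so the value can
    still be read as the parent element. Value qualifiers are discarded and
    the literal value is kept.
    """
    simple = {}
    for s in statements:
        simple.setdefault(s.element, []).append(s.value)
    return simple


qualified = [
    Statement("date", "modified", "W3CDTF", "2003-01-15"),
    Statement("subject", None, "LCSH", "Metadata"),
    Statement("title", "alternative", None, "DC profile"),
]
print(dumb_down(qualified))
# {'date': ['2003-01-15'], 'subject': ['Metadata'], 'title': ['DC profile']}
```

The fallback keeps records interoperable, but the precision the qualifiers added (that the date is a modification date, that the subject comes from LCSH) is lost in the process.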

Adoption of qualifiers reflects political imperatives (ie reinforcing support for DC by institutions that might otherwise maintain/develop different schemas) and increases the specificity of the metadata, thereby enhancing the precision of searches. However, it introduces a complexity that can significantly impede interoperability.

Dublin Core interoperability qualifiers are a formal part of the DC metadata registry. It is envisaged that local or application-specific requirements may necessitate additional qualifiers (or even additional elements) that do not reflect a consensus within the overall DC community and thus do not form part of that registry.

The primary elements are -

Title - a name given to the resource, typically the name by which it is formally known.

Creator - the entity primarily responsible for making the content of the resource (eg the individual author, organisation or service)

Subject & Keywords - topic of the resource content, typically expressed as keywords, key phrases or classification codes from a controlled vocabulary or formal classification scheme

Description - an account of the content of the resource, such as an abstract, table of contents or a free-text description

Publisher - the entity responsible for making the resource available (eg an individual, organisation or service)

Contributor - an entity responsible for making contributions to the content of the resource, eg a person, organisation or service

Date - a date of an event in the lifecycle of the resource, for example when it was created, published or modified. DC recommended best practice is to use the ISO 8601 standard as profiled in W3C Date & Time Formats (W3CDTF), eg YYYY-MM-DD.

Type - the nature or genre of the content of the resource, typically a value from a controlled vocabulary such as the DCMI Type Vocabulary

Format - the physical or digital manifestation of the resource, of value in identifying hardware or software needed to display or operate the resource

Identifier - an unambiguous reference to the resource within a given context, such as a Uniform Resource Identifier (URI), Digital Object Identifier (DOI), International Standard Book Number (ISBN) or International Standard Serial Number (ISSN)

Source - a reference to a resource from which the present resource is derived

Language - the language of the intellectual content of the resource (expressed using the two- and three-letter primary language tags in RFC 3066 and ISO 639)

Relation - a reference to a related resource

Coverage - the extent or scope of the content of the resource, typically a spatial location (a place name or geographic coordinates), jurisdiction (such as a named administrative entity) or temporal period (a period label, date or date range), with values drawn from a controlled vocabulary such as the Getty Thesaurus of Geographic Names (TGN)

Rights - information about rights held in and over the resource, eg Intellectual Property Rights.
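Two of the elements above come with recommended value formats that can be checked mechanically. The sketch below tests Date values against the date-only W3CDTF granularities (YYYY, YYYY-MM, YYYY-MM-DD) and Language values against the RFC 3066 tag syntax. It is an assumption-laden simplification: W3CDTF time-of-day forms are not handled, and the language check confirms only the shape of a tag, not that its subtags are registered ISO 639 or country codes. The function names are hypothetical.

```python
import re
from datetime import datetime

# Date-only W3CDTF granularities, keyed by string length.
_DATE_FORMATS = {4: "%Y", 7: "%Y-%m", 10: "%Y-%m-%d"}
_DATE_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

# RFC 3066 tag shape: a 1-8 letter primary subtag, then optional
# hyphen-separated subtags of 1-8 letters or digits.
_LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_w3cdtf_date(value: str) -> bool:
    fmt = _DATE_FORMATS.get(len(value))
    if fmt is None or not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, fmt)  # rejects eg month 13 or 30 February
        return True
    except ValueError:
        return False


def is_language_tag(value: str) -> bool:
    # Syntax only: "xx-YY" passes even if xx is not a real language code.
    return _LANG_TAG.fullmatch(value) is not None


for d in ["2002", "2002-12", "2002-12-31", "31/12/2002", "2002-02-30"]:
    print(d, is_w3cdtf_date(d))
for tag in ["en", "en-AU", "english language"]:
    print(tag, is_language_tag(tag))
```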

tools

DC is not supported by standard HTML page-authoring tools, such as Dreamweaver, or by most content management systems. (Some AGLS-specific tools are available.)

A number of automated DC generators are under development. One example is DC-dot. Typically these generators parse an existing HTML or XML document, extract key information and present it in the DC elements, either for editing or for immediate publication on an intranet or the internet.
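As a rough illustration of that parsing step, the sketch below uses Python's standard `html.parser` to pull candidate DC values out of ordinary HTML markup: the `<title>` element becomes Title, `<meta name="description">` becomes Description, comma-separated `<meta name="keywords">` become Subject values, and the `lang` attribute of `<html>` becomes Language. This mapping is an assumption chosen for the example; it is not a description of how DC-dot or any other generator actually works, and `DCExtractor` is a hypothetical name.

```python
from html.parser import HTMLParser


class DCExtractor(HTMLParser):
    """Collect candidate DC values from ordinary HTML markup (heuristic)."""

    def __init__(self):
        super().__init__()
        self.dc = {}
        self._in_title = False

    def _add(self, element, value):
        value = value.strip()
        if value:
            self.dc.setdefault(element, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self._add("language", attrs["lang"])
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self._add("description", content)
            elif name == "keywords":
                for keyword in content.split(","):
                    self._add("subject", keyword)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._add("title", data)


page = """<html lang="en"><head><title>Dublin Core profile</title>
<meta name="description" content="Overview of DC and AGLS">
<meta name="keywords" content="metadata, Dublin Core, AGLS">
</head><body><p>Text</p></body></html>"""

parser = DCExtractor()
parser.feed(page)
print(parser.dc)
```

The result would then be shown to an author for checking, since markup like `keywords` is often missing, stale or stuffed.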

Issues in manual DC creation are examined in the paper by Jane Greenberg & associates, Author-Generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization.

AGLS and AGIFT

The Australian Government Locator Service (AGLS) Metadata Standard is an application of DC, with 19 descriptive elements that encompass Australian government services and functions.

It uses the Australian Governments' Interactive Functions Thesaurus (AGIFT), which is important for harvesting metadata for central portals such as Fed.gov.au, Business Entry Point (BEP) and GOLD.

It was developed in 1997-98 under the auspices of the national Online Ministers Council (OC) and Cultural Ministers Council (CMC) by a federal working party that included Caslon's Bruce Arnold. Development reflected the 1997 Information Management Steering Committee report on The Management of Government Information as a National Strategic Resource, a document whose long-term impact is uncertain.

Use of AGLS on federal government sites was mandated by the Better Services, Better Government 'e-Government strategy' and there's been some acceptance by state/territory government agencies, in particular as part of the Government Electronic Resources Network (GOVERNET) project that aims to facilitate online navigation, discovery and access to government services and related information across all government jurisdictions.

In practice, adoption of AGLS has been uneven. The mandate covers public sites, not corporate intranets. Responsibility passed from the former Office of Government Online (OGO) to the National Archives (NAA). Studies of implementation by some major agencies suggest problems with initial quality control and ongoing maintenance, problems that are inevitable given the technical requirements, the costs and agency perceptions of benefits and penalties.

AGLS was published as Australian Standard AS 5044 by Standards Australia in December 2002.

It comprises five mandatory elements -

Creator
Title (alternative qualifier)
Date (created, modified, valid, issued qualifiers)
Subject & Keywords OR Function
Identifier OR Availability

plus non-mandatory elements -

Publisher
Description
Source
Language
Relation (seven qualifiers)
Rights
Contributor
Coverage (temporal, spatial, jurisdiction, postcode qualifiers)
Type (category, aggregationLevel, documentType, serviceType qualifiers)
Format (extent, medium qualifiers)
Audience
Mandate (act, regulation, case qualifiers)
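The mandatory/optional split above lends itself to a simple completeness check, where a pair such as "Subject & Keywords OR Function" is satisfied by either element. The sketch below assumes records are held as element-name-to-list-of-values mappings with lower-case names, ignores qualifiers entirely, and treats "Subject & Keywords" as `subject`; the function name and sample record are hypothetical.

```python
# AGLS mandatory elements as listed in AS 5044; a tuple means "at least one of".
AGLS_MANDATORY = [
    ("creator",),
    ("title",),
    ("date",),
    ("subject", "function"),
    ("identifier", "availability"),
]


def missing_mandatory(record: dict) -> list:
    """Return the mandatory requirements the record fails to meet."""
    missing = []
    for options in AGLS_MANDATORY:
        # An element counts as present only if it has at least one value.
        if not any(record.get(name) for name in options):
            missing.append(" OR ".join(options))
    return missing


record = {
    "creator": ["Example Agency"],
    "title": ["Annual report"],
    "date": ["2002-12-01"],
    "function": ["Example function"],
}
print(missing_mandatory(record))   # ['identifier OR availability']
```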

AGLS is the Australian counterpart of the US Government Information Locator Service (GILS).

ANZMETA

AGLS is independent of the ANZLIC Core Metadata Elements set (ANZMETA), the principal metadata standard for description of geospatial data in Australia and New Zealand.

ANZMETA features a controlled vocabulary specific to geospatial information and is used in the Australian Spatial Data Directory (ASDD).

LOM

Learning Object Metadata (LOM) is a metadata standard that is being developed under the auspices of the Institute of Electrical & Electronics Engineers (IEEE), with support from bodies such as the Alliance of Remote Instructional Authoring & Distribution Networks for Europe (ARIADNE) and IMS Global Learning Consortium.

The intention is to provide metadata to assist the identification, evaluation and sharing of "learning objects" (ie the content of education and training programs) in "learning management systems" (aka courseware), in particular to ensure that objects in one system are understood readily in other systems.

Learning object metadata structures are key elements in specifications that support the discovery, exchange, use and repurposing of learning content. The expectation is that applications of LOM will be used in systems maintained by schools, libraries, publishers, major corporations, government agencies and professional institutions, both for training delivered online (on the internet or via intranets) and for digital publications distributed in physical formats.

Current applications include CanCore, the Canadian Core Learning Object Metadata Application Profile, and the Sharable Content Object Reference Model (SCORM), a "web-based learning Content Aggregation Model and Run-Time Environment for learning objects".
