Dublin Core, AGLS and other metadata sets
This page looks at Dublin Core, one of a number of metadata
sets favoured by academic and library communities. It
also looks at AGLS - a DC-based Australian standard -
and other metadata sets such as LOM.
introduction
As the preceding page of this profile suggested, there
is a bewildering and increasing variety of metadata sets
- reflecting user needs, institutional histories and personal
ambitions.
One of the major sets is Dublin Core, a metadata schema
that's been recognised in a number of national standards
(eg those of the US and Australia) and has been adopted
- or adapted - by major cultural institutions and government
agencies in some countries. One example is the Australian
Government Locator Service (AGLS) metadata set, a DC application
that is mandatory for the websites of federal government
agencies.
DC has not achieved wide acceptance within the broader web
community. Estimates of sites identified using DC range
from well under 1% to a maximum of 3%; that identification
is often restricted to the top level of sites rather than
embracing every digital object on the particular site.
DC is unlikely to become the global standard for most
content on the internet and intranets. It is however significant
as the 'lingua franca' for the exchange of data and for
cross-database searches of cultural material online. It
may also serve as a building block for construction of
the semantic web.
As we noted on the preceding page of this profile, DC and
AGLS are not currently recognised by most search engines.
Why Dublin, why the Core?
In essence, the Dublin Core (DC) Metadata Element Set is a suite of semantic
definitions of 15 descriptive elements, specifically intended
to support electronic resource discovery. The elements
represent a broad interdisciplinary consensus about the
core set of data elements that are likely to be generally
useful in supporting online resource discovery. DC does
not impose a controlled vocabulary. Instead it specifies
that descriptive information about the content or other
attributes of the entity being described - eg its author,
language and date of creation - can appear in particular
fields (the 'elements') in a particular format.
The broadness of the specification means that the information
about the entity is platform-independent, in principle
facilitating the exchange of metadata between different
hardware/networks and searching across discrete databases.
Different methods can be used to record or transfer the
metadata, including HTML, XML, RDF and relational databases.
DC is independent of but complements the Resource Description
Framework (RDF), discussed on the following page of this
profile.
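
As an illustration of recording DC in HTML, the sketch below renders a small unqualified record as meta tags. It assumes the common convention of a "DC." prefix on element names together with a schema.DC link to the element set namespace; the function name and the sample values (including the date) are invented for the example.

```python
from html import escape

# Namespace of the 15-element Dublin Core Metadata Element Set, version 1.1.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_to_html_meta(record):
    """Render a dict of DC element name -> list of values as HTML tags.

    Every element is optional and repeatable, so each value gets its own
    <meta> tag. Values are escaped so they are safe inside an attribute.
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}">']
    for element, values in record.items():
        for value in values:
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Hypothetical sample record; the values are illustrative only.
example_record = {
    "title": ["Dublin Core, AGLS and other metadata sets"],
    "language": ["en"],
    "date": ["2002-12-01"],
}
print(dc_to_html_meta(example_record))
```

The same dictionary could equally be serialised as XML or stored in a database table, which is the sense in which the element set is independent of any one encoding.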
An introduction is provided by Carl Lagoze's D-Lib paper
Keeping Dublin Core Simple: Cross-Domain Discovery
or Resource Description?
The development of DC has been driven by the library and archives
communities. The name derives from the initial DC workshop,
held in Dublin Ohio under the auspices of the Online Computer
Library Center (OCLC).
That organisation is a US nonprofit serving the networking
needs of libraries in the US and some 70 other countries.
Dublin Core complements RDF by -
- defining 15 named elements - the Dublin Core Metadata Element
Set (DCMES) - that identify characteristics of an information
resource such as a web page and that are considered
to be widely useful for resource discovery
- specifying that an information resource may be described by any
number of instances of each of those elements (every element is
optional and repeatable)
- defining the range of resource types that may have DC descriptions,
given by the allowed values for the DCMES Type element.
The DCMES was accepted by the American National Standards
Institute as ANSI/NISO Z39.85-2001 in 2001.
Unqualified and Qualified DC
'Simple' or 'unqualified' Dublin Core is a term often
used to describe Dublin Core metadata that uses no qualifiers.
In unqualified DC the elements are expressed as attribute-value
pairs using only the 15 elements from the DC Metadata
Element Set
without further information about encoding schemes, enumerated
lists of values or other processing clues.
'Qualified' DC employs additional information to increase
the specificity of the metadata by refining the meaning,
specifying controlled vocabularies or encoding schemes
or indicating that a metadata value is a compound/structured
value. That supplementary description might include -
- values from a controlled vocabulary (eg Dewey Decimal Classification
or Library of Congress Subject Headings)
- values expressed using a special notation (eg the ISO 8601
format for dates and times)
- use of a particular natural language (in the case of values
written in text-strings)
- use of an element in a specialised way, with more restricted
semantics than implied by the broad definition of the element
(eg metadata in the Date element might be the date on which
a resource was modified)
To encompass these requirements, the qualified Dublin Core
set also includes -
- value qualifiers - an identifier for the vocabulary, encoding
or language of the value.
- element qualifiers - refining the meaning of the element. An
element qualifier applied to a DCMES element effectively creates
a new element, one with a more specialised meaning than
its parent element.
Adoption of qualifiers reflects political imperatives (ie reinforcing
support for DC by institutions that might otherwise maintain/develop
different schemas) and increases the specificity of the
metadata, thereby enhancing the precision of searches.
However, it introduces a complexity that can significantly
impede interoperability.
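
The relationship between qualified and unqualified DC can be sketched as a small data structure. The example below is an assumption-laden illustration rather than any official DC data model: the Statement class, its field names and the sample values are all invented. It shows how a consumer that does not understand qualifiers can ignore them and fall back to the parent element, an approach the DC community refers to as 'dumbing down'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One metadata statement about a resource (hypothetical model)."""
    element: str                      # one of the 15 DCMES elements, eg "date"
    value: str                        # the value as recorded
    refinement: Optional[str] = None  # element qualifier, eg "modified"
    scheme: Optional[str] = None      # value qualifier: vocabulary or encoding
    language: Optional[str] = None    # value qualifier: language of the value


def dumb_down(statement):
    """Discard all qualifiers, keeping only the parent element and value."""
    return (statement.element, statement.value)


# Illustrative qualified statements; the values are invented for the example.
qualified = [
    Statement("date", "2002-12-01", refinement="modified", scheme="W3CDTF"),
    Statement("subject", "025.3", scheme="DDC"),
    Statement("title", "Metadata profile", language="en"),
]

for statement in qualified:
    print(dumb_down(statement))
```

The reduction loses precision (a modification date becomes simply a date) but the result remains usable by any system that understands the 15 base elements, which is the interoperability trade-off noted above.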
Dublin Core interoperability qualifiers are a formal part of
the DC metadata registry. It is envisaged that local or
application-specific requirements may necessitate additional
qualifiers (or even additional elements) that do not reflect
a consensus within the overall DC community and thus do
not form part of that registry.
The primary elements are -
Title - a name given to the resource, typically the name by
which it is formally known.
Creator - the entity primarily responsible
for making the content of the resource (eg the individual
author, organisation or service)
Subject & Keywords - topic of the
resource content, typically expressed as keywords, key
phrases or classification codes from a controlled vocabulary
or formal classification scheme
Description - an account of the content
of the resource, such as an abstract, table of contents
or a free-text description
Publisher - the entity responsible
for making the resource available (eg an individual,
organisation or service)
Contributor - an entity responsible
for making contributions to the content of the resource,
eg a person, organisation or service
Date - a date of an event in the lifecycle
of the resource, for example when it was created, published
or modified. DC recommended best practice is to use
the ISO 8601 standard, eg YYYY-MM-DD, and W3C Date &
Time Formats (W3CDTF)
Type - the nature or genre of the content
of the resource, typically a value from a controlled
vocabulary such as the DCMI Type Vocabulary
Format - the physical or digital manifestation
of the resource, of value in identifying hardware or
software needed to display or operate the resource
Identifier - the Resource Identifier,
an unambiguous reference to the resource within a given
context, such as the Uniform Resource Identifier
(URI), the Digital Object Identifier (DOI)
and the International Standard Book Number (ISBN) or
Serial Number (ISSN)
Source - a reference to a resource
from which the present resource is derived
Language - the language of the intellectual
content of the resource (expressed using the two- and
three-letter primary language tags in RFC 3066
and ISO 639)
Relation - a reference to a related
resource
Coverage - the extent or scope of the
content of the resource, typically, spatial location
(a place name or geographic coordinates), jurisdiction
(such as a named administrative entity) or temporal
period (a period label, date or date range), drawing
on a controlled geospatial vocabulary such as the Getty
Thesaurus of Geographic Names (TGN)
Rights - information about rights held in and over
the resource, eg Intellectual Property Rights.
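
A record can be checked mechanically against the list above. The sketch below is a minimal illustration, assuming a record held as a dictionary of lower-case element names mapped to lists of values; the function names are invented. It flags names outside the 15 elements and Date values that are not valid YYYY-MM-DD dates (W3CDTF also permits shorter forms such as YYYY, which this simple check does not accept).

```python
import re
from datetime import date

# The 15 elements of the Dublin Core Metadata Element Set.
DCMES_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

ISO_DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_day(value):
    """Return True only for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DAY.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # eg 2002-02-30
        return False
    return True


def check_record(record):
    """Return a list of problems found in a dict of element -> values."""
    problems = []
    for element, values in record.items():
        if element not in DCMES_ELEMENTS:
            problems.append(f"unknown element: {element}")
        elif element == "date":
            for value in values:
                if not is_iso_day(value):
                    problems.append(f"date not in YYYY-MM-DD form: {value}")
    return problems
```

Because DC imposes no controlled vocabulary on most elements, a check of this kind can only confirm the shape of a record, not the quality of the values in it.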
tools
DC is not a feature of standard HTML page-authoring tools,
such as Dreamweaver, or of most content management systems.
(Some AGLS-specific tools are also available.)
A number of automated DC generators are under development.
One example is DC-dot.
Typically those generators parse an existing HTML or XML
text, identifying key information that is then presented
in the DC elements for editing or immediate application
and publishing on an intranet or the internet.
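
The general approach of such generators can be sketched in a few lines. The example below is not how DC-dot itself works; it is a minimal illustration using Python's standard html.parser module, and the class name, function name and the mapping from ordinary HTML meta names to DC elements are assumptions made for the example.

```python
from html.parser import HTMLParser


class DCExtractor(HTMLParser):
    """Collect candidate DC values from an HTML page for later editing."""

    # Assumed mapping from common HTML meta names to DC elements.
    META_TO_DC = {
        "description": "description",
        "keywords": "subject",
        "author": "creator",
    }

    def __init__(self):
        super().__init__()
        self.record = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "html" and attrs.get("lang"):
            self.record.setdefault("language", []).append(attrs["lang"])
        elif tag == "meta":
            element = self.META_TO_DC.get((attrs.get("name") or "").lower())
            content = attrs.get("content")
            if element and content:
                self.record.setdefault(element, []).append(content)

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            title = "".join(self._title_parts).strip()
            if title:
                self.record.setdefault("title", []).append(title)


def suggest_dc(html_text):
    """Return a dict of DC element -> list of values found in the page."""
    parser = DCExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.record
```

The output is only a set of suggestions: elements such as Rights or Coverage rarely appear in page markup and would still need to be supplied by a person.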
Questions about manual DC creation are highlighted in the paper
by Jane Greenberg & associates on Author-Generated
Dublin Core Metadata for Web Resources: A Baseline Study
in an Organization.
AGLS and AGIFT
The Australian Government Locator Service (AGLS)
Metadata Standard is an application of DC, with 19 descriptive
elements that encompass Australian government services
and functions.
It features an Australian Governments' Interactive
Functions Thesaurus (AGIFT),
important for harvesting data for central portals such
as Fed.gov.au,
Business Entry Point (BEP)
and GOLD.
It was developed in 1997-98 under the auspices of the
national Online Ministers Council (OC)
and Cultural Ministers Council (CMC)
by a federal working party that included Caslon's Bruce
Arnold. Development reflected the 1997 Information Management
Steering Committee report on The Management of Government
Information as a National Strategic Resource, a document
whose long-term impact is uncertain.
Use of AGLS on federal government sites was mandated
by the Better Services, Better Government 'e-Government
strategy' and there's been some acceptance by state/territory
government agencies, in particular as part of the Government
Electronic Resources Network (GOVERNET) project that aims
to facilitate online navigation, discovery and access
to government services and related information across
all government jurisdictions.
In practice adoption of AGLS has been uneven. The mandate
covers public sites, not corporate intranets. Responsibility
passed from the former Office of Government Online (OGO)
to the National Archives (NAA).
Studies of implementation by some major agencies suggest
that there are problems with initial quality control and
ongoing maintenance, inevitable given technical requirements,
costs
and agency perceptions about benefits and penalties.
AGLS was published as Australian Standard AS 5044 by Standards
Australia in December 2002.
It comprises five mandatory elements -
Creator
Title (alternative qualifier)
Date (created, modified, valid, issued qualifiers)
Subject & Keywords OR Function
Identifier OR Availability
plus non-mandatory elements -
Publisher
Description
Source
Language
Relation (seven qualifiers)
Rights
Contributor
Coverage (temporal, spatial, jurisdiction, postcode
qualifiers)
Type (category, aggregationLevel, documentType, serviceType
qualifiers)
Format (extent, medium qualifiers)
Audience
Mandate (act, regulation, case qualifiers)
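
The mandatory rules listed above, including the two 'OR' alternatives, lend themselves to a simple automated check. The sketch below assumes a record held as a dictionary of lower-case element names mapped to lists of values; the names AGLS_MANDATORY and missing_mandatory are invented for the example and it does not examine qualifiers.

```python
# Each tuple is one mandatory requirement; any one of the names satisfies it.
AGLS_MANDATORY = [
    ("creator",),
    ("title",),
    ("date",),
    ("subject", "function"),
    ("identifier", "availability"),
]


def missing_mandatory(record):
    """Return the mandatory requirements that the record does not meet.

    An element counts as present only if it has at least one value.
    """
    missing = []
    for alternatives in AGLS_MANDATORY:
        if not any(record.get(name) for name in alternatives):
            missing.append(" OR ".join(alternatives))
    return missing
```

A check of this kind addresses only presence, not the accuracy or currency of the values, which is where the quality-control and maintenance problems noted earlier tend to arise.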
AGLS is the Australian counterpart of the US Government Information
Locator Service (GILS).
ANZMETA
AGLS is independent of the ANZLIC Core Metadata Elements
set (ANZMETA),
the principal metadata standard for description of geospatial
data in Australia and New Zealand.
ANZMETA features a controlled vocabulary
specific to geospatial information and is used in the
Australian Spatial Data Directory (ASDD).
LOM
Learning Object Metadata (LOM) is a metadata standard
that is being developed under the auspices of the Institute
of Electrical & Electronics Engineers (IEEE), with
support from bodies such as the Alliance of Remote Instructional
Authoring & Distribution Networks for Europe (ARIADNE) and the IMS
Global Learning Consortium.
The intention is to provide metadata to assist the identification,
evaluation and sharing of "learning objects"
(ie the content of education and training programs) in
"learning management systems" (aka courseware),
in particular to ensure that objects in one system are
understood readily in other systems.
Learning object metadata structures are key elements in
specifications that support the discovery, exchange, use
and repurposing of learning content. The expectation is
that applications of LOM will be used in systems maintained
by schools, libraries, publishers, major corporations,
government agencies and professional institutions for
training delivered online (on the internet or via intranets) and in
digital publications in physical formats.
Current applications include CanCore,
the Canadian Core Learning Object Metadata Application
Profile, and the Sharable Content Object Reference Model (SCORM) -
a "web-based learning Content Aggregation Model and
Run-Time Environment for learning objects".