Metadata Profile: Overview

overview

This profile looks at metadata, the information that identifies web pages and thus forms the basis of some search engines, directories and content management systems.

The profile also considers directories, portals and search engines.

what's in this profile

The following pages provide more detailed information about:

on the web - the challenges of finding information online

the Dublin Core metadata suite (DC)

Resource Description Framework (RDF)

Platform for Internet Content Selection (PICS), the problematical proposal for identifying offensive material and other online content

Persistent Uniform Resource Locator (PURL) scheme

Uniform Resource Name (URN), a proposal to identify documents on the web independent of their location, thus complementing URLs

Proposals for thesauri that would assist resource identification over the web as a whole or within particular sectors, such as the visual arts or biotechnology

Commercial and volunteer directories and portals covering the web as a whole or particular specialities, such as industries, geographical locations and areas of interest or expertise

Search engines

what is metadata?

Metadata is literally information about information. It may be very restricted in scope, such as a simple identification number, or it may be descriptive, allowing the creation of indexes, lists and other tools that can be used to identify and evaluate information.

If you've used a library catalogue you've used such a tool. The catalogue is based on metadata - subject, author, publisher etc - about the books and other documents held by that institution.
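As a rough illustration of how a catalogue turns metadata into a finding tool, the sketch below builds simple indexes from a handful of records. The records reuse the author, title and publisher details of books cited later in this profile; the field names and the build_index helper are our own assumptions for illustration, not any library's actual cataloguing format.

```python
from collections import defaultdict

# Each record is metadata about a book, not the book itself.
# Field names ("title", "author", "publisher") are illustrative assumptions.
CATALOGUE = [
    {
        "title": "The Intellectual Foundation of Information Organization",
        "author": "Elaine Svenonius",
        "publisher": "MIT Press",
    },
    {
        "title": "From Gutenberg to the Global Information Infrastructure",
        "author": "Christine Borgman",
        "publisher": "MIT Press",
    },
    {
        "title": "Finding Out About",
        "author": "Richard Belew",
        "publisher": "Cambridge University Press",
    },
]


def build_index(records, field):
    """Group record titles under each distinct value of one metadata field."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["title"])
    return dict(index)


# Two indexes over the same records, one per metadata field.
by_publisher = build_index(CATALOGUE, "publisher")
by_author = build_index(CATALOGUE, "author")
```

Looking up by_publisher["MIT Press"] returns both MIT Press titles without anyone having to open the books, which is the point of a catalogue.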

Metadata is one of the key features of the web. It is found within individual web pages, at varying levels of detail and using varying standards, highlighted below. And it's found in the search engines, directories and other tools for finding sites and individual pages. The next page of this guide looks at those engines and directories.
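To make the idea of metadata within a page concrete, here is a minimal sketch that reads the name/content pairs from a page's meta tags using Python's standard html.parser module. The sample page, its values and the MetaTagCollector and extract_metadata names are invented for illustration; real pages vary widely in which tags they carry and how carefully they are filled in.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.metadata[name] = content


# Hypothetical page, invented for illustration only.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short summary of the page">
    <meta name="keywords" content="metadata, cataloguing">
    <meta name="DC.creator" content="A. N. Author">
  </head>
  <body><p>Body text that a reader sees.</p></body>
</html>
"""


def extract_metadata(html_text):
    """Return a dict of the named <meta> tags found in html_text."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.metadata


page_metadata = extract_metadata(SAMPLE_PAGE)
```

The result holds only the description, keywords and DC.creator entries: none of the visible body text is included, which is the distinction between a page's content and its metadata.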

This site, indeed, can be viewed as metadata about information on the web and offline, since it identifies and evaluates several thousand sites, web documents and print publications.

In the metrics guide on this site we highlight some of the studies about the growth of the web.

There are now many millions of sites and hundreds of millions of pages. Many of those documents change periodically: one study suggests that the 'half life' of a page is less than two years, roughly half the time it takes for most books to go out of print. That is one reason why many big sites, such as this one, have links that have "rotted". And domain names don't reveal all the treasures (or lack of them) within a site. The size and volatility of the web mean that it is beyond anyone to list the contents of all sites and pages and to provide an evaluation of each.
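The arithmetic behind a 'half life' can be sketched as follows. We assume, purely for illustration, that pages disappear or change at a steady exponential rate and that the half life is exactly two years (the figure quoted above is an upper bound); the surviving_fraction function is our own, not taken from the study.

```python
def surviving_fraction(years, half_life_years=2.0):
    """Fraction of pages expected to be intact after `years`.

    Assumes simple exponential decay, which is an illustrative
    model rather than a finding of the study mentioned above.
    """
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years / half_life_years)


# With a two-year half life, half the pages remain after two years
# and a quarter after four.
after_two_years = surviving_fraction(2)
after_four_years = surviving_fraction(4)
```

On those assumptions a collection of links compiled four years ago would have only about one in four of its targets still intact, which suggests why link rot is so visible on large sites.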

classification and its consequences

The importance of identification and evaluation, so that your customers can search in a particular part of the haystack rather than attempting to scrutinise every piece of straw, is discussed in Elaine Svenonius' The Intellectual Foundation of Information Organization (Cambridge: MIT Press 00).

She offers a demanding but comprehensive introduction to the theory underlying attempts to identify, categorise and retrieve the resources in the 'global digital library', ie information accessed via the web.

There's a more accessible overview of identification and evaluation issues, and of that global digital library, in Christine Borgman's From Gutenberg to the Global Information Infrastructure: Access To Information in the Networked World (Cambridge: MIT Press 00). It is strongly recommended. Richard Belew's Finding Out About: Search Engine Technology From A Cognitive Perspective (Cambridge: Cambridge University Press 01) is a more theoretical study of search processes.

Among specialist and general journals we recommend the Journal of Internet Cataloging (JIC), D-LIB and the terribly earnest Information Technology and Libraries (ITAL).
