overview
This profile looks at
metadata, the information that identifies web pages and thus forms the
basis of some search engines, directories and content management
systems.
Some businesses claim that they can get you to the top of
the major search engines by using special metadata. Sadly, it just ain't
so. Here's why.
what is metadata?
Metadata is literally information about
information. It may be very restricted in scope, such as a simple
identification number. Or it may be descriptive, allowing the creation
of indexes, lists and other tools that can be used for identification
and for evaluation of information.
If you've used a library catalogue,
you've used such a tool. The catalogue is based on metadata - subject,
author, publisher etc - about the books and other documents held by that
institution.
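The catalogue idea can be sketched in a few lines of Python. This is only an illustration: the records, field names and the build_index helper are all invented for the example and are not drawn from any real library system. It shows how descriptive metadata allows an index to be built without touching the documents themselves.

```python
from collections import defaultdict

# Hypothetical catalogue records: every value here is made up for illustration.
RECORDS = [
    {"id": 1, "title": "Book A", "author": "Smith", "publisher": "Example Press", "subject": "cataloguing"},
    {"id": 2, "title": "Book B", "author": "Jones", "publisher": "Sample House", "subject": "search"},
    {"id": 3, "title": "Book C", "author": "Smith", "publisher": "Sample House", "subject": "cataloguing"},
]


def build_index(records, field):
    """Group record ids by the value of one metadata field (eg subject or author)."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record["id"])
    return dict(index)


subject_index = build_index(RECORDS, "subject")
author_index = build_index(RECORDS, "author")
print(subject_index)
print(author_index)
```

The same records feed both indexes, which is the point: one set of metadata supports several ways of finding a document.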
Metadata is one of the key features of
the web. It is found within individual web pages, at varying levels of
detail and using varying standards, highlighted below. And it's found in
the search engines, directories and other tools for finding sites and
individual pages. The next page of this guide looks at those engines and
directories.
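As a rough sketch of what metadata within an individual page looks like, the following Python reads the meta elements from a small sample page using the standard library's html.parser module. The sample page, the MetaTagParser class and the extract_metadata helper are assumptions made up for this example, and real pages vary widely in which names they use, if any.

```python
from html.parser import HTMLParser

# A made-up page for illustration; the names and values are assumptions.
SAMPLE_PAGE = """<html><head>
<title>Example page</title>
<meta name="description" content="A short note about metadata">
<meta name="keywords" content="metadata, search, directories">
<meta name="DC.Creator" content="A. N. Author">
</head><body><p>Hello</p></body></html>"""


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip meta elements that lack either a name or a content attribute.
        if name and content is not None:
            self.metadata[name.lower()] = content


def extract_metadata(html):
    """Return a dict of the page's meta names (lowercased) and their contents."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


page_metadata = extract_metadata(SAMPLE_PAGE)
for name, content in page_metadata.items():
    print(f"{name}: {content}")
```

Note that the page title is not picked up, since it lives in a title element rather than a meta element. A fuller tool would need to decide which elements count as metadata.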
This site, indeed, can be viewed as
metadata about information on the web and offline, since it identifies
and evaluates several thousand sites, web documents and print
publications.
In the metrics
guide on this site we highlight some of the studies about the growth of
the web.
There are now many millions of sites and hundreds of millions
of pages. Many of those documents change periodically. One study
suggests that the 'half life' of a page is less than two years, roughly
half the time it takes for most books to go out of print. That is one
reason why many big sites - such as this one - have links that have
"rotted". And domain names don't reveal all the treasures (or
lack of them) within a site. The size and volatility of the web mean
that it is beyond anyone to list the contents of all sites and pages and
to provide an evaluation.
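To get a feel for what a 'half life' implies, here is a small worked sketch. It assumes simple exponential decay and takes two years as the half life, which is the upper bound of the figure quoted above. Both are assumptions for illustration, and the surviving_fraction function is a made-up name rather than anything from the study.

```python
def surviving_fraction(years, half_life_years=2.0):
    """Fraction of pages expected to be unchanged after `years`.

    Assumes exponential decay: the fraction halves every `half_life_years`.
    """
    return 0.5 ** (years / half_life_years)


for years in (1, 2, 4, 6):
    print(f"after {years} years: {surviving_fraction(years):.1%} unchanged")
```

Under those assumptions only a quarter of pages would be unchanged after four years, which goes some way to explaining why a large collection of links needs constant checking.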
classification and its consequences
The importance of identification and
evaluation - so that your customers can search in a particular part of
the haystack rather than attempting to scrutinise every piece of straw -
is discussed in Elaine Svenonius' The Intellectual Foundation of
Information Organization (Cambridge, MIT Press 00).
She offers a
demanding but comprehensive introduction to the theory underlying
attempts to identify, categorise and retrieve the resources in the
'global digital library', ie information accessed via the web.
There's a more accessible overview of
identification/evaluation issues and that library in Christine Borgman's
From Gutenberg to the Global Information Infrastructure: Access To
Information in the Networked World (Cambridge, MIT Press 00). It is strongly recommended.
Richard Belew's Finding Out About: Search Engine
Technology From A Cognitive Perspective (Cambridge,
Cambridge Uni Press 01) is a more theoretical study of
search processes.
more information
There's more detailed information on the -
Dublin Core metadata suite (DC)
Resource Description Framework (RDF)
Platform for Internet Content Selection (PICS), the problematical proposal for identifying offensive material and other online content
Persistent Uniform Resource Locator (PURL) scheme
Uniform Resource Name (URN), a proposal to identify documents on the web independent of their location, thus complementing URLs