overview
This profile looks at metadata, the information that identifies
web pages and thus forms the basis of some search engines,
directories and content management systems.
It also considers the directories, portals and search
engines themselves.
what's in this profile
The following pages provide more detailed information
about -
on the web - the challenges of finding information online
the Dublin Core metadata suite (DC)
Resource Description Framework (RDF)
Platform for Internet Content Selection (PICS),
the problematical proposal for identifying offensive
material and other online content
Persistent Uniform Resource Locator (PURL) Scheme
Uniform Resource Name (URN),
a proposal to identify documents on the web independent
of their location, thus complementing URLs, which identify
documents by where they are stored.
Proposals for thesauri
that would assist resource identification over the web
as a whole or within particular sectors, such as the
visual arts or biotechnology
Commercial and volunteer directories
and portals covering the web as a whole or particular
specialities - industries, geographical locations, areas
of interest/expertise
Search engines
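The difference between naming a document and locating it, which underlies the URN and PURL entries above, can be sketched in a few lines of Python. The sketch splits a URN into its namespace identifier and namespace-specific string, following the `urn:<NID>:<NSS>` syntax of RFC 2141, then resolves a persistent name to a current address through a lookup table, roughly what a PURL server does when it redirects a request. The resolver table, the `example` namespace entry and the URLs are hypothetical; a real PURL service answers with an HTTP redirect rather than a dictionary lookup, and NSS validation here is simplified.

```python
import re

# RFC 2141 syntax: "urn:" <NID> ":" <NSS>. The NID is 1-32 letters, digits or
# hyphens and may not start with a hyphen. NSS checking is simplified here.
URN_PATTERN = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,31}):(\S+)$", re.IGNORECASE)


def parse_urn(urn: str) -> tuple[str, str]:
    """Split a URN into (namespace identifier, namespace-specific string)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a valid URN: {urn!r}")
    # NIDs are case-insensitive, so normalise them to lower case.
    return match.group(1).lower(), match.group(2)


# Hypothetical resolver table: persistent name -> current location.
# When a document moves, only this table changes; the name stays the same.
RESOLVER = {
    "urn:example:report-42": "https://archive.example.org/2024/report-42.html",
}


def resolve(name: str) -> str:
    """Return the current URL for a persistent name (a PURL server would redirect)."""
    parse_urn(name)  # reject malformed names early
    try:
        return RESOLVER[name]
    except KeyError:
        raise LookupError(f"no current location registered for {name}") from None
```

Calling `parse_urn("urn:example:report-42")` gives `("example", "report-42")`, and `resolve` on the same name returns whatever address the table currently holds, so links built on the name need not "rot" when the document moves.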
what is metadata?
Metadata is literally information about information.
It may be very restricted in scope, such as a simple identification
number. Or it may be descriptive, allowing the creation
of indexes, lists and other tools that can be used for
identification and for evaluation of information.
If you've used a library catalogue you've used such a
tool. The catalogue is based on metadata - subject, author,
publisher etc - about the books and other documents held
by that institution.
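The way a catalogue turns metadata into a finding tool can be sketched in Python. The records below are hypothetical and the field names (title, author, subject, publisher) are illustrative rather than drawn from any cataloguing standard; the point is that the indexes are built from the descriptions, never from the items themselves.

```python
from collections import defaultdict

# Hypothetical catalogue records: each describes one item, not the item itself.
RECORDS = [
    {"id": 1, "title": "Mapping the Coast", "author": "Smith",
     "subject": ["geography", "maps"], "publisher": "Harbour Press"},
    {"id": 2, "title": "Early Printing", "author": "Jones",
     "subject": ["books", "history"], "publisher": "Harbour Press"},
    {"id": 3, "title": "Charts and Tides", "author": "Smith",
     "subject": ["maps", "navigation"], "publisher": "Quay Books"},
]


def build_index(records, field):
    """Map each value of `field` (lower-cased) to the sorted ids of matching records."""
    index = defaultdict(set)
    for record in records:
        values = record[field]
        # A field may hold one value (author) or several (subject headings).
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value.lower()].add(record["id"])
    return {value: sorted(ids) for value, ids in index.items()}


by_subject = build_index(RECORDS, "subject")
by_author = build_index(RECORDS, "author")
```

Looking up `by_subject["maps"]` returns `[1, 3]`, the two records that share that heading, without anyone opening either item.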
Metadata is one of the key features of the web. It is
found within individual web pages, at varying levels of
detail and using varying standards, outlined in the pages listed above.
And it's found in the search engines, directories and
other tools for finding sites and individual pages. The
next page of this guide looks at those engines and directories.
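Metadata embedded in a page usually sits in `<meta>` elements in the document head, where search engines and other tools can read it. The sketch below uses only Python's standard-library `html.parser` to collect those name/content pairs. The sample page, the `DC.creator` element (following the common convention of prefixing Dublin Core element names with `DC.`) and the helper names are illustrative assumptions; real pages vary widely, and many carry little or no metadata.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements in an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Treat "DC.creator" and "dc.creator" as the same key.
            self.metadata[name.lower()] = content


def extract_metadata(html: str) -> dict:
    """Return the <meta name=... content=...> pairs found in `html`."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.metadata


SAMPLE_PAGE = """<html><head>
<title>Example page</title>
<meta name="description" content="A short summary for search engines">
<meta name="keywords" content="metadata, cataloguing">
<meta name="DC.creator" content="A. Author">
</head><body>...</body></html>"""
```

Here `extract_metadata(SAMPLE_PAGE)["dc.creator"]` yields `"A. Author"`; the page title is not a `<meta>` element, so it is not collected.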
This site, indeed, can be viewed as metadata about information
on the web and offline, since it identifies and evaluates
several thousand sites, web documents and print publications.
In the metrics guide on
this site we highlight some of the studies about the growth
of the web.
There are now many millions of sites and hundreds of millions
of pages. Many of those documents change periodically:
one study suggests that the 'half life' of a page is less than two
years, roughly half the time it takes most books to
go out of print. That volatility is one reason why many big sites,
such as this one, have links that have "rotted".
And domain names don't reveal all the treasures (or lack
of them) within a site. The size and volatility of the
web mean that it is beyond anyone's capacity to list the contents
of all sites/pages and to provide an evaluation.
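The 'half life' figure can be turned into a rough projection of how quickly a list of pages goes stale. The Python sketch below assumes pages change or vanish at a constant rate (simple exponential decay); that model is an assumption for illustration, not necessarily the study's own method. The two-year value is the upper bound quoted above, and the function names are hypothetical.

```python
import math

# Assumption for illustration: pages change or vanish at a constant rate,
# so the surviving share halves every `half_life` years (exponential decay).


def surviving_fraction(years: float, half_life: float) -> float:
    """Share of pages still in their original form after `years`."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (years / half_life)


def years_until(fraction: float, half_life: float) -> float:
    """Years until only `fraction` of the original pages survive unchanged."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return half_life * math.log(fraction) / math.log(0.5)


HALF_LIFE = 2.0  # upper bound suggested by the study; real values vary
for t in (1, 2, 4, 6):
    print(f"after {t} yr: {surviving_fraction(t, HALF_LIFE):.1%} unchanged")
print(f"10% left after {years_until(0.10, HALF_LIFE):.1f} yr")
```

On that assumption, after six years only about one page in eight would survive in its original form, and a shorter half life would make the decay faster still.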
classification and its consequences
The importance of identification
and evaluation - so that your customers can search in
a particular part of the haystack rather than attempting
to scrutinise every piece of straw - is discussed in Elaine
Svenonius' The Intellectual Foundation of Information
Organization (Cambridge: MIT Press 2000).
She offers a demanding but comprehensive introduction
to the theory underlying attempts to identify, categorise
and retrieve the resources in the 'global digital library',
ie information accessed via the web.
There's a more accessible overview of identification/evaluation
issues and that library in Christine Borgman's From
Gutenberg to the Global Information Infrastructure: Access
To Information in the Networked World (Cambridge:
MIT Press 2000). It is strongly recommended. Richard Belew's
Finding Out About: Search Engine Technology From A
Cognitive Perspective (Cambridge: Cambridge University Press
2001) is a more theoretical study of search processes.
Among specialist and general journals we recommend the
Journal of Internet Cataloging (JIC), D-Lib and the
terribly earnest Information Technology and Libraries
(ITAL).