on the web
This page looks at the use of metadata on the web.
the standards question
Internet engineering
and standards bodies have not mandated detailed standards
for metadata. That means, for example, that there's no standardized
terminology or thesaurus (one reason why many librarians
look at the web askance).
Essentially, in developing
the web, provision was made for the inclusion of metadata within
pages and sites, allowing descriptive and other information to be embedded
in each page among the 'invisible' code.
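To make the idea of metadata embedded among the 'invisible' code concrete, here is a minimal sketch in Python using only the standard library's html.parser module. It collects the name/content pairs from the <meta> elements in a page's head, which a browser does not display. The sample page, its description and keywords values, and the helper names MetaExtractor and extract_meta are made up for illustration.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> elements in an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; attrs is a list of (name, value) pairs.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


def extract_meta(html):
    """Return the page's embedded <meta name=... content=...> values as a dict."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta


# A hypothetical page: the <meta> elements are invisible to readers.
page = """<html><head>
<title>Metadata on the web</title>
<meta name="description" content="Use of metadata on the web">
<meta name="keywords" content="metadata, Dublin Core, RDF">
</head><body><p>Visible text.</p></body></html>"""

print(extract_meta(page))
# {'description': 'Use of metadata on the web', 'keywords': 'metadata, Dublin Core, RDF'}
```

Only the text inside <body> is shown to a visitor; everything the function returns lives in the page's head.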
Provision was also made for
construction of search engines and other tools to point to web pages,
drawing on the embedded metadata or using their own metadata about those
pages.
That's had several results:
There's disagreement among specialist users
about the development of specific standards for the structuring and
expression of embedded metadata. (Competing and complementary standards
from librarians, museum curators, informatics specialists and others
include the Dublin Core, AAT, CSDGM,
GIS, CGIS-SAIF, Resource Description Framework and
Warwick Framework.)
There's similar disagreement about content rating
metadata such as PICS, used in censorship
or content management schemes. As Charles Thomas & Linda
Griffin note in their First Monday article
"Who Will Create The Metadata For The Internet?", while there
are commercial incentives for effective metadata, the various schemes
have to break out of the 'silicon ghetto'.
The wide range of search engines and
directories produces widely varying results. There are now at least 2,000 search engines, although most
traffic goes to the top 11, such as Yahoo! and Google.
Most pages (and probably most sites)
don't have descriptive metadata. Some studies
suggest that only 34% have 'meaningful' metadata and that much metadata
is not relevant to the site in question. Less than 0.3% of sites (and
thus a much smaller fraction of the 'deep web' described in our metrics
guide) use Dublin Core metadata.
Few major search engines rely on
metadata supplied by the owners of sites. One industry figure quoted
in Search Engine Watch comments that
"search engines do not trust metadata. It's fine to talk about
how nice it would be if all web pages were categorized, but the
search engines know from experience that people will lie, mislead or
do whatever they can to get on top".
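Why might an engine distrust owner-supplied metadata? A hypothetical heuristic, sketched below in Python, compares the keywords a page claims for itself with the words that actually appear in its visible text. It is an illustration only, not the method of any particular search engine; the function name unsupported_keywords and the sample data are invented.

```python
import re


def unsupported_keywords(meta_keywords, body_text):
    """Return self-declared keywords that never appear in the page's visible text.

    A hypothetical heuristic for illustration only.
    """
    words = set(re.findall(r"[a-z0-9']+", body_text.lower()))
    claimed = [k.strip().lower() for k in meta_keywords.split(",") if k.strip()]
    # A multi-word keyword counts as supported only if every one of its words appears in the body.
    return [k for k in claimed if not all(w in words for w in k.split())]


keywords = "metadata, Dublin Core, free prizes, celebrity"
body = "This page looks at use of metadata on the web, including Dublin Core."
print(unsupported_keywords(keywords, body))  # ['free prizes', 'celebrity']
```

A check like this catches only the crudest misrepresentation; a page can repeat misleading terms in its text as easily as in its metadata, which is part of why engines prefer to build their own picture of a page.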
where does it come from?
In practice metadata about a page
originates in two ways.
The creator of the page can embed
metadata when constructing (or amending) the page.
Some software used in
building sites will automatically generate such metadata, albeit
crudely. The metadata for each page on this site, for example, was
developed manually. Many creators are uncertain about the nature of
metadata (what it is, where it goes, what terms to use) or see it
as an afterthought rather than as integral to electronic publishing.
A second way is the creation of
metadata about the page by an unrelated entity, i.e. by something or someone
that visits the page rather than by the page's
owner.
Many search engines use 'robots' or 'spiders'
to visit pages, look for significant terms within the text and either
incorporate that information in the database that powers the search
engine or flag the page as having objectionable content. Other engines and
directories use humans to examine the pages and create the metadata.
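The robot-based approach can be sketched roughly as follows. This is a simplified illustration, assuming pages have already been fetched and reduced to plain text: it counts the most frequent non-trivial words on each page and records them in an inverted index mapping each term to the pages it was found on. The stop list, the length cut-off, the top_n limit and the example.org URLs are arbitrary assumptions; real engines use far more elaborate weighting.

```python
import re
from collections import Counter, defaultdict

# A tiny stop list chosen for this example; real engines use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "is", "for", "it"}


def significant_terms(text, top_n=10):
    """Return up to top_n of the most frequent words, ignoring stop words and words under 3 letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(top_n)]


def build_index(pages, top_n=10):
    """Build an inverted index: each significant term maps to the set of URLs it was found on."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in significant_terms(text, top_n):
            index[term].add(url)
    return index


# Text as a spider might have extracted it from two fetched pages (hypothetical URLs).
pages = {
    "http://example.org/dc": "Dublin Core metadata elements. Dublin Core is a metadata standard.",
    "http://example.org/pics": "PICS labels carry content rating metadata.",
}
index = build_index(pages)
print(sorted(index["metadata"]))  # ['http://example.org/dc', 'http://example.org/pics']
```

Note that the index is built entirely from what the robot finds in the text; nothing the page owner declared in <meta> elements is consulted.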
does it matter?
As you might expect,
there's disagreement about what matters.
It's clear that most search engines
ignore metadata embedded by creators. A 1997 report,
for example, commented that "search
engines do not trust metadata", since people "will lie, mislead or do
whatever they can to get on top".
More broadly, many sites will never
rank highly on search engines. Their
owners should concentrate on driving traffic to them in other
ways.
On the other hand, in parts of the web
(such as libraries, image archives and bodies dealing with geospatial
information) there is agreement about the use of metadata and about
specific standards such as Dublin Core.
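As an illustration of what agreeing on a standard such as Dublin Core means in practice, the sketch below renders a record as HTML <meta> elements using a 'DC.' name prefix, with a schema.DC link pointing at the Dublin Core 1.1 element namespace. Guidelines for embedding Dublin Core in HTML differ on details such as capitalisation of element names; lowercase is assumed here. The function checks names against the 15 elements of the Dublin Core Metadata Element Set 1.1. The function name dublin_core_tags and the sample record are hypothetical.

```python
from html import escape

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}


def dublin_core_tags(record):
    """Render a dict of Dublin Core element values as HTML head markup."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        # Elements are repeatable, so accept either a single string or a list of strings.
        for value in [values] if isinstance(values, str) else values:
            # escape() also escapes double quotes, keeping the content attribute well-formed.
            lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)


print(dublin_core_tags({
    "title": "Metadata on the web",
    "subject": ["metadata", "Dublin Core"],
    "language": "en",
}))
```

Rejecting names outside the agreed element set is the point of the exercise: a shared, closed vocabulary is what lets other bodies read the record without negotiation.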
Consistent use of metadata
schemes, often a consequence of the way each body manages information
within its own databases, facilitates information exchange outside the web
and, for example, the operation of 'gateways' or sectoral search engines
that provide seamless access to the holdings of a group of museums.
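A gateway of that kind can be quite simple once the participating bodies share element names. The sketch below, with invented institutions and records, searches every collection through one function; it works only because each record uses the same Dublin Core element names (identifier, title, subject). The function name gateway_search is hypothetical.

```python
def gateway_search(collections, element, term):
    """Search several institutions' Dublin Core records through one interface.

    Returns (institution, identifier, title) for each record whose chosen element
    contains the search term, case-insensitively.
    """
    term = term.lower()
    hits = []
    for institution, records in collections.items():
        for record in records:
            values = record.get(element, [])
            # An element may hold a single string or a list of repeated values.
            if isinstance(values, str):
                values = [values]
            if any(term in v.lower() for v in values):
                hits.append((institution, record.get("identifier", "?"), record.get("title", "")))
    return hits


# Hypothetical holdings of two museums that both use Dublin Core element names.
collections = {
    "Museum A": [
        {"identifier": "A-001", "title": "Bark painting", "subject": ["Indigenous art"]},
        {"identifier": "A-002", "title": "Survey map", "subject": "cartography"},
    ],
    "Museum B": [
        {"identifier": "B-117", "title": "Coastal chart", "subject": ["Cartography", "navigation"]},
    ],
}
print(gateway_search(collections, "subject", "cartography"))
# [('Museum A', 'A-002', 'Survey map'), ('Museum B', 'B-117', 'Coastal chart')]
```

If one museum had called its field "topic" instead of "subject", its holdings would silently drop out of the results, which is why agreement on a scheme matters more than the search code itself.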