Directories
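This page looks at web directories: hyperlinked listings that point to websites, individual pages or other resources. It covers -
- introduction
- a brief history of internet directories
- numbers and demographics
- derivation and management
- issues
- a community catalogue?
- the web directory business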
introduction
In essence, a web directory is a personal, institutional or corporate list of websites or online documents. That list is online; entries on the list are typically hyperlinked so that users can readily access the sites/documents without keying in the URL.
That list may be personal and small-scale or may aspire
to cover much of the web. It may be publicly accessible
or may instead be restricted to particular users.
Some directories have a flat structure, with all entries
being given equal weight. Other directories feature categorisations
of varying complexity, with the largest commercial directories
for example comprising multi-level hierarchies organised
by subject and nation/region.
Directories predate the search
engine and co-exist with it. For many early adopters
of the web they were the embodiment of e-commerce, with
large commercial directories being promoted as 'portals'
or virtual shopping malls and expecting to garner substantial
revenue from paid advertising and a share of merchant
turnover. Increasing sophistication of the online population (and the growth of the web) means that over the past decade the large-scale commercial directories have become less significant as a mechanism for finding information online. Some demographic groups have indeed abandoned those directories in favour of search engines such as Google and of specialist directories, in particular those compiled by subject experts.
The revenue of major directories such as Yahoo! (and access
to capital during the dotcom boom) has, however, resulted
in a blurring of demarcations between the major directories,
search engines and messaging facilities. In particular
the major commercial directories acquired engines –
offering users multiple routes to information –
and sought to underpin their share of the desktop by offering
email or other services, consistent with Andrew Odlyzko's
aphorism that for many people connectivity rather than
content is king.
a brief history of internet directories
In considering the history of directories we can identify several themes -
- 'portalisation', as commercial whole-of-web directories expanded from basic categorised listings to offer search, webmail, retail and other functionalities
- market acceptance of industry/subject-specific directories
- a move away from standalone non-commercial directories to listings within richer academic, professional and enthusiast sites
- the persistence of directories that are online but not public, with access on a subscription/sessional fee basis
After a decade of the web one striking conclusion is that many of the communitarian and commercial forecasts have simply been wrong.
numbers and demographics
How many directories are available on the web? The answer
is that no one knows. That is for three reasons.
The first is disagreement about what constitutes a directory.
Is it restricted to major commercial portals such as Yahoo!
and multi-sector non-commercial resources such as DMOZ?
Does it encompass for-profit directories, often of significant
value, that are not publicly accessible? Does it also
include lists that are not much more than a publicly accessible
set of personal bookmarks?
A second reason is the volatility of the web, with pages
(and directories) appearing and disappearing.
A third reason is academic and industry fashion: there are few commercial incentives for comprehensive mapping of directories across the web, and directories are less exciting than blogs, social networks, P2P or other recent developments.
Claims that there are 12,500 (or 125,000) web directories
should thus be regarded with caution, particularly since
the few lists substantiating such claims are decidedly
uneven and unsystematic.
What of user reliance on directories? How many people are using directories? Are user demographics changing? There is similar uncertainty about the size and attributes of the population of directory users. Industry studies have focussed on a handful of major 'whole of web' sites and sectoral sites.
Extrapolation from those sites or from figures about smaller
sites is contentious; much information is anecdotal. Confusion
is exacerbated by many published statistics, which for
example conflate traffic to the directory proper with
traffic to an ancillary feature such as a webmail
gateway.
Overall it appears likely that the major commercial sites
gain substantially more traffic than their more numerous
smaller commercial competitors, some of which appear to
have appropriated parts of their content. That pattern
reflects the greater visibility of the major portals -
attributable to their age (longer time in the public gaze;
perceptions that recent market entrants are copycats),
larger funds for marketing, better opportunities for alliance
building, size of their lists and greater resources for
list maintenance.
Are users happy with directories?
Happiness has been taken for granted, given the proliferation
of large-scale commercial directories and their market
valuation during the dotcom
boom and aftermath. There have, however, been few
convincing and independent studies about effectiveness
and behaviour. Much
'research' about commercial directories has recycled media
releases and some claims appear to be inconsistent with
independent studies. Commercial directories understandably
treat specifics of search -
- what
people are looking for
- whether
they are finding that information
- how
they are navigating the directory
- how
quickly they are finding information or giving up
as
commercial secrets.
derivation and management
In contrast to search engines, which are often fully automated,
the creation and maintenance of large commercial directories
and smaller specialist directories has a substantial human
element.
With large directories such as Yahoo! information about
individual sites/pages - the basis of entries in the listing
- is typically harvested by a web spider (software that
moves from one resource to another by following hyperlinks
and/or domain names) or submitted by site owners/agents.
Some directories charge a fee for expedited processing of information submitted to them; the wait for inclusion in a major directory may be up to six months. Some commercial services specialise in submitting information on behalf of site owners to multiple directories and search engines, often claiming that their submission process will secure an early listing or gain a favourable ranking. Such claims are problematic and have resulted in trade practices litigation in some jurisdictions.
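The spider described above can be sketched as a breadth-first traversal: start from a seed URL, record each page reached, and queue every hyperlink not yet seen. This is a minimal Python sketch under stated assumptions, not any directory's actual crawler; the in-memory `LINKS` table and the `crawl` function are hypothetical stand-ins for fetching and parsing live pages, and the sketch follows hyperlinks only, not domain names.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real spider would fetch each page over HTTP and extract its links.
LINKS = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://dead.example/"],
    "http://c.example/": [],
}


def crawl(seed, links, max_pages=100):
    """Breadth-first traversal from seed, following hyperlinks.

    Returns URLs in the order they were visited. A URL with no entry
    in `links` is treated as unreachable and is not recorded.
    """
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url not in links:  # e.g. a page that is no longer online
            continue
        visited.append(url)
        for target in links[url]:
            if target not in seen:  # avoid revisiting pages (link cycles)
                seen.add(target)
                queue.append(target)
    return visited
```

Here `crawl("http://a.example/", LINKS)` visits the two `a.example` pages, then `b.example` and `c.example`; `http://dead.example/` is queued but never recorded, much as a spider would find a link to a page that has gone offline. The `max_pages` cap stands in for the limits any real crawler needs.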
The information is then assigned to one or more categories, supposedly those of most relevance, with categorisations including subject hierarchies, geographical location, alphabetical order and even age. Some directories rely on automated assignment (eg based on keywords found by the spider or in a submission form), with or without close human oversight. The categorised information is then placed in an HTML page or a database, held on one or more servers, for access by users of the internet or an intranet.
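Automated assignment based on keywords can be sketched as counting how many of each category's trigger words appear in the text captured by the spider or a submission form. The category paths, keyword sets and `assign_categories` function below are hypothetical; real directories would use far larger vocabularies and, as noted above, may add human oversight.

```python
import re

# Hypothetical two-level hierarchy: each category path has trigger keywords.
CATEGORIES = {
    ("Arts", "Music"): {"music", "band", "album"},
    ("Business", "Finance"): {"bank", "loan", "investment"},
    ("Regional", "Australia"): {"australia", "sydney", "melbourne"},
}


def assign_categories(text, categories, min_hits=1):
    """Return category paths whose keywords appear in text, best match first."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scored = []
    for path, keywords in categories.items():
        hits = len(words & keywords)  # number of distinct keywords found
        if hits >= min_hits:
            scored.append((hits, path))
    # Most keyword hits first; ties broken alphabetically by path.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [" > ".join(path) for _, path in scored]
```

For example, `assign_categories("Sydney band releases new album", CATEGORIES)` places the entry under both `Arts > Music` and `Regional > Australia`, showing how one site can be listed in more than one category. Raising `min_hits` trades coverage for precision; neither setting removes the need for an editor to catch misleading keywords.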
Smaller directories, particularly those without a commercial
basis, are often compiled wholly by hand, with information
being identified and evaluated in a way that reflects
the directory owner's expertise and contacts.
Maintenance of directories involves periodic automated or manual checking of links. That checking, in principle, encompasses whether sites/pages are still online and whether the categorisation is still pertinent. One problem, for example, is non-renewal of domain registrations by site owners, with the lapsed domain then being registered by adult content or other site operators seeking a free ride on existing links.
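A periodic link check of the kind described above can be sketched as two tests per entry: does the URL still respond, and does the page still look like the one that was categorised? The sketch assumes the directory stored a SHA-256 fingerprint of each page when it was listed; `fetch`, `fingerprint` and `check_entry` are hypothetical names, while `urllib.request.urlopen` and `hashlib.sha256` are from the Python standard library.

```python
import hashlib
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status, body); status is None if the host cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        return err.code, b""
    except (urllib.error.URLError, OSError):  # DNS failure, refusal, timeout
        return None, b""


def fingerprint(body):
    """SHA-256 digest of a page body, stored when the entry is listed."""
    return hashlib.sha256(body).hexdigest()


def check_entry(url, stored_fingerprint, fetcher=fetch):
    """Classify a listed URL as 'dead', 'changed' or 'ok'."""
    status, body = fetcher(url)
    if status is None or status >= 400:
        return "dead"  # site offline or URL no longer valid
    if fingerprint(body) != stored_fingerprint:
        return "changed"  # content differs from when it was categorised
    return "ok"
```

An exact fingerprint flags any change at all, including trivial ones such as a new date stamp, so a 'changed' result is better treated as a prompt for human review than as grounds for automatic deletion. Passing a stub `fetcher` lets the logic be exercised without network access.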
issues
Directories pose a range of issues for users and site owners -
- Comprehensiveness
- Authority
- Bias
- Latency
- Usability
- Fraud
- Relevance
As we have discussed in the Internet Metrics & Statistics guide elsewhere on this site, the web is large, volatile (pages/sites appear and disappear) and continues to grow. No search engine or directory covers all of the web; most estimates suggest that even the largest engines cover only a small part of it. Specialist directories may, however, cover all major resources relating to a particular subject ... or all the resources that an author considers to be significant.
Questions of authority relate to the expertise (or merely dedication) of both the directory operator and user. In essence, can you - and should you - trust what you see online? As with bibliographies, some specialist directories are of outstanding value because they have been compiled by subject experts who are equipped to make accurate assessments and whose sources of information are both deep and broad.
Some of the larger commercial directories - and poorly-maintained
smaller competitors - emphasise volume rather than quality.
Categorisation may use ineffective algorithms, rely on
information submitted by site owners (which may be inaccurate)
or involve people with an inadequate grasp of the directory/site's
language.
Questions of bias arise because directories are compiled
manually or using algorithms that embody particular values.
Bias can be evident in inclusion of an entry in an inappropriate
part of a hierarchy or in a placement that is weighted
towards payment rather than notions of 'merit'.
The human element in directory management is expensive
and many directories - particularly those that have 'screen
scraped' information from another directory - are not
closely maintained. The latency of information in major
portals and in smaller competitors varies. Some directories
(or parts of directories) are frequently updated. Others
are littered with dead links because sites have gone offline
or URLs have changed. As noted above, the link may point
to a 'live' site/page whose content has changed, sometimes
for the worse.
Usability encompasses basic accessibility: some directories fail to meet basic guidelines for access by people with visual or motor impairments (or who merely have a low bandwidth or an expensive connection).
They also encompass navigation through directories that
seek to maximise revenue by crowding advertisements, listings
and other features such as a webmail gateway onto an entry
page and subsidiary pages. The past decade has accordingly
seen an oscillation between very cluttered and 'noisy'
pages - to the extent that some users found them unusable
and moved to search engines or competing directories -
and more austere layouts.
An associated issue is user understanding of the hierarchies
used by the directory owner. Few people have a background
in taxonomy; many find directory hierarchies to be non-intuitive.
Confusion is exacerbated by 'best guess' classification
by directory editors, resulting in uneven or contradictory
arrangement of items in listings. Some users continue
to mistake paid placement for entries whose ranking is
unaffected by payment.
Fraud is an issue because paid placement schemes are susceptible
to poor performance by directory operators and to 'click
fraud' by competitors (typically clicking on a paid link
until the site owner's payment is exhausted and the link
moves to a lower position in the ranking). It is also
an issue because of a proliferation of businesses (or
published guides) that claim to be able to get top/high
rankings for sites listed on directories and in search
engines. Consumer organisations
have noted that inclusion in some commercial directories
is simply a question of money or being found by a spider.
It is not equivalent to a trustmark
and does not necessarily signify that a site is legitimate
or that the site owner's undertakings should be trusted.
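The click fraud mechanism can be illustrated with a simple prepaid cost-per-click model, assumed here purely for illustration: a listing stays in the paid block only while its balance covers another click. The `PaidListing` class and `rank` function are hypothetical, and real placement schemes (with bidding, spending caps and fraud filtering) are more complex.

```python
from dataclasses import dataclass


@dataclass
class PaidListing:
    name: str
    budget_cents: int  # prepaid balance for this listing
    cost_per_click_cents: int  # charge deducted for each click

    def is_funded(self):
        return self.budget_cents >= self.cost_per_click_cents

    def click(self):
        """Charge one click; returns False once the balance cannot cover it."""
        if not self.is_funded():
            return False
        self.budget_cents -= self.cost_per_click_cents
        return True


def rank(listings):
    """Funded listings first (highest per-click payment first), then unfunded."""
    return sorted(
        listings,
        key=lambda listing: (not listing.is_funded(), -listing.cost_per_click_cents),
    )


if __name__ == "__main__":
    a = PaidListing("A", budget_cents=100, cost_per_click_cents=25)
    b = PaidListing("B", budget_cents=1000, cost_per_click_cents=10)
    print([x.name for x in rank([a, b])])  # ['A', 'B']
    for _ in range(4):  # a competitor repeatedly clicks A's paid link
        a.click()
    print([x.name for x in rank([a, b])])  # ['B', 'A']
```

Four clicks exhaust A's balance without producing a single customer, and A drops below B in the ranking. Amounts are held as integer cents to avoid floating-point rounding in the balance.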
A final issue is relevance, the nub of much internet searching and of questions about search behaviour. Categorisation in some major directories often seems hit and miss. A range of studies has demonstrated that few users are expert in online searching or committed to extensive searching (and thus generally do not venture more than a few clicks into a hierarchy). That is one reason why paid placement - whether through an online advertisement or through high ranking in a list - has been attractive to directory operators and site owners. A better match between user needs and available information may be achievable if the user is persistent and grapples with navigation and the other issues noted above.
Questions about relevance, sharp practice (or outright
fraud), latency and navigation are not restricted to online
directories. They are found in dealing with printed directories
and with CD-ROM directories, which inevitably start to
go out of date as soon as they are published and which may not
meet expectations regarding quality.
a community catalogue?
Large-scale directories are not exclusively commercial.
Under the banner of "The Republic of the Web"
the Open Directory Project (ODP) proclaimed
that
Instead
of fighting the explosive growth of the Internet, the
Open Directory provides the means for the Internet to
organize itself. As the Internet grows, so do the number
of net-citizens. These citizens can each organize a
small portion of the web and present it back to the
rest of the population, culling out the bad and useless
and keeping only the best content. ...
The Open Directory was founded in the spirit of the
Open Source movement, and is the only major directory
that is 100% free. There is not, nor will there ever
be, a cost to submit a site to the directory, and/or
to use the directory's data. The Open Directory data
is made available for free to anyone who agrees to comply
with our free use license.
The Open Directory is the most widely distributed data
base of Web content classified by humans. Its
editorial standards body of net-citizens provide the
collective brain behind resource discovery on the Web.
The Open Directory powers the core directory services
for the Web's largest and most popular search engines
and portals, including Netscape Search, AOL Search,
Google, Lycos, HotBot, DirectHit, and hundreds of others.
ODP - often badged as DMOZ (Directory.Mozilla) - is a global 'open' directory compiled and maintained by volunteer editors. It originated as Gnuhoo in 1998, based loosely on Usenet categorisation, and was rebadged as Newhoo after it was savaged for riding on the coat-tails of the GNU free software project, amid claims that it was a commercial product based on volunteer labour. Further rebadging as ODP occurred after it was acquired for US$1 million by Netscape, now part of Time Warner. ODP content was released under an open content license.
In discussing Wikipedia,
John Tobler commented
that
Unique
at the time, the ODP permitted volunteers to sign up
as editors with individual, or sometimes joint, responsibility
over categories of knowledge within the Open Directory
itself.
... What we built is now used by others, most notably
Google. Google's hierarchical directory uses the ODP
as its starting point but modifies it to suit its percieved
sense of the needs of Google users. The lasting victory
of the original ODP concept is a tribute to the idea
of working openly and together for the benefit of human
knowledge. We did it with *human* editors, not just
algorithms and machines.
A meritocracy emerged over time. Great effort was made
to keep contributions to the ODP within certain boundaries.
The Netscape employees and others who were most responsible,
tried very hard to educate volunteer editors about such
arcanities as ontology and categorization theory. A
sort of peer-enforcement evolved that allowed the volunteer
community to self-police the system, eliminating the
bogus "contributions" of self-serving, and
often profit-making, induhviduals who sought to corrupt
the directory for their own sometimes malicious purposes.
Sigh. Inevitably, the meritocracy became competitive
in nature and certain people who managed to insert themselves
fairly close to the root of the tree got into playing
power games. Some lorded it over other editors who,
perhaps because they held full time jobs, were not able
to devote their entire lives to the project. People,
and I must reveal that this included me, got dissed
for not making enough edits within a certain arbitrary
time period. And then some of this now middle-layer
power group got the power to remove editors who did
not meet their quantity standards.
The
Newhoo/ODP model inspired a number of competitors, including
Go, Zeal and MusicMoz.
the web directory business
The economics of the commercial directory business encompass
revenue, development/maintenance, marketing and facilitation.
The directory sector comprises a large number of enterprises (unsurprising given perceptions of low entry costs and potential revenue), but most traffic and most revenue appears to involve a handful of major operators. Metrics studies accordingly tend to suggest that the top four or five directories in a nation attract around 90% of all traffic to commercial directories. (Traffic to non-commercial directories is inadequately tracked by the large metrics companies, primarily because they see little market interest in that data.)
As with search engines, the sector includes directory
operators and businesses that specialise in submission
of information to directories or advising site owners
on maximising their chances for favourable ranking in
the major directories.
Revenue comes from a number of sources, which include -
- subscriptions (or search-specific fees) to online directories that are not publicly available
- payment by a site to appear on a publicly available directory or for expedited processing of a submission
- 'paid placement', whether through provision of a link adjacent to the top of a list or by buying an appearance within the list
- sale of advertising, in particular for inclusion of banner ads on a directory's front page and/or on the main pages for major categories, and fees for click-through from banner or other ads (from a fraction of a cent to dollars per click)
- sharing in revenue from sales through online stores
- sale of 'deidentified' demographic data about traffic to the directory
- revenue that is attributable to advertising or other aspects of ancillary services such as webmail and search engines.