Search profile: Directories

Directories

This page looks at web directories - hyperlinked listings that point to websites and individual pages or other resources.

It covers -

introduction - what is a directory?
a brief history of web directories - their rise and transformation
numbers and demographics - how many directories and users, what demographics?
derivation and management - how are directories compiled and maintained
issues - questions about the functionality, coverage and bias of online directories
a community catalogue of the web? - DMOZ and other open directories
the web directory business - industry economics
studies - academic and industry resources regarding directories and the directory business

introduction

In essence, a web directory is a personal, institutional or corporate list of websites or online documents. That list is online; entries on the list are typically hyperlinked so that users can readily access the sites/documents without keying the URL.

That list may be personal and small-scale or may aspire to cover much of the web. It may be publicly accessible or may instead be restricted to particular users.

Some directories have a flat structure, with all entries being given equal weight. Other directories feature categorisations of varying complexity, with the largest commercial directories for example comprising multi-level hierarchies organised by subject and nation/region.
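The difference between a flat and a hierarchical directory can be illustrated with a small data structure. The following is a minimal sketch only: the `Category` class, its method names and the example.org URLs are hypothetical and are not drawn from any actual directory.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical directory hierarchy."""
    name: str
    entries: list = field(default_factory=list)    # URLs listed directly here
    children: dict = field(default_factory=dict)   # sub-category name -> Category

    def add(self, path, url):
        """File a URL under the given category path, creating levels as needed."""
        node = self
        for part in path:
            node = node.children.setdefault(part, Category(part))
        node.entries.append(url)

    def walk(self, prefix=()):
        """Yield (path, url) pairs for every entry at or below this node."""
        for url in self.entries:
            yield prefix, url
        for name, child in self.children.items():
            yield from child.walk(prefix + (name,))


# A flat directory is simply a root with entries and no sub-categories.
flat = Category("root")
flat.add((), "https://example.org/")

# A hierarchical directory organised by nation and then by subject.
tree = Category("root")
tree.add(("Australia", "Law"), "https://example.org/law")
tree.add(("Australia", "Shopping"), "https://example.org/shop")
listing = list(tree.walk())
```

A multi-level commercial directory would in practice hold such a structure in a database rather than in memory, but the principle of nested categories is the same.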

Directories predate the search engine (eg specialist buyers guides, research guides and yellow pages) and co-exist with it. For many early adopters of the web they were the embodiment of e-commerce, with large commercial directories being promoted as 'portals' or virtual shopping malls and expecting to garner substantial revenue from paid advertising and a share of merchant turnover.

Increasing sophistication of the online population (and the growth of the web) means that over the past decade the large-scale commercial directories have become less significant as a major mechanism for finding information online. Some demographics, indeed, have abandoned those directories in favour of search engines such as Google and specialist directories – in particular those compiled by subject experts.

The revenue of major directories such as Yahoo! (and access to capital during the dotcom boom) has, however, resulted in a blurring of demarcations between the major directories, search engines and messaging facilities. In particular the major commercial directories acquired engines – offering users multiple routes to information – and sought to underpin their share of the desktop by offering email or other services, consistent with Andrew Odlyzko's aphorism that for many people connectivity rather than content is king.

a brief history of internet directories

In considering the history of directories we can identify several themes -

'portalisation', as commercial whole-of-web directories expanded from basic categorised listings to offer search, webmail, retail and other functionalities
market acceptance of industry/subject-specific directories
a move away from standalone non-commercial directories to listings within richer academic, professional and enthusiast sites
the persistence of directories that are online but not-public, with access on a subscription/sessional fee basis

The first pages of the web had the characteristics of directories, pointing to other resources. Given the precedent provided by 'contents' and 'mall' pages on private networks such as AOL - the maps for navigating the walled gardens - it is unsurprising that commercial operators developed directories as the web grew. Those operators faced challenges in maximising initial and recurrent traffic to their directories.

One response was to increase the breadth and depth of the individual directory, typically marketed as covering all of the web ... or all of the web worth visiting. Growth of particular directories without commensurate improvements in usability had three consequences.

The first was that savvy users moved towards search engines, a move recognised in predictions that normalisation of the online population would ultimately see directory visitations decline unless the major directories incorporated a search engine based on automated spidering of the web.

The second was churn by users to competing directories, to smaller directories with a more specific focus or to 'localised' versions of the parent directory (eg those that emphasise information for a specific nation, state or even city).

The third was an effort by operators to increase the 'stickiness' of individual directories by making them true portals for activity online. That 'portalisation' involved expansion from basic categorised listings through inclusion of news, webmail, personal ads, 'infotainment' such as horoscopes and other functionalities.

User concerns about navigation, accuracy and authority were reflected in market acceptance of industry and subject-specific directories.

Recent years have seen a move away from stand-alone non-commercial directories to listings within richer academic, professional and enthusiast sites.

Contrary to claims that the internet necessarily means the death of 'paid publishing' (in reality the demise of the publishing model based on direct payments for access by end users, rather than advertisers) it is clear that 'closed' online directories have persisted and even flourished. Those directories are online but not-public, with access on a subscription/sessional fee basis.

After a decade of the web one striking conclusion is that many of the communitarian and commercial forecasts have simply been wrong.

numbers and demographics

How many directories are available on the web?

The answer is that no one knows. That is for three reasons.

The first is disagreement about what constitutes a directory. Is it restricted to major commercial portals such as Yahoo! and multi-sector non-commercial resources such as DMOZ? Does it encompass for-profit directories, often of significant value, that are not publicly accessible? Does it also include lists that are not much more than a publicly accessible set of personal bookmarks?

A second reason is the volatility of the web, with pages (and directories) appearing and disappearing.

A third reason is academic and industry fashion: there are few commercial incentives for comprehensive mapping of directories across the web and they are less exciting than blogs, soft networks, P2P or other recent developments.

Claims that there are 12,500 (or 125,000) web directories should thus be regarded with caution, particularly since the few lists substantiating such claims are decidedly uneven and unsystematic.

What of user reliance on directories? How many people are using directories? Are user demographics changing?

There is similar uncertainty about the size and attributes of the online directory population. Industry studies have focussed on a handful of major 'whole of web' sites and sectoral sites.

Extrapolation from those sites or from figures about smaller sites is contentious; much information is anecdotal. Confusion is exacerbated by many published statistics, which for example conflate traffic to the directory proper with traffic to an ancillary feature such as a webmail gateway.

Overall it appears likely that the major commercial sites gain substantially more traffic than their more numerous smaller commercial competitors, some of which appear to have appropriated parts of their content. That pattern reflects the greater visibility of the major portals - attributable to their age (longer time in the public gaze; perceptions that recent market entrants are copycats), larger funds for marketing, better opportunities for alliance building, size of their lists and greater resources for list maintenance.

In 2006 MySpace inched past Yahoo!, recording 38.7 billion page views in the US.

Are users happy with directories?

Happiness has been taken for granted, given the proliferation of large-scale commercial directories and their market valuation during the dot-com boom and aftermath. There have, however, been few convincing and independent studies about effectiveness and behaviour. Much 'research' about commercial directories has recycled media releases and some claims appear to be inconsistent with independent studies.

Commercial directories understandably treat specifics of search -

what people are looking for
whether they are finding that information
how they are navigating the directory
how quickly they are finding information or giving up

as commercial secrets.

derivation and management

In contrast to search engines, which are often fully automated, the creation and maintenance of large commercial directories and smaller specialist directories has a substantial human element.

With large directories such as Yahoo! information about individual sites/pages - the basis of entries in the listing - is typically harvested by a web spider (software that moves from one resource to another by following hyperlinks and/or domain names) or submitted by site owners/agents.
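The way a spider moves from one resource to another by following hyperlinks can be sketched as a breadth-first traversal. This is an illustration under stated assumptions, not a description of any directory's actual harvester: the `PAGES` dictionary stands in for the live web, and `crawl`, `LinkExtractor` and the example.org addresses are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical pages standing in for the web, so the sketch needs no network.
PAGES = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.org/a.html": '<a href="/">home</a> <a href="/c.html">C</a>',
    "http://example.org/b.html": "",
    "http://example.org/c.html": '<a href="http://example.org/a.html">A</a>',
}


def crawl(start, fetch, limit=100):
    """Follow hyperlinks breadth-first from start; return URLs in discovery order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or unreachable
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen and len(order) < limit:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


found = crawl("http://example.org/", PAGES.get)
```

A production spider would replace `PAGES.get` with a real HTTP fetch and would need to respect robots.txt, rate limits and similar conventions, none of which are shown here.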

Some directories charge a fee for early processing of information submitted to them; the wait for inclusion in a major directory may be up to six months. Some commercial services specialise in submitting information on behalf of site owners to multiple directories and search engines, often claiming that their submission process will secure listing ahead of time or gain a favourable ranking. Such claims are problematical and have resulted in trade practices litigation in some jurisdictions.

The information is then assigned to one or more categories, supposedly of most relevance, with different categorisations including subject hierarchies, geographical location, alphabetical order and even age. Some directories rely on automated assignment (eg based on keywords found by the spider or in a submission form), with or without close human oversight. The categorised information is then placed in an HTML page or a database, held on one or more servers, for access by users of the internet or an intranet.
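Automated assignment on the basis of keywords might, in its simplest form, look something like the sketch below. The category names, keyword sets and the 'Unsorted' fallback are assumptions made for illustration; real directories are likely to use considerably more elaborate schemes.

```python
import re

# Hypothetical categories and the keywords taken to signal each of them.
CATEGORY_KEYWORDS = {
    "Law": {"court", "legislation", "statute", "copyright"},
    "Shopping": {"buy", "price", "store", "catalogue"},
    "Travel": {"hotel", "flight", "tour"},
}


def categorise(text, min_hits=1):
    """Return the category or categories whose keywords best match the text.

    Ties give several categories; text with too few matching keywords is
    returned as 'Unsorted' so that a human editor can review it.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores.values())
    if best < min_hits:
        return ["Unsorted"]
    return [cat for cat, score in scores.items() if score == best]


mixed = categorise("Buy copyright statute guides at our store")
travel = categorise("Cheap hotel and flight deals")
unknown = categorise("Weather report")
```

The first example matches two Law keywords and two Shopping keywords and is therefore filed under both, which hints at why keyword matching without human oversight can place entries in inappropriate parts of a hierarchy.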

Smaller directories, particularly those without a commercial basis, are often compiled wholly by hand, with information being identified and evaluated in a way that reflects the directory owner's expertise and contacts.

Maintenance of directories involves periodic automated or manual checking of links. That checking, in principle, encompasses whether sites/pages are still online and whether the categorisation is still pertinent. One problem, for example, is non-renewal of domain registrations by site owners, with the lapsed domain being re-registered by adult content or other site operators seeking a free ride.
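Automated link checking of the kind described above can be sketched as follows. The approach shown, comparing a stored fingerprint of each page with the current content, is one assumption about how a change of content might be detected; the entry format, function names and URLs are hypothetical, and the `LIVE` dictionary stands in for fetching pages over the network.

```python
import hashlib


def fingerprint(text):
    """Return a digest of page content, recorded when the entry is listed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_entry(entry, fetch):
    """Classify one directory entry as 'dead', 'changed' or 'ok'."""
    body = fetch(entry["url"])
    if body is None:
        return "dead"  # site offline or URL no longer valid
    if fingerprint(body) != entry["fingerprint"]:
        return "changed"  # eg lapsed domain now run by another operator
    return "ok"


# Hypothetical directory entries as recorded at the time of listing.
LISTED = [
    {"url": "http://example.org/guide", "fingerprint": fingerprint("Guide to case law")},
    {"url": "http://example.net/lapsed", "fingerprint": fingerprint("Family history resources")},
    {"url": "http://example.com/gone", "fingerprint": fingerprint("Old page")},
]

# What a fetch returns today (a missing key means the page is unreachable).
LIVE = {
    "http://example.org/guide": "Guide to case law",
    "http://example.net/lapsed": "Totally different content",
}

report = {entry["url"]: check_entry(entry, LIVE.get) for entry in LISTED}
```

An exact digest is a crude test, since any trivial edit to a page will register as a change. Entries flagged as 'changed' would still need a person to judge whether the categorisation remains pertinent.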

issues

Directories pose a range of issues for users and site owners -

Comprehensiveness
Authority
Bias
Latency
Usability
Fraud
Relevance

As we have discussed in the Internet Metrics & Statistics guide elsewhere on this site, the web is large, volatile (pages/sites appear and disappear) and continues to grow. No search engine or directory covers all of the web; most estimates suggest that the largest engines cover only a small part of the web. None are truly comprehensive. Specialist directories may, however, cover all major resources relating to a particular subject ... or all the resources that an author considers to be significant.

Questions of authority relate to the expertise (or merely dedication) of both the directory operator and user. In essence, can you - and should you - trust what you see online? As with bibliographies, some specialist directories are of outstanding value because they have been compiled by subject experts who are equipped to make accurate assessments and whose sources of information are both deep and broad. Some of the larger commercial directories - and poorly-maintained smaller competitors - emphasise volume rather than quality. Categorisation may use ineffective algorithms, rely on information submitted by site owners (which may be inaccurate) or involve people with an inadequate grasp of the directory/site's language.

Questions of bias arise because directories are compiled manually or using algorithms that embody particular values. Bias can be evident in inclusion of an entry in an inappropriate part of a hierarchy or in a placement that is weighted towards payment rather than notions of 'merit'.

The human element in directory management is expensive and many directories - particularly those that have 'screen scraped' information from another directory - are not closely maintained. The latency of information in major portals and in smaller competitors varies. Some directories (or parts of directories) are frequently updated. Others are littered with dead links because sites have gone offline or URLs have changed. As noted above, the link may point to a 'live' site/page whose content has changed, sometimes for the worse.

Questions of usability encompass basic accessibility questions: some directories fail to meet basic guidelines for access by people with visual or motor problems (or who merely have a low bandwidth or an expensive connection). They also encompass navigation through directories that seek to maximise revenue by crowding advertisements, listings and other features such as a webmail gateway onto an entry page and subsidiary pages. The past decade has accordingly seen an oscillation between very cluttered and 'noisy' pages - to the extent that some users found them unusable and moved to search engines or competing directories - and more austere layouts.

An associated issue is user understanding of the hierarchies used by the directory owner. Few people have a background in taxonomy; many find directory hierarchies to be non-intuitive. Confusion is exacerbated by 'best guess' classification by directory editors, resulting in uneven or contradictory arrangement of items in listings. Some users continue to mistake paid placement for entries whose ranking is unaffected by payment.

Fraud is an issue because paid placement schemes are susceptible to poor performance by directory operators and to 'click fraud' by competitors (typically clicking on a paid link until the site owner's payment is exhausted and the link moves to a lower position in the ranking).

It is also an issue because of a proliferation of businesses (or published guides) that claim to be able to get top/high rankings for sites listed on directories and in search engines. Consumer organisations have noted that inclusion in some commercial directories is simply a question of money or being found by a spider. It is not equivalent to a trustmark and does not necessarily signify that a site is legitimate or that the site owner's undertakings should be trusted.
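The mechanics of the 'click fraud' described above, where a competitor clicks on a paid link until the site owner's payment is exhausted, can be illustrated with a toy model. The ranking rule, the figures and the class and function names are all assumptions for the purpose of illustration and do not describe any particular directory's paid placement scheme.

```python
from dataclasses import dataclass


@dataclass
class PaidLink:
    """A hypothetical paid placement with a prepaid budget, in cents."""
    site: str
    cost_per_click: int
    budget: int

    def funded(self):
        return self.budget >= self.cost_per_click

    def click(self):
        """Charge one click; return False once the budget cannot cover it."""
        if not self.funded():
            return False
        self.budget -= self.cost_per_click
        return True


def ranking(links):
    """Funded links first, highest cost per click first; unfunded links sink."""
    ordered = sorted(links, key=lambda l: (not l.funded(), -l.cost_per_click))
    return [link.site for link in ordered]


victim = PaidLink("victim.example", cost_per_click=50, budget=200)
rival = PaidLink("rival.example", cost_per_click=40, budget=200)

before = ranking([victim, rival])

# The rival clicks the victim's link repeatedly until the payment is exhausted.
fraudulent_clicks = 0
while victim.click():
    fraudulent_clicks += 1

after = ranking([victim, rival])
```

In this model four clicks are enough to exhaust the victim's budget and push its link below the rival's, without a single genuine visitor having been delivered.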

A final issue is relevance, the nub of much internet searching and questions about search behaviour. Categorisation in some major directories often seems hit and miss. A range of studies have demonstrated that few users are expert in online searching or committed to extensive searching (and thus generally do not venture more than a few clicks into a hierarchy).

That is a reason why paid placement - whether through an online advertisement or through high ranking in a list - has been attractive to directory operators and site owners. A better match between user needs and available information may be available if the user can be persistent and grapple with navigation and other issues noted above.

Questions about relevance, sharp practice (or outright fraud), latency and navigation are not restricted to online directories. They are found in dealing with printed directories and with CD-ROM directories, which inevitably start to go out of date as soon as they are printed and which may not meet expectations regarding quality.

a community catalogue?

Large-scale directories are not exclusively commercial. Under the banner of "The Republic of the Web" the Open Directory Project (ODP) proclaimed that

Instead of fighting the explosive growth of the Internet, the Open Directory provides the means for the Internet to organize itself. As the Internet grows, so do the number of net-citizens. These citizens can each organize a small portion of the web and present it back to the rest of the population, culling out the bad and useless and keeping only the best content. ...

The Open Directory was founded in the spirit of the Open Source movement, and is the only major directory that is 100% free. There is not, nor will there ever be, a cost to submit a site to the directory, and/or to use the directory's data. The Open Directory data is made available for free to anyone who agrees to comply with our free use license.

The Open Directory is the most widely distributed data base of Web content classified by humans. Its editorial standards body of net-citizens provide the collective brain behind resource discovery on the Web. The Open Directory powers the core directory services for the Web's largest and most popular search engines and portals, including Netscape Search, AOL Search, Google, Lycos, HotBot, DirectHit, and hundreds of others.

ODP - often badged as DMOZ (Directory.Mozilla) - is a global 'open' directory compiled and maintained by volunteer editors. It originated as Gnuhoo in 1998, based loosely on Usenet categorisation, and was rebadged as Newhoo after it was savaged as riding on the coat-tails of the GNU free software project, with claims that it was a commercial product based on volunteer labour. Further rebadging as ODP occurred after it was acquired for US$1 million by Netscape, now part of Time Warner. ODP content was released under an open content license.

In discussing Wikipedia, John Tobler commented that

Unique at the time, the ODP permitted volunteers to sign up as editors with individual, or sometimes joint, responsibility over categories of knowledge within the Open Directory itself.

... What we built is now used by others, most notably Google. Google's hierarchical directory uses the ODP as its starting point but modifies it to suit its percieved sense of the needs of Google users. The lasting victory of the original ODP concept is a tribute to the idea of working openly and together for the benefit of human knowledge. We did it with *human* editors, not just algorithms and machines.

A meritocracy emerged over time. Great effort was made to keep contributions to the ODP within certain boundaries. The Netscape employees and others who were most responsible, tried very hard to educate volunteer editors about such arcanities as ontology and categorization theory. A sort of peer-enforcement evolved that allowed the volunteer community to self-police the system, eliminating the bogus "contributions" of self-serving, and often profit-making, induhviduals who sought to corrupt the directory for their own sometimes malicious purposes.

Sigh. Inevitably, the meritocracy became competitive in nature and certain people who managed to insert themselves fairly close to the root of the tree got into playing power games. Some lorded it over other editors who, perhaps because they held full time jobs, were not able to devote their entire lives to the project. People, and I must reveal that this included me, got dissed for not making enough edits within a certain arbitrary time period. And then some of this now middle-layer power group got the power to remove editors who did not meet their quantity standards.

The Newhoo/ODP model inspired a number of competitors, including Go, Zeal and MusicMoz.

the web directory business

The economics of the commercial directory business encompass revenue, development/maintenance, marketing and facilitation.

The directory sector comprises a large number of enterprises - unsurprising given perceptions of low entry costs and potential revenue - but most traffic and most revenue appear to involve a handful of major operators. It is thus common to see metrics studies suggesting that the top four or five directories in a nation attract around 90% of all traffic to commercial directories. (Traffic to non-commercial directories is inadequately tracked by the large metrics companies, primarily because they see little market interest in that data.)
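The concentration figure mentioned above is a simple ratio: visits to the top few directories divided by visits to all commercial directories. A brief sketch follows, using invented traffic figures (the `VISITS` data and the directory labels are hypothetical and chosen only to make the arithmetic easy to follow).

```python
# Hypothetical monthly visits (in thousands) to eight commercial directories.
VISITS = {
    "A": 500, "B": 200, "C": 120, "D": 80,
    "E": 40, "F": 30, "G": 20, "H": 10,
}


def top_share(visits, n):
    """Return the fraction of all visits captured by the n busiest directories."""
    ranked = sorted(visits.values(), reverse=True)
    return sum(ranked[:n]) / sum(ranked)


share_of_top_four = top_share(VISITS, 4)  # 900 of 1,000 visits
```

With these invented figures the four busiest directories account for 90% of traffic, even though they make up only half of the directories counted.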

As with search engines, the sector includes directory operators and businesses that specialise in submission of information to directories or advising site owners on maximising their chances for favourable ranking in the major directories.

Revenue comes from a number of sources, which include -

subscriptions (or search-specific fees) to online directories that are not publicly available
payment by a site to appear on a publicly available directory or for expedited processing of a submission
'paid placement', whether through provision of a link adjacent to the top of a list or by buying an appearance within the list
sale of advertising, in particular for inclusion of banner ads on a directory's front page and/or on the main pages for major categories and fees for click-through from banner or other ads (from a fraction of a cent to dollars per click)
sharing in revenue from sales through online stores
sale of 'deidentified' demographic data about traffic to the directory
revenue that is attributable to advertising or other aspects of ancillary services such as web mail and search engines.
