
sizing the web: domains, sites, pages


This page examines the size and shape of the Web: estimates of the number of domains, pages and users.

Identifying the total number of sites on the Web is problematic. Figures for domains are more certain than figures for the number of pages, the number of links or the number of viewers.

The implications of those figures are even more problematic, as Andrew Odlyzko notes in a cogent paper, Internet Growth: Myth & Reality, Use & Abuse, in the November 2000 issue of Information Impacts magazine.

domains and servers

As of 10 June 2000 one global figure for registered domains is 17.75 million, including 9.48 million dot com domains.  Those figures come from the DomainStats site. 

Netcraft argues that there are over 15 million domains, with slower growth of registrations in 2000 (around 10% per month) and the disappearance of 330,000 domains. 

The January 2000 Domain Survey by the Internet Software Consortium (ISC) suggests that there are upwards of 88 million hosts on the Internet.

[figure: schematic of growth in internet hosts, by 10 times every 3 years]
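The tenfold-every-three-years figure in the schematic can be restated as a yearly rate and a doubling time. The sketch below is illustrative only: it assumes growth is smoothly exponential over the period (the schematic does not say so), and the function names are our own.

```python
import math


def annual_growth_factor(multiple: float, years: float) -> float:
    """Constant yearly factor that produces `multiple`-fold growth over `years`."""
    return multiple ** (1.0 / years)


def doubling_time_years(multiple: float, years: float) -> float:
    """Years needed to double at that same constant rate."""
    return years * math.log(2) / math.log(multiple)


# Assumption: steady exponential growth of 10 times every 3 years.
factor = annual_growth_factor(10, 3)
months = doubling_time_years(10, 3) * 12

print(f"annual growth factor: {factor:.2f}")   # 2.15, i.e. roughly 115% a year
print(f"doubling time: {months:.1f} months")   # 10.8 months
```

On that assumption the host count a little more than doubles each year.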

The OECD believes that there are around 52 thousand secure servers - tools for electronic commerce - in the USA, and upwards of 74 thousand in the OECD as a whole (a growth of 95% over the preceding year). 
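A growth figure can be turned back into the implied earlier count. The sketch below treats "upwards of 74 thousand" as exactly 74,000 and "95%" as exact, which are simplifying assumptions; the helper name is ours, not the OECD's.

```python
def implied_previous_value(current: float, growth_pct: float) -> float:
    """Value one period earlier, given the current value and percentage growth."""
    return current / (1.0 + growth_pct / 100.0)


# Assumption: 74,000 secure servers now, after 95% growth over the preceding year.
previous = implied_previous_value(74_000, 95)
print(f"implied count a year earlier: about {previous:,.0f}")
```

That puts the preceding year's OECD total at roughly 38 thousand secure servers.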

where is the growth occurring?

The Mosaic Group at the University of Nebraska at Omaha has a project on Global Diffusion of the Internet, measuring growth of the net on a global and nation by nation basis.

The Internet Geography Project (IGP) at the University of California, under Matthew Zook, offers authoritative maps and a number of excellent papers.

The UN Development Programme 1999 Human Development Report (HDR) includes statistics about internet diffusion in the third and fourth worlds.

number of pages and documents

The latest academic estimate from the US is that the Web has some 800 million pages - this page is 800 million plus one - with Northern Light, the most inclusive search engine, covering less than 16% of that figure.
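The coverage figure is easier to grasp as a page count. A minimal sketch, taking 800 million and 16% at face value (the estimate says "less than 16%", so the result is an upper bound); the function name is hypothetical.

```python
def indexed_pages(total_pages: float, coverage: float) -> float:
    """Pages covered by an engine, given total pages and coverage as a fraction."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return total_pages * coverage


total = 800e6                         # estimated pages on the Web
covered = indexed_pages(total, 0.16)  # upper bound for the most inclusive engine

print(f"covered:     at most {covered / 1e6:.0f} million pages")
print(f"not covered: at least {(total - covered) / 1e6:.0f} million pages")
```

In other words, even the most inclusive engine would miss at least 672 million of the estimated 800 million pages.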

Three of the seminal papers - often referred to as the 'NEC studies' - are How Big is the Web (HBW), Accessibility & Distribution of Information on the Web (ADIW) and the 1998 and 1999 Search Engine Coverage Update (SECU) by Steve Lawrence & C Lee Giles. The most recent paper suggests that the Web is growing faster than coverage by the search engines and that dead links are more common.  

Inktomi and NEC Research earlier this year estimated that there were more than a billion "unique pages". 

Wallace Koehler's paper on Digital Libraries & WWW Persistence estimates that the 'half life' of a web page is less than two years and the half life of a site is a bit more than two years. We've highlighted some of the consequences for information retrieval in our Connecting guide.
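A half life can be converted into the share of pages expected to survive after a given time. The sketch below assumes simple exponential decay and uses two years as a round figure for the page half life; both are our assumptions for illustration, not Koehler's model.

```python
def surviving_fraction(years_elapsed: float, half_life_years: float) -> float:
    """Share of items still alive after `years_elapsed`, under exponential decay."""
    return 0.5 ** (years_elapsed / half_life_years)


# Assumption: page half life of 2 years (the paper says "less than two years").
for years in (1, 2, 5):
    share = surviving_fraction(years, 2.0)
    print(f"after {years} year(s): {share:.0%} of pages remain")
```

With a two-year half life, about 71% of pages remain after one year and under a fifth after five years, which is why link lists age so quickly.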

US metrics company Cyveillance estimates that there are over 2.1 billion pages on the web (heading towards 4 billion by the end of 2001) with the "average page" having 23 internal links, 6 external links and 14 images. 

BrightPlanet, a new entrant to the search engine market, claims that "the deep Web" contains "550 billion individual documents", with only a small fraction indexed by its competitors. 

That figure, like many web statistics, is problematic. More importantly, unlike the 'surface web', deep web information is generally not publicly accessible, e.g. it involves a subscription or item fee or resides on a corporate intranet. That's one reason for concern about digital divides. It's also a reason why academic/public libraries have an ongoing role.

The major study by Hal Varian & Peter Lyman on scoping the 'information universe' - quantifying what's produced, transmitted, consumed and archived - is also relevant.

