Publishing Guide: Digitising and Archiving

Digitisation and archiving

Large-scale projects to 'digitise the past', and thereby give future generations networked access to print publications, photographs, sound recordings, cinefilms and other material, have proved contentious.

Digitisation means that users view a 'digital surrogate' (preserving often fragile originals), that access is not tied to physical proximity (bringing convenience and savings in staff costs) and that physical storage requirements are reduced. However, cost savings have not been as great as anticipated, and there has been considerable criticism of institutions, such as the British Library, that digitised and then destroyed major parts of their collections.

The Preserving Digital Information report of the CPA & RLG suggests that digitisation by individual institutions is often not cost effective; however, resource sharing (ie collaborative digitisation and access to shared material through an intranet or a global digital library) is attractive.

Andrew Odlyzko echoed Michael Lesk, author of Practical Digital Libraries: Books, Bytes & Bucks (San Francisco, Morgan Kaufmann 1997), in noting that

the costs of just the buildings of the new British Library in London and the new French National Library in Paris are two or three times higher than the costs of converting their book collections to a digital format. In a more rational world, the money going into bricks and mortar would have gone into scanning the books, which would have provided much more rapid and convenient access to the data for scholars. The physical volumes themselves could be housed in cheap warehouses, for the rare occasions when they might have to be consulted. However, user resistance to new media, copyright constraints, and the politicians' and the public's liking for visible edifices and for solid books make it hard to take that step.

.... the entire mathematical literature collected over the centuries is perhaps 30 million pages, so digitizing it at a cost of $0.60 per page would cost $18 million, less than ten percent of the annual journal bill
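The arithmetic in that estimate is easy to check. The sketch below is ours rather than Odlyzko's: the function names are hypothetical, the page count and per-page cost are the figures quoted above, and the lower bound on the annual journal bill is only what the phrase "less than ten percent" implies, not a figure taken from the source.

```python
def digitisation_cost_dollars(pages: int, cents_per_page: int) -> int:
    """Total scanning cost in whole dollars.

    Works in integer cents so the result is not affected by float rounding.
    """
    return pages * cents_per_page // 100


def implied_minimum_budget(cost_dollars: int, share_percent: int) -> int:
    """Budget at which cost_dollars would be exactly share_percent percent.

    If the cost is described as 'less than' that share, the real budget
    must be larger than this value.
    """
    return cost_dollars * 100 // share_percent


# Figures from the quoted passage: 30 million pages at $0.60 per page.
PAGES = 30_000_000
CENTS_PER_PAGE = 60

total = digitisation_cost_dollars(PAGES, CENTS_PER_PAGE)
journal_bill_floor = implied_minimum_budget(total, 10)

print(f"Estimated digitisation cost: ${total:,}")
print(f"Annual journal bill implied to exceed: ${journal_bill_floor:,}")
```

On those figures the one-off conversion comes to $18 million, and the comparison only holds if the annual journal bill is above $180 million.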

     benchmarks

In the US the American Memory (AM) project, aimed at providing digital access to millions of items held by the Library of Congress and other institutions, has for political as well as technological reasons concentrated on the digitisation of images (including maps, paintings and photographs) and some manuscripts of literary or historic significance.

Locally the National Archives of Australia (NAA) has digitised key federation documents and commenced the daunting task of providing digital colour facsimiles of the millions of documents in its custody, while the National Library's PictureAustralia (PA) is a gateway for images from the State Library of Victoria, University of Queensland Library, Australian War Memorial and other institutions.

The University of California's Alexandria Digital Library project (Pharos) aims to create a digital library encompassing maps and pictorial material for use by institutions across the US.

Yale University's Project Open Book (POB) is exploring the conversion of microfilm, hitherto the medium of choice among the archival mafia, to digital imagery.

The Mellon Foundation, noted earlier in this guide, has funded the large-scale Journal Storage (JSTOR) Project, with universities coming together to provide ongoing electronic access in a secure environment to over 147 law, science and humanities journals. Imaging of that print material is now close to the target of 750,000 journal pages, with access by over 1,000 institutions. In April the Foundation announced establishment of artSTOR, a large-scale digital image library.

As part of the Making of America Project a consortium of US universities, including Cornell and the University of Michigan, is placing the text of several thousand magazines and books online.

     private projects

Most media attention has focussed on two private initiatives - Bartleby and Gutenberg - although they're dwarfed by major academic digitisation projects.

Project Bartleby (Bartleby) began with online publication of Whitman's Leaves of Grass and now features a full-text searchable database containing over 200,000 web pages, including over 22,000 quotations and 4,765 poems. Most of the content is out of copyright: Bartleby's essentially capturing old publications.

Project Gutenberg (Gutenberg) also draws on public domain works. Presentation is in ASCII rather than HTML or PDF and material is added to the database by volunteers, so the coverage is eclectic rather than comprehensive. Gutenberg has around 3,000 titles. It's unrelated to the academic Gutenberg-E project described in Analysphere of 1 July. Bradford DeLong offers a characteristically incisive analysis, commenting that founder Michael Hart's dream, in contrast to Linux, "has failed to achieve any form of critical mass" and continues to move ahead at a snail's pace.

The more ambitious Universal Library Project (UL) aims to "start a worldwide movement to make available ALL the Authored Works of Mankind on the Internet so that anyone can access these works from any place at any time". Searching and viewing would be free; individuals and existing libraries would be able to purchase digital copies.

     archiving the web

There is increasing interest in archiving the web, with projects providing thematic/sectoral collections, offering snapshots or more grandiosely attempting to capture the entire web.

An example of the last is the US-based Internet Archive, under the leadership of Brewster Kahle. His 2001 Public Access to Digital Material article (with Rick Prelinger & Mary Jackson) claimed that universal digital access is attainable and is the "epic opportunity of our digital age", since

the technology has reached the point where scanning all books, digitizing all audio recordings, downloading all websites, and recording the output of all TV and radio stations is not only feasible but less costly than buying and storing the physical versions.

That's an intriguing but very problematical vision, with major questions regarding intellectual property and resource identification. We've explored some of the issues in a more detailed profile.


version of November 2002