Caslon Analytics: electronic publishing guide

formats: hypertext & images

As we note in our network guide, the internet is a network of networks: in practice a broad set of standards that allows information to be exchanged by machines across the globe. 

The web is one part of the net, based on a subset of standards that allow users to readily identify and navigate within and between electronic documents, and that allow publishers to incorporate a mix of text, graphics and audiovisual content such as video or sound recordings.

Those standards are broad, as they have to be to accommodate significant variations in the 'pipelines' that transfer information, the performance characteristics of different machines and the behaviour of different software. Different machines/software display the same information differently (and in some cases don't display it at all well).

The web is based on derivatives of Standard Generalized Markup Language (SGML), a text annotation protocol originally developed for offline publishing. In essence, SGML describes the relationship between a document's content and its structure. It has been standardised by the International Organization for Standardization as ISO 8879; Tony Hicks's article Should We Be Using ISO 12083? is a concise introduction.

Many of the documents currently available on the web have been prepared using Hypertext Markup Language (HTML), which is based on SGML. There's a plain English guide at the World Wide Web Consortium (W3C) site; further information is in our Design guide.

HTML allows publishers to broadly specify how information is displayed online and how it is structured, for example enabling users to navigate within individual documents or to other sites using hyperlinks. 
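As a rough illustration of how HTML carries both structure and navigation, the following minimal sketch uses Python's standard html.parser module to pull the headings and hyperlink targets out of a small fragment. The fragment, the LinkCollector class and the collect function are our own hypothetical examples rather than part of any standard.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect heading text and hyperlink targets from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []      # href values of <a> elements
        self.headings = []   # text content of <h1>..<h3> elements
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in ("h1", "h2", "h3"):
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self.headings[-1] += data


# Hypothetical fragment: one link within the document, one to another site.
SAMPLE = """
<h1>Formats</h1>
<p>See the <a href="#pdf">PDF section</a> or the
<a href="http://www.example.org/guide.html">external guide</a>.</p>
<h2 id="pdf">PDF</h2>
"""


def collect(html_text):
    """Return (headings, links) found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.headings, parser.links


if __name__ == "__main__":
    headings, links = collect(SAMPLE)
    print(headings)
    print(links)
```

The headings give the document's outline, while the href values show the two kinds of navigation mentioned above: a target inside the same document (#pdf) and a target on another site.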

Many commercial/professional sites now use Extensible Markup Language (XML), which can be generated from specialist databases and accommodates specialist dialects, used for example in data exchange between manufacturers. As with any dialect, there are concerns about compatibility: something that may impede growth of business-to-business (B2B) e-commerce, or merely encourage the involvement of government competition watchdogs.
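To give a sense of what generating XML from a database can look like, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The records, the element names (catalogue, item, name, price) and the records_to_xml function are hypothetical; a real trading dialect would define its own vocabulary, and the rows would come from a database query rather than a list.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for the result of a database query.
RECORDS = [
    {"sku": "A-100", "name": "Widget", "price": "4.50"},
    {"sku": "B-200", "name": "Gasket & seal", "price": "1.25"},
]


def records_to_xml(records):
    """Serialise a list of record dicts as a small XML document."""
    root = ET.Element("catalogue")
    for rec in records:
        item = ET.SubElement(root, "item", sku=rec["sku"])
        ET.SubElement(item, "name").text = rec["name"]
        ET.SubElement(item, "price").text = rec["price"]
    # Special characters such as "&" are escaped during serialisation.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(records_to_xml(RECORDS))
```

Both parties to an exchange need to agree on the element names and their meaning, which is where the compatibility concerns noted above arise.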

HTML - currently the lingua franca of the web - accommodates a variety of graphic formats: GIF, JPEG, PNG and so forth (browser support for others, such as TIFF, is patchier). The W3C and specialist bodies are actively exploring particular enhancements, eg for identifying and describing documents using metadata, and are considering proposals that XHTML (a reformulation of HTML 4 in XML) become the new global standard.
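One practical difference is that XHTML markup must be well-formed XML, so every element, including an image or line break, has to be explicitly closed. The sketch below, which assumes two hypothetical fragments of our own and a helper we have called is_well_formed, uses Python's standard XML parser to show the distinction.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as well-formed XML, else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments embedding a GIF image.
# HTML permits unclosed <img> and <br> elements; XML does not.
HTML_STYLE = "<p>Logo: <img src='logo.gif' alt='logo'><br>Caption</p>"
XHTML_STYLE = "<p>Logo: <img src='logo.gif' alt='logo' /><br />Caption</p>"

if __name__ == "__main__":
    print(is_well_formed(HTML_STYLE))
    print(is_well_formed(XHTML_STYLE))
```

Browsers have traditionally tolerated the first style; the stricter second style is what lets generic XML tools process a page.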

Adoption of XHTML appears likely, given consumer/commercial interest in a more powerful generation of HTML and ongoing enhancement of the networks (eg bigger pipes, more powerful machines), but is likely to take some time. Thom Lieb provided a concise introduction in his article The X(HTML) Files.

PDF

The SGML family cannot provide a facsimile of a printed page - ie the same fonts, layout and colours of a corporate brochure, annual report or technical publication - and can only feature charts if they're converted into an image (eg a GIF).

Much of the time that faithfulness to print is unnecessary: variations in font and proportions are of less importance than immediate access to the information. 

For those instances where a facsimile is required, the Portable Document Format (PDF) - a proprietary standard created by software developer Adobe Systems - allows publishers to present a publication (generally prepared using desktop publishing software) onscreen or as a printout with the same appearance as print.

PDF essentially freezes the original publication by fixing each page's fonts, layout and graphics in a single file (which depending on the length of the document ranges from one page to thousands of pages).

Because fonts and graphics travel with the file, a PDF is often significantly larger, and so slower to download, than the equivalent HTML - as a result we recommend that long documents be broken into bite-sized chunks. It is also less straightforward to generate automatically from a database than XML.

However, it is attractive to many publishers, particularly if used as an adjunct to an HTML or XML version. It is now possible to include hypertext links within PDFs and for search engines to index constituent text. PDF is not an appropriate tool for all markets, however: the devices used by many visually disabled people unfortunately often cannot translate PDF documents into speech.

PDF Reference: Adobe Portable Document Format Version 1.3 (Boston: Addison-Wesley, 2000) is the definitive guide to PDF. Apart from Adobe's site, we recommend the independent PlanetPDF.

TEI

The Text Encoding Initiative (TEI) is an international project to develop guidelines for the encoding of textual material in electronic form for research purposes. The TEI Guidelines are online.

