formats: hypertext & images
As we note in our network guide,
the internet is a network of networks: in practice a broad set of
standards that allows information to be exchanged by machines across the
globe.
The web is one part of the net, based on a subset of standards
that allow users to readily identify and navigate within/from electronic
documents and allow publishers to incorporate a mix of text, graphics
and audiovisual content such as video or sound recordings.
Those standards are broad, as they have
to be to accommodate significant variations in the 'pipelines' that
transfer information, the performance characteristics of different
machines and the behaviour of different software. Different
machines/software display the same information differently (and in some
cases don't display it at all well).
The web is based on derivatives of Standard
Generalized Markup Language (SGML), a text annotation protocol
originally developed for offline publishing. In essence, SGML describes
the relationship between a document's content and its structure. It has
been adopted as an international standard (ISO 8879) by the International
Organization for Standardization; Tony Hicks's article Should We Be
Using ISO 12083? is a concise introduction.
Many of the documents currently
available on the web have been prepared using Hypertext Markup Language
(HTML), which is based on SGML. There's a plain English guide at
the World Wide Web Consortium (W3C) site; further information is in
our Design guide.
HTML allows publishers to broadly
specify how information is displayed online and how it is structured,
for example enabling users to navigate within individual documents or to
other sites using hyperlinks.
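As a rough illustration of the two kinds of hyperlink mentioned above, the sketch below uses Python's standard html.parser module to sort the links in a small page into those that point within the document and those that point elsewhere. The sample markup, the anchor name and the LinkCollector and collect_links names are our own assumptions for illustration; they are not taken from any particular site.

```python
from html.parser import HTMLParser

# Hypothetical sample page: headings, anchor name and URL are illustrative only.
SAMPLE = """
<html>
  <head><title>Formats</title></head>
  <body>
    <h1>Formats</h1>
    <p>See the <a href="#pdf">PDF section</a> or visit the
       <a href="http://www.w3.org/">W3C</a>.</p>
    <h2><a name="pdf">PDF</a></h2>
  </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element and sorts it by type."""

    def __init__(self):
        super().__init__()
        self.internal = []  # links to a place within the same document
        self.external = []  # links to other documents or sites

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return  # a named anchor (a link target), not a link
        if href.startswith("#"):
            self.internal.append(href)
        else:
            self.external.append(href)


def collect_links(markup):
    """Returns (internal, external) link lists for the given markup."""
    collector = LinkCollector()
    collector.feed(markup)
    return collector.internal, collector.external


internal, external = collect_links(SAMPLE)
print("within the document:", internal)
print("to other sites:", external)
```

The same markup therefore carries both the displayed text and the navigation structure, which is what lets software such as browsers and search engines follow links without human help.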
Many commercial/professional sites now use
Extensible Markup Language (XML), which can be generated from specialist
databases and accommodates specialist dialects used for example in data
exchange between manufacturers. As with any dialect, there are concerns
about compatibility ... something that may impede growth of
business-to-business (B2B) e-commerce (or merely encourage the
involvement of government competition watchdogs).
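To make the idea of generating XML from a database concrete, here is a minimal sketch using Python's standard sqlite3 and xml.etree.ElementTree modules. The parts table, the element names (catalogue, part, name, price) and the currency attribute are hypothetical; a real industry dialect would define its own vocabulary.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(connection):
    """Builds a <catalogue> element with one <part> child per database row."""
    root = ET.Element("catalogue")
    query = "SELECT id, name, price FROM parts ORDER BY id"
    for part_id, name, price in connection.execute(query):
        part = ET.SubElement(root, "part", id=str(part_id))
        ET.SubElement(part, "name").text = name
        # The currency code is an illustrative assumption.
        ET.SubElement(part, "price", currency="AUD").text = f"{price:.2f}"
    return root


# An in-memory database stands in for the 'specialist database'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [(1, "Widget", 4.5), (2, "Bolt & nut", 0.75)],
)

catalogue = rows_to_xml(conn)
xml_text = ET.tostring(catalogue, encoding="unicode")
print(xml_text)
```

The library escapes characters such as the ampersand automatically. Compatibility problems arise one level up: two organisations exchanging data must agree on what the element names mean.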
HTML - currently the lingua franca of
the web - accommodates a variety of graphic formats, principally GIF,
JPEG and PNG (other formats such as TIFF are generally not displayed
inline by browsers). The W3C and specialist bodies are actively exploring
particular enhancements, eg for identifying and describing documents
using metadata, and are considering
proposals that XHTML (a reformulation of HTML 4 as an XML application)
become the new global standard.
Adoption of XHTML appears likely, given
consumer/commercial interest in a more powerful generation of HTML and
ongoing enhancement of the networks (eg bigger pipes, more powerful
machines), but is likely to take some time.
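One practical difference is that XHTML must be well-formed XML, so it can be read by any generic XML parser, whereas much everyday HTML relies on the browser's tolerance of unclosed tags. The sketch below, using Python's standard xml.etree.ElementTree module, checks two fragments of our own invention; the is_well_formed helper is a hypothetical name, and the check covers well-formedness only, not validity against the XHTML DTDs.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments: the first uses HTML habits (unclosed <p> and <br>),
# the second closes every element as XHTML requires.
HTML_STYLE = "<p>First paragraph<br><p>Second paragraph"
XHTML_STYLE = "<div><p>First paragraph<br /></p><p>Second paragraph</p></div>"


def is_well_formed(markup):
    """Returns True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


for label, fragment in (("HTML style", HTML_STYLE), ("XHTML style", XHTML_STYLE)):
    print(label, "is well-formed XML:", is_well_formed(fragment))
```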
Thom Lieb provided a concise introduction in his article
The X(HTML) Files.
PDF
The SGML family cannot provide a
facsimile of a printed page - ie the same fonts, layout and colours as a
corporate brochure, annual report or technical publication - and can
only feature charts if they're converted into an image (eg a GIF).
Much of the time that
faithfulness to print is unnecessary: variations in font and proportions
are of less importance than immediate access to the information.
For those instances where a facsimile is required, the Portable Document Format (PDF)
- a proprietary standard created by software developer Adobe Systems -
allows publishers to present a publication (generally prepared using
desktop publishing software) onscreen or as a printout with the same
appearance as print.
PDF essentially freezes the original
publication as a fixed page description, in effect a picture of every
page (which depending on the length of the document ranges from one page
to thousands of pages). Because the file carries the full layout, and
often embedded fonts and images, it is typically significantly slower
than HTML to download - as a result we recommend that long documents be
broken into bite-sized chunks - and is not as readily generated from a
database as XML.
However, it is attractive to many
publishers, particularly if used as an adjunct to an SGML version. It is
now possible to include hypertext links within PDFs and for search
engines to index constituent text. PDF is not an appropriate
tool for all markets and the devices used by many visually disabled
people unfortunately will not translate PDF documents into speech.
PDF Reference: Adobe Portable Document Format Version
1.3 (Boston: Addison-Wesley, 2000) is the definitive
guide to PDF. Apart from Adobe's site, we recommend the
independent PlanetPDF.
TEI
The Text Encoding Initiative
(TEI) is an international project to develop guidelines for the
encoding of textual material in electronic form for research purposes.
The TEI Guidelines are online.