HTML Language and Resource Guide

Alan H. Bridle
National Radio Astronomy Observatory
520 Edgemont Road
Charlottesville, VA 22903-2475

HTML 3.2 last updated 21 June 1996, 14:44 EDT
Master URL: http://aips2.nrao.edu/aips++/docs/html/htm4aips.html

Purpose

This document summarizes information about, and provides links to, HTML language standards, manuals, style guides, and other tools available via the World-Wide-Web. It is oriented to the needs of scientific documentation at the NRAO, but may also interest anyone exploring the use of HTML for other purposes.


Contents

  1. Why hypertext?
  2. What is HTML?
  3. Generating HTML
  4. Converting HTML to other formats

1. Why hypertext?

Hypertext is attractive for documentation systems that have a wide variety of users, whether at multiple institutes (via the Internet) or within one organization (via an Intranet). Hypertext lets users explore documentation along individualized paths, navigating it in ways that match their interests or their level of understanding. Using HTML and the World-Wide-Web, hypertext also provides rapid publication and updating of information to users around the world.

Hypertext browsers can also serve information on-screen in ways that can be coupled to software, allowing interaction with user inputs. Tutorial documents can be integrated with on-line "applets", or with form-based interfaces to larger software packages. This allows text-based tutorials to be integrated with short "multimedia" demonstrations, access to databases, even fully-developed user interfaces to complex software packages.


2. What is HTML?

HTML is the Hypertext Markup Language, a method originated at CERN for formatting and linking documents, images and other information using tags enclosed in <angle brackets>. When allied with graphical browsers for displaying it on a wide range of computers, HTML became the basis for the rapid growth of the World-Wide-Web. This growth also saw the proliferation of browser-specific dialects of HTML; many of these provided "extensions" that are attractive to commercial Web sites, but are not standardized. The existence of these HTML dialects has encouraged some documenters to develop for particular WWW browsers, reducing inter-operability (but sometimes gaining commercial advantage).

While the Balkanization of HTML has been an attractive strategy for some commercial ventures, inter-operability based on global standards is usually more important in the long run for technical documentation. Developing for the browser du jour is rarely as good a strategy as maximizing the portability of technical documentation at a multi-user facility like an observatory. What, then, are the long-term HTML standards?

HTML Levels and Standards

HTML Level 1

http://www.w3.org/pub/WWW/MarkUp/HTML.html documents the Level 1 specification. This initial level did not support forms (data entry), scalable tables, formulae or scientific symbols. It was written by Tim Berners-Lee while at CERN and Dan Connolly while at Convex Computer Corp.

HTML Level 2

This level includes forms for user input, but does not support scalable tables, formulae, or scientific symbols.

http://www.w3.org/pub/WWW/MarkUp/html-spec/html-spec_toc.html is the Level 2 draft specification. It is now under final review by the HTML Working Group of the Internet Engineering Task Force (IETF).

Many browsers, including NCSA Mosaic and Netscape Navigator, use features of HTML Level 2. Some also use proprietary features that are not part of this standard. Netscape Navigator supports ad hoc extensions to HTML 2.0 which are known informally as NHTML. Although their authors "expect" some of these to be part of future standards, other browsers may ignore them. To maximize portability, avoid using such browser-specific HTML features that are not part of the Level 2 draft standard.

Unfortunately, HTML Level 2 cannot directly display characters outside the ISO 8859-1 (Latin-1) character set, nor can it format resizable tables. Its character set contains a few non-Latin entities that are used by scientists and engineers, but not all browsers interpret them all correctly. Until its math standard is agreed on and implemented (at some future stage of development of Level 3), HTML is therefore ill-adapted for some scientific writing.
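
To see how quickly scientific text runs into this limit, a short script can list the characters that fall outside the Latin character set. The following is only a sketch in Python: it assumes that the character set in question is ISO 8859-1 (Latin-1), and the function name outside_latin1 and the sample sentence are hypothetical illustrations.

```python
def outside_latin1(text):
    """Return the distinct characters in text that ISO 8859-1 (Latin-1)
    cannot encode, in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Hypothetical sample: the micro sign (U+00B5), plus-minus sign (U+00B1) and
# degree sign (U+00B0) are part of Latin-1; Greek lambda (U+03BB), Greek
# alpha (U+03B1) and the "almost equal" sign (U+2248) are not.
sample = "5 \u00b5Jy at \u03bb = 6 cm, \u00b110\u00b0, \u03b1 \u2248 0.7"
print(outside_latin1(sample))
```

Any character reported by such a check would have to be written out in words or replaced by an image in a Level 2 document.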

HTML Level 3

The Level 3.0 draft included such norms of technical documentation as resizable tables, captioned figures and mathematical equations. It also allowed more flexible layout control (e.g. text flowing around figures), and supported links to common multimedia formats such as sound sequences and MPEG movies. Level 3.0, sometimes called HTML+, was never widely deployed in browsers, however.

Some parts of the proposed standard were supported by Netscape Navigator (1.1 and higher) and by NCSA Mosaic, but Netscape in particular adopted its own extensions to the Level 3.0 specification.

http://webreference.com/html3andns/ has a particularly clear discussion of the differences between HTML 3.0 and NHTML: "HTML 3.0 and Netscape 3.0: How to tame the wild Mozilla".

In May 1996, the World Wide Web Consortium (W3C) at MIT, in consultation with vendors including IBM, Microsoft, Netscape Communications, Novell, SoftQuad, Spyglass and Sun Microsystems, announced a new HTML 3.2 specification. This specification, code named Wilbur, adds such already-deployed features as tables, Java applets and text flow around images while providing backward compatibility with Level 2. It will also provide extensions for multimedia objects, scripting, style sheets, improved layout, higher quality printing and math. There is some hope that the Level 3.2 specification, having been developed with more input from the commercial browser-writers, has a better chance of deployment than the ill-fated Level 3.0.

Microsoft's Internet Explorer Version 3.0 uses a variant of the HTML 3.2 specification.

W3C's demonstration browser for Level 3 is called Arena. It is available for Linux, Solaris, SunOS, DEC and SGI systems. Although still somewhat buggy, even using its own demonstration files(!), Arena illustrates future possibilities for documentation using HTML 3. A demonstration of Arena's capabilities (as screen dumps viewable on other browsers) is available at http://www.csd.uwo.ca/~tzoq/HTML3/.

There is an archive of discussion on the IETF HTML Working Group's mailing list at http://www.acl.lanl.gov/HTML_WG/archives.html.

HTML Manuals, Tutorials and HTML-oriented Web Sites

The WWW has many useful resources relevant to HTML. Good language manuals and tutorials are available at:

For HTML Style Guides, you might consult:


3. Generating HTML

HTML from TeX or LaTeX

Most existing documents on concepts, algorithms, instruments and mathematical methods that are relevant to astronomers are written in TeX or LaTeX. Because LaTeX-based packages are used to prepare and submit scientific articles to journals, most scientists who will contribute to astronomical documentation systems are also more familiar with LaTeX than with HTML. Finally, TeX and LaTeX are well suited to the mastering and printing of large documents such as manuals, lecture notes and conference proceedings.

We can therefore expect to ingest much TeX/LaTeX material when constructing astronomical documentation in HTML. This process must be automated at least to the point where only minor hand-work is needed to bring scientific text and graphics into the documentation system.

We will probably need several approaches to this for most of the 1990s.

The first approach is straightforward but does not integrate incoming documents fully into a hypertext system. The second lacks a robust implementation, and may continue to do so until the Level 3.0 standard has been around for a while. Its most plausible vehicle at the moment is LaTeX2HTML (see http://cbl.leeds.ac.uk/nikos/tex2html/doc/latex2html/latex2html.html) by Nikos Drakos of Leeds University. Tests of LaTeX2HTML on LaTeX files from the NRAO astronomy documentation have revealed detailed problems with this converter, however. It is unable to parse some perfectly correct TeX constructs and can produce incorrect results, including symbol substitutions and garbled equations. Even if the translator were bug-free, some problems of principle would remain from this method's use of "transparent" images (GIF89a format) to represent symbols and equations in the original. This approach has several disadvantages:

Eventually, the HTML standard will allow mathematical and Greek symbols to be incorporated directly as HTML elements. Our technical documentation should use the standard as soon as it is settled and is available in a competent, low-cost browser. Until this happy state of affairs is realized, LaTeX-to-HTML conversion of scientific documents should probably be limited to those that demand two-way links (to and from) the rest of a documentation system.

From scratch

HTML files can be generated using any editor that emits ASCII files. There is no reason in principle not to write them in emacs, Word Perfect or Microsoft Word, once the author either:

A list of filters for converting word-processor formats to HTML is kept at http://www.w3.org/pub/WWW/Tools/Word_proc_filters.html.

Several specialized tools that can simplify writing HTML from scratch are worth attention, however:

An extensive List of HTML Editors for all platforms can be found at http://union.ncsa.uiuc.edu/HyperNews/get/www/html/editors.html.

Checking HTML

As noted above, HTML files are plain text files that can be generated by any editor. HTML checkers make sure that all tags in HTML files are placed and nested meaningfully, and that the files contain all the required information. Simply reading an HTML file into someone's favorite browser is not a good way to check it for validity, and certainly does not imply portability! The fluid nature of the standards beyond Level 2, and the enthusiasm of browser-writers for extending them or ignoring them, mean that what looks valid or beautiful to one browser need not be conforming HTML that will display sensibly on another.
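
The kind of nesting check described above can be illustrated with a small script. The following is a minimal sketch in Python using the standard html.parser module; it is not a substitute for a real HTML checker, which validates against the DTD. The names NestingChecker and check_nesting, and the two tag lists, are hypothetical choices made for this illustration.

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive lists for this sketch:
# elements that never take an end tag ...
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "base"}
# ... and elements whose end tag may be omitted (skipped to avoid false alarms).
OPTIONAL_END = {"p", "li", "dt", "dd"}


class NestingChecker(HTMLParser):
    """Collects messages about tags closed out of order or never closed."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, innermost last
        self.problems = []  # human-readable messages

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag not in VOID_TAGS and tag not in OPTIONAL_END:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag in OPTIONAL_END:
            return
        if not self.stack:
            self.problems.append(f"</{tag}> has no matching open tag")
        elif self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(
                f"</{tag}> closes while <{self.stack[-1]}> is still open")
            if tag in self.stack:
                # Recover: discard everything opened after the matching tag.
                while self.stack.pop() != tag:
                    pass


def check_nesting(text):
    """Return a list of nesting problems found in an HTML string."""
    checker = NestingChecker()
    checker.feed(text)
    checker.close()
    for tag in checker.stack:
        checker.problems.append(f"<{tag}> is never closed")
    return checker.problems


for message in check_nesting("<BODY><B><I>text</B></I></BODY>"):
    print(message)
```

Many browsers would display the overlapping <B> and <I> tags in this sample without complaint, which is exactly why viewing a page in a browser is a poor test of validity.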

The HTML standard also contains features that are not required and are disregarded by many of the currently popular browsers but which will be turned on by more advanced browsers and indexing systems as the worldwide use of HTML matures. For example, the <HEAD> and <BODY> tags are not required by most browsers but are worth including as they can be used to speed up document-indexing systems. The use of an <HTML> container tag for the entire document is optional but may be used by future browsers to positively identify the file as one that is to be interpreted using the HTML Document Type Definition (DTD).
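
Whether a file carries these optional containers is easy to test mechanically. The sketch below, again in Python with the standard html.parser module, reports which of the <HTML>, <HEAD> and <BODY> tags are absent. The function name missing_containers and the SAMPLE page are hypothetical, and the check looks only for the presence of the tags, not for their correct placement.

```python
from html.parser import HTMLParser


class StructureScan(HTMLParser):
    """Records the name of every start tag seen in a document."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        self.seen.add(tag)  # tag names arrive in lower case


def missing_containers(text, wanted=("html", "head", "body")):
    """Return the structural tags from `wanted` that never appear in text."""
    scan = StructureScan()
    scan.feed(text)
    scan.close()
    return [tag for tag in wanted if tag not in scan.seen]


# A hypothetical page that includes all three containers.
SAMPLE = """<HTML>
<HEAD><TITLE>Example page</TITLE></HEAD>
<BODY>
<H1>Example page</H1>
<P>Body text.
</BODY>
</HTML>"""

print(missing_containers(SAMPLE))
print(missing_containers("<H1>Bare page</H1><P>Body text."))
```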


4. Converting HTML to other formats

The advantages of hypertext and browser-based documentation do not eliminate the need for printed documents. Not all readers prefer navigating the multi-linear structures of hypertext to the logical sequencing implicit in a printed manual. Not all browsing of documentation is done at a computer workstation. Documenters must still consider how to produce traditional printed manuals from any hypertext system. Originating the documents as TeX or LaTeX or in a word-processor such as Microsoft Word or Word Perfect alleviates this problem, but there are also some options for documents that originate as HTML, or whose hypertext versions may have evolved away from a printable original.

To ASCII text
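
The idea behind any HTML-to-text conversion is to discard the tags, decode character entities, and start a new line at block-level elements. The following is a minimal sketch of that idea in Python using the standard html.parser module; it is not one of the tools this guide refers to, and the names TextDump, html_to_text and BLOCK_TAGS are hypothetical. It makes no attempt to lay out tables, indent lists, or record link targets.

```python
from html.parser import HTMLParser

# Assumed set of elements that should begin and end on a line of their own.
BLOCK_TAGS = {"p", "br", "hr", "h1", "h2", "h3", "h4", "h5", "h6", "title",
              "ul", "ol", "li", "dl", "dt", "dd", "pre", "blockquote", "tr"}


class TextDump(HTMLParser):
    """Collects the text of a document, with line breaks at block tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; automatically.
        super().__init__(convert_charrefs=True)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.pieces.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.pieces.append("\n")

    def handle_data(self, data):
        self.pieces.append(data)


def html_to_text(text):
    """Return the plain text of an HTML string, one block per line."""
    dump = TextDump()
    dump.feed(text)
    dump.close()
    # Collapse runs of white space within each line and drop empty lines.
    lines = (" ".join(line.split())
             for line in "".join(dump.pieces).splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<H1>Title</H1><P>First &amp; second <B>bold</B> text.<P>Next"))
```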

To PostScript

There are several choices for dumping HTML files to PostScript for printing as pages or chapters of manuals:


Copyright © 1996,1999,2000 Associated Universities Inc., Washington, D.C.


abridle@nrao.edu