Version 1.9 Build 1556

Next: Document Generation Up: AIPS++ Documentation System (rev 2) DRAFT Version Previous: AIPS++ Documentation System (rev 2) DRAFT Version

Introduction

The objective of this system is to let programmers document source code so that the documentation can be extracted and converted into a reference document. To accomplish this, the programmer needs a way to specify information which supplements the regular comments. This ``extra'' information is known as markup. The markup provides a means of structuring the comments, and allows them to be formatted in ways more sophisticated than plain ASCII text.

This system allows both the source code and the documentation for the source to exist in the same file. This close proximity will increase the chances that the reference document will remain up to date. In addition, the goal is to have a parser which understands most of the C++ language. This allows much of the reference document to be generated automatically from the source code.

The first revision of this system used a language similar to the roff dialects. This language was abandoned in favor of SGML, Standard Generalized Markup Language. There are a few reasons for this:

Since SGML is an ISO/ANSI standard, many tools are likely to be available in the future to support it.
Generalized markup is a better model for information reuse and flexibility. It allows the information it structures to be used by a variety of applications.
SGML provides a good basis for future expansion.
Many successful applications already use SGML, and the work they and the various standardization groups have done can be leveraged to our benefit.

Generalized Markup

The motivation for generalized markup is a departure from that of traditional markup languages. With traditional markup languages, the goal is to add information to a document to allow it to be formatted for presentation. The markup is formatting information. With generalized markup, however, the markup is used to convey the logical structure of the information. How this structure is formatted should not be the concern of the writer. The provider of the information should only be concerned with providing the important information with the appropriate logical structure. The actual formatting will be performed by the systems which use this information. The logical structure of the information is much more valuable than the way it should be formatted. If the information is structured correctly, a variety of post-processors should be able to use any information which is based on a given tag-set.

So for example a piece of procedural markup might look like:

.TH DF 1V "16 September 1989"
.SH NAME
df \- report free disk space on file systems
.SH SYNOPSIS
.B df

This is a section taken from the df Unix man page. Here, the .TH command sets up the header - the reference page is DF, the section is 1V, and the last field is the date of the most recent change. The .SH command sets up a section heading with the given label, and the .B command displays text in bold. This is relatively typical of procedural markup. While all of the information necessary to present a nicely formatted man page is there, it is all superficial. The sections do not convey the content. The section command is just a section command, and does not convey the fact that the section is the synopsis. In a generalized markup language, SGML in this case, this section of the man page might be represented as:

<man ref=DF sect='1V' date="16 September 1989">
<name> df 
<summary> report free disk space on file systems </summary> 
<synopsis> <com>df</com>

Here the document is labeled as a man page and the <man> attributes are the information that previously generated the header. The name is no longer the text of a section heading; it is labeled as a <name>. The synopsis is labeled as a synopsis, <synopsis>, instead of a section with the heading ``SYNOPSIS''. The ``df'' reference in the synopsis is a command, <com>, instead of a bold section of arbitrary text.

Generalized markup allows the important information to be processed unambiguously by many different systems. With generalized markup, a <synopsis> may be treated as a section for printed text, but for a hypertext system the synopsis may only be displayed if someone presses the synopsis button (however that is done). The information represents a synopsis instead of formatting details. The formatting decisions are left to the discretion of the system which will use the information.

Generalized markup is one tool which will allow information to be reused for a variety of purposes. Once freed from a particular formatting language, many applications can utilize the same information source for many purposes. The same logically structured information can be used to generate a printed manual, a hypertext system, a database, or practically anything else that can be done with information. All that is necessary is that the proper set of tags is defined and that processors exist to manipulate information structured with the tags.
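
The reuse described above can be illustrated with a small Python sketch. It is not part of the documentation system itself: the dictionary layout and the function names render_plain and render_index_entry are hypothetical, chosen only to mirror the <man> example. The point is that one logically structured record feeds two unrelated presentations.

```python
# Hypothetical structured record mirroring the <man> example above.
# The field names are illustrative assumptions, not a defined tag-set.
man_page = {
    "ref": "DF",
    "sect": "1V",
    "date": "16 September 1989",
    "name": "df",
    "summary": "report free disk space on file systems",
    "synopsis": "df",
}


def render_plain(page):
    """Format the record as a printed-style page with section headings."""
    lines = [
        f"{page['ref']}({page['sect']})  {page['date']}",
        "NAME",
        f"    {page['name']} - {page['summary']}",
        "SYNOPSIS",
        f"    {page['synopsis']}",
    ]
    return "\n".join(lines)


def render_index_entry(page):
    """Reuse the same record as a one-line index entry."""
    return f"{page['name']} ({page['sect']}): {page['summary']}"
```

Neither function knows about the other, and neither required any change to the record. A hypertext or database back end could be added in the same way.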

Standard Generalized Markup Language

SGML is in some sense a meta-language. It allows one to specify one of a family of generalized markup languages. SGML provides the language to specify the elements of the markup language and the rules for composition of these elements. This capability enables mechanical analysis of documents which utilize an SGML-specified markup language. In addition, since SGML is an ANSI/ISO standard, compatibility of SGML systems is guaranteed. In fact, SGML has been used for representing braille, music, hypermedia, information for tutoring systems, and published books. SGML will be one of the tools which will help to make free interchange of information a reality.

Document Type Definition

One of the most important sections of an SGML document is the section which defines the tag-set and composition rules which will be used in the document. This ``grammar'' is typically located in another file and is included at the top of the document in much the same way that LaTeX macro packages would be included in LaTeX documents. This ``grammar'' section is called the Document Type Definition, DTD.

Elements

The DTD specifies the elements, the tags which delimit elements, and the rules for composition of elements. An element is the smallest unit of concern for SGML. It represents a single concept or logical unit. So for example, a paragraph might be one element in the DTD, and one paragraph element might be delimited:

<p> This is my paragraph. </p>

This forms one instance of the paragraph element. The starting, <p>, and ending, </p>, tags delimit the element. The ``p'' in the tags is called a generic identifier, and it serves to distinguish paragraph elements from other sorts of elements.
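
As a rough illustration of how start and end tags delimit elements, the following Python sketch checks that tags nest properly. It is a simplification and not an SGML parser: it assumes every element needs an explicit end tag, whereas a DTD may allow end tags to be omitted, and it ignores the composition rules a DTD would add. The name check_nesting is hypothetical.

```python
import re

# A start tag or end tag: optional "/", a generic identifier, then anything up to ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*>")


def check_nesting(text):
    """Return a list of nesting problems (empty if tags are balanced).

    Assumption: every start tag requires a matching end tag.
    """
    stack, problems = [], []
    for match in TAG_RE.finditer(text):
        closing, gi = match.group(1) == "/", match.group(2)
        if not closing:
            stack.append(gi)          # remember the open element
        elif stack and stack[-1] == gi:
            stack.pop()               # end tag closes the innermost element
        else:
            problems.append(f"unexpected </{gi}>")
    # Anything still open never received its end tag.
    problems.extend(f"missing </{gi}>" for gi in reversed(stack))
    return problems
```

Applied to the paragraph above, check_nesting("<p> This is my paragraph. </p>") reports no problems, while dropping the end tag yields a "missing </p>" message.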

Generic identifiers, GIs, can have attributes. These attributes describe important characteristics of the GI. The attributes are specified within the starting tag of the element. For example:

<category lib=aips sect="math">

Here the attributes are lib and sect. The value for lib is ``aips'' and the value for sect is ``math''. Either single quotes, `` ' '' or double quotes, `` " '', may be used to delimit literal attribute values. If the attribute value is one word, no quotes are necessary. The attribute name is typically limited to eight characters.
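
The quoting rules above can be sketched in Python as follows. This is an illustrative simplification, not the parser used by the documentation system: the function name parse_start_tag and the regular expressions are assumptions, and details of real SGML such as name-length limits and case folding are ignored.

```python
import re

# name=value, where value is double-quoted, single-quoted, or a single bare word.
ATTR_RE = re.compile(
    r"""([A-Za-z][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))"""
)
START_TAG_RE = re.compile(r"<([A-Za-z][\w.-]*)\s*(.*?)\s*>", re.DOTALL)


def parse_start_tag(tag):
    """Return (generic_identifier, attributes) for a simple start tag."""
    match = START_TAG_RE.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a start tag: {tag!r}")
    gi, rest = match.group(1), match.group(2)
    attrs = {}
    for name, double_quoted, single_quoted, bare in ATTR_RE.findall(rest):
        # Exactly one of the three value groups is non-empty for each match.
        attrs[name] = double_quoted or single_quoted or bare
    return gi, attrs
```

For the tag above, parse_start_tag('<category lib=aips sect="math">') gives the generic identifier category with lib set to aips and sect set to math, whether or not the values are quoted.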

Entities

In addition to tags, SGML also allows for the definition of entities. Entities provide a means for simple keyword expansion. This is useful not only as a shorthand notation, but also as a way of parameterizing a document to make future changes easy. Entity references begin with an ampersand and end with a semicolon. So for example, an entity reference for the less-than symbol, ``<'', might be &lt;. All entities are typically defined in the DTD.

Entities are particularly useful to prevent text from being interpreted as markup. The characters which are of particular concern are ``<'' and ``&'' because they introduce element tags and entity references respectively. However, this should typically not be a problem because misinterpretation is only possible when these opening characters are followed directly by a non-space character. So ``<test'' could be interpreted as a tag but ``<@: test'' could not. Other characters which could cause problems are ``>'' and ``;'' because these are the characters which end element tags and entity references respectively.
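
A minimal Python sketch of this use of entities follows. The entity names lt and amp are assumed here for illustration; in a real document they would have to be declared in the DTD. The functions escape_text and expand_entities are hypothetical helpers, not part of any SGML toolkit.

```python
import re

# Assumed entity names; a real DTD would have to declare them.
ESCAPES = {"&": "&amp;", "<": "&lt;"}

ENTITY_REF_RE = re.compile(r"&([A-Za-z][\w.-]*);")


def escape_text(text):
    """Replace the characters that open tags and entity references."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def expand_entities(text, table):
    """Expand &name; references using table; unknown names are left as-is."""
    return ENTITY_REF_RE.sub(
        lambda m: table.get(m.group(1), m.group(0)), text
    )
```

Escaping "a < b & c" gives "a &lt; b &amp; c", and expanding that result with a table mapping lt to "<" and amp to "&" restores the original text. The same expand_entities function also covers the shorthand use of entities, since the table can map any name to any replacement string.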

Body

Once the DTD has been specified, the user can then structure the information in the body of the SGML document. Once the tags are specified they can be used like the commands in other markup languages. If the elements are composed incorrectly, a parser will point out the problems. This is the level at which most SGML users will operate. This is the level at which developers entering comments will operate. For the developer entering comments, there will not be a ``body'' as such. The comments will each be a portion of the ``body'' of the SGML document. These pieces will be ordered and assembled into the body of the document by an extractor.
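
The role of the extractor can be sketched in Python. This is only an illustration under stated assumptions: it supposes that marked-up comments are C++ ``//'' comments occupying whole lines, and the tags and the class in the sample are invented for the example. The actual extractor and its comment conventions are not described in this section.

```python
def extract_comment_body(source):
    """Collect the text of whole-line '//' comments, in order, as one body.

    Assumption: markup lives only in comments that start a line;
    trailing comments after code are ignored.
    """
    pieces = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("//"):
            pieces.append(stripped[2:].strip())
    return "\n".join(pieces)


# Hypothetical C++ source with marked-up comments.
example = """\
// <summary> A simple counter </summary>
class Counter {
public:
    // <synopsis> Increment the count by one. </synopsis>
    void increment();
};
"""
```

Calling extract_comment_body(example) returns the two marked-up comment lines joined in source order, which is the kind of fragment an extractor would pass on to an SGML parser as part of the document body.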
