Next: Document Type Definition Up: AIPS++ Documentation System (rev 2) DRAFT Version Previous: Introduction

Document Generation

The conflicts and complications between the language in which the source code is expressed and the language in which the comments are expressed must somehow be resolved. The information contained in these two languages must be used to generate a reference document. These issues are discussed in this section along with the tags which are used to structure the comments.

Mixing Languages

The introduction of a structuring language for comments implies a mixing at some level of the language being documented and the language in which the documentation is expressed. This mixing can happen in one of three ways:

1. Provide a system which integrates the languages at a level higher than the file level. Such a system would provide a mechanism for editing code in which the code is brought up in the programmer's favorite editor, while the documentation is brought up in an SGML editor. This gives the best of both worlds: a text editor is used for code, and a more powerful WYSIWYG editor is used to construct the documentation.
2. Provide a system where the code is described as part of the documentation. In this system, the code is extracted from the documentation for compilation purposes. This is the literate programming environment envisioned by Knuth.
3. Provide a system where the structure of the documentation is described inside comments in the source code, and a processor then turns the code/comment combination into a finished document.

Of these three alternatives I think the first is the cleanest, and it could be the most programmer friendly. However, it also involves far more work than the other two options, especially if it is to be truly programmer friendly. In addition, it would require a great deal of trial and error to arrive at a system flexible enough to be used for serious development. The second choice seems nice in theory, but I fear that this ``literary'' style of development is quite foreign to many developers. The extra step of turning documentation into code could also add too much time to the edit-compile-debug loop for it to be useful in practice. Thus, the third alternative was chosen. It has the advantage that the source is always kept in a state which can be compiled.

Reference documents will be assembled out of the documentation in the comments and the information extracted directly from the source code. This mixing of source code objects and comment objects in the same place requires a processor to mesh the two languages and rearrange the derived SGML objects into a coherent reference document.

Language and Comment Elements

The code units in a source file will be called ``language elements''. This includes for example a class definition, a function definition, a definition of one or more variables, a class declaration, a function declaration, etc. Basically, a language element is any complete statement in the grammar for the language. These are the pieces of the language which can have comment elements associated with them. Likewise, the comment elements are the comments which must precede the language element with which they are associated. So, for example, the following code fragment shows a function definition along with its associated comment.

// <div>
// <p> This function computes the area of a triangle and returns the
// result.
// </div>
float triangle::area() {
  return (0.5 * base() * height());
}

In this example, the language element is the definition of the function triangle::area(). The comment element preceding the definition is associated with the function. This pair will be converted into one piece of the documentation generated by the documentation extractor.
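
The pairing described above can be sketched in a few lines of C++. This is only an illustration, not the extractor itself: the names DocPair and pairComments are hypothetical, comments are assumed to start in column one, and a blank line is assumed to break the association between a comment and the code that follows. A real extractor would need a proper C++ parser to recognize complete language elements.

```cpp
#include <string>
#include <vector>

// Hypothetical record: one comment element and the language element it documents.
struct DocPair {
    std::string comment;  // comment text with the leading "//" removed
    std::string element;  // first line of the associated language element
};

// Pair each run of "//" comment lines with the first code line that follows it.
// Assumption: a blank line between the comment and the code discards the comment.
std::vector<DocPair> pairComments(const std::vector<std::string>& lines) {
    std::vector<DocPair> pairs;
    std::string pending;  // comment text collected since the last element
    for (const std::string& line : lines) {
        if (line.rfind("//", 0) == 0) {
            // Comment line: strip "//" and at most one following space.
            std::string body = line.substr(2);
            if (!body.empty() && body[0] == ' ') body.erase(0, 1);
            pending += body + "\n";
        } else if (line.empty()) {
            pending.clear();
        } else if (!pending.empty()) {
            // First code line after a comment run: emit the pair.
            pairs.push_back({pending, line});
            pending.clear();
        }
    }
    return pairs;
}
```

Applied to the triangle::area() fragment above, this sketch yields a single pair whose element is the first line of the function definition.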

While comment elements can be provided for any C++ language element or preprocessor element, not all comments will be extracted. Initially, comments will only be extracted for functions and classes. Later, elements like ``#define''s, structs, enums, etc. will be added.

SGML Tags for Documentation

This section discusses the tags which were developed for describing the logical structure of elements of comments. These are described as an SGML specification in appendix A.

DTD Elements for Source Code Documentation

Title - title

Many documentation elements contain an optional <title> as their first component. This allows for the specification of a label for that section of the overall document. So for example, one might have:

<div> <title> A simple section </title>
<p>The contents of the section.
</div>

The elements which can have titles are <div>, <warn>, <note>, <verbatim>, <literal>, <code>, <enum>, and <list>. These are explained below.

Text Separators - div, p

There are two basic levels of division where ``plain'' text is concerned. These divisions are at the paragraph and section level.

To divide paragraphs the <p> tags should be used. These tags mark off a section of text indicating that it makes up a paragraph. The paragraph contains parsable character data, ``#PCDATA''. This means that the data in the paragraph can be fully processed by SGML for the expansion of entities and elements. Both starting and ending paragraph tags are optional in the cases where no ambiguity exists.

As a short-cut, paragraph beginnings and ends can be implied by blank lines. So for example, the following comment:

//
// This function computes the volume of a cylinder
//
// Its OK
//
float Cylinder::volume() {
  return C::pi * radius() * radius() * height();
}

would be converted into:

<div>
<p>
This function computes the volume of a cylinder
</p><p>
Its OK
</p>
</div>

The <div>s are added automatically by the documentation extractor (discussed below), and the paragraph elements, <p>, are implicit. Note that the leading and trailing blank lines are significant. They are the short-hand notation to start the initial paragraph and end the last paragraph.
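
As a rough sketch of this conversion, the function below turns the lines of one comment element into a division with explicit paragraphs. The name toParagraphs is hypothetical, and the sketch makes two simplifying assumptions: any text line opens a paragraph if none is open (so the leading blank line is not strictly required), and each tag is written on its own line, so the whitespace differs slightly from the output shown above.

```cpp
#include <string>
#include <vector>

// Convert the lines of one comment element into <div>/<p> markup.
// Blank comment lines ("//" with only whitespace after it) end the
// current paragraph; the next text line opens a new one.
std::string toParagraphs(const std::vector<std::string>& commentLines) {
    std::string out = "<div>\n";
    bool open = false;  // true while a <p> element is open
    for (const std::string& raw : commentLines) {
        // Drop the leading "//" if present.
        const std::string text = raw.rfind("//", 0) == 0 ? raw.substr(2) : raw;
        const std::string::size_type first = text.find_first_not_of(" \t");
        if (first == std::string::npos) {
            // Blank line: close the paragraph, if any.
            if (open) {
                out += "</p>\n";
                open = false;
            }
            continue;
        }
        const std::string::size_type last = text.find_last_not_of(" \t");
        if (!open) {
            out += "<p>\n";
            open = true;
        }
        out += text.substr(first, last - first + 1) + "\n";
    }
    if (open) out += "</p>\n";  // comment ended without a trailing blank line
    out += "</div>\n";
    return out;
}
```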

The <div> tag is the generic tag for sections within a document. These sections are designed to be referentially transparent and readily relocatable. Typically comment elements will be divisions. Both the starting and ending tags are required for divisions. To obtain subsections, <div>s can be nested. For example,

<div>
  <div> <title> Nested Division </title>
     <p> This is the first simple paragraph.
     <p> This is the second simple paragraph.
   </div>
</div>

In this case, the closing paragraph tags can be left off because a paragraph cannot contain another paragraph, so each new <p> implicitly ends the previous one. For the same reason, the closing paragraph tag can typically be left off elsewhere.

Warnings and Notations - warn, note

These elements display information which is important to the user. A <note> should be used to bring a portion of text to the attention of the user. A <warn> should tell the user something which, if not known, can result in damage (of some sort). For example,

<warn> ... deletion of this object will result in a memory leak under
       these circumstances.
<note> The new operator is private to prevent dynamic creation of
       objects of this class.

Literal Sections - verbatim, literal, code

Literal sections allow one to create sections which undergo very little modification in the process of generating output. There is one subtle distinction between these sections.

In the case of <verbatim> and <code>, very little processing is done. The characters between the starting and ending tags for these elements are simply treated as a character string; no elements are processed and no entities are expanded. The one difference between these two is that <code> should be used to display sections of source code; otherwise <verbatim> should be used.

In the case of <literal>, minimal processing is also done on the characters in the section. Here, however, entities are expanded.
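
The distinction can be illustrated with a small sketch. The names LiteralKind, expandEntities, and renderSection are hypothetical, and the three entities handled here (&lt;, &gt;, &amp;) are only an assumed example set; the entities actually available depend on the DTD.

```cpp
#include <cstddef>
#include <map>
#include <string>

enum class LiteralKind { Verbatim, Code, Literal };

// Expand a small, assumed set of entities in a single left-to-right pass.
std::string expandEntities(const std::string& text) {
    static const std::map<std::string, std::string> entities = {
        {"&lt;", "<"}, {"&gt;", ">"}, {"&amp;", "&"}};
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        if (text[i] == '&') {
            for (const auto& entry : entities) {
                if (text.compare(i, entry.first.size(), entry.first) == 0) {
                    out += entry.second;
                    i += entry.first.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += text[i++];  // ordinary character, copied as-is
    }
    return out;
}

// <verbatim> and <code> pass text through untouched; <literal> expands entities.
std::string renderSection(LiteralKind kind, const std::string& text) {
    return kind == LiteralKind::Literal ? expandEntities(text) : text;
}
```

With this sketch, the text "a &lt; b" is left unchanged inside <verbatim> or <code>, while inside <literal> it becomes "a < b".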

Lists - enum, list

All of the lists that are used have a common tag for the elements of the list, <item>. The contents of this tag can only be parsable character data, and list items do not nest. As a result, the closing </item> tag can typically be omitted.

There are two ``generic'' list choices. The first type of generic list is a numbered list, <enum>. The second generic list type is a list which distinguishes the elements of the list with bullets, <list>. Both of these lists can have a <title> before any of the elements of the list. So these might look like:

<list>
  <item> First Element
  <item> Second Element
</list>
<enum> <title> An empty list </title>
  <item> first element
</enum>

Hyperlinks

To be added later (probably based on HTML or HyTime) ...

Descriptors

There are several elements which are devoted to describing details about a given class or function. These descriptors list things like the exceptions thrown, the I/O devices accessed, etc.

Class Descriptors

The <descriptor> element lists several important aspects of a given class. A given <descriptor> might look like:

<descriptor>
  <execution> <sequential>
  <bounded>
  <memory> <counted>
  <iterator>
  <cached>
</descriptor>

This description specifies a class that is purely sequential, has a maximum object size, is reference counted, has an iterator class, and has a built-in caching mechanism. If <bounded> were not specified, the assumption is that the object size is unbounded.

This <descriptor> list corresponds to the object descriptors required in the AIPS++ coding standards. The options are as follows:

<execution> can contain <sequential>, <guarded>, <concurrent>, or <multiple>.
If <bounded> is specified then the object size growth is bounded, otherwise it is unbounded.
<memory> can contain <counted>, <gc>, or <unmanaged>
If <iterator> is specified, then the object has an iterator, otherwise it does not.
If <persistent> is specified, then the object is persistent, otherwise it is not.
If <cached> is specified then the object has cached management, otherwise it does not.

Using this list of attributes a great deal of information about the class can be expressed succinctly.
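
One way to picture the information carried by a <descriptor> is as a small record, sketched below. The names ClassDescriptor and applyTag are hypothetical. The defaults for the flags follow the list above (a flag that is not specified is off); the defaults chosen for execution and memory are assumptions, since the text does not say what is implied when they are omitted.

```cpp
#include <string>

enum class Execution { Sequential, Guarded, Concurrent, Multiple };
enum class Memory { Counted, Gc, Unmanaged };

// Hypothetical in-memory form of a <descriptor> element.
struct ClassDescriptor {
    Execution execution = Execution::Sequential;  // assumed default
    Memory memory = Memory::Unmanaged;            // assumed default
    bool bounded = false;     // unbounded unless <bounded> is given
    bool iterator = false;    // no iterator unless <iterator> is given
    bool persistent = false;  // not persistent unless <persistent> is given
    bool cached = false;      // not cached unless <cached> is given
};

// Record one tag name (without angle brackets) in the descriptor.
// Returns false for a tag that is not part of the descriptor vocabulary.
bool applyTag(ClassDescriptor& d, const std::string& tag) {
    if (tag == "execution" || tag == "memory") {
        // Container tags: the value comes from the tag that follows.
    } else if (tag == "sequential") d.execution = Execution::Sequential;
    else if (tag == "guarded") d.execution = Execution::Guarded;
    else if (tag == "concurrent") d.execution = Execution::Concurrent;
    else if (tag == "multiple") d.execution = Execution::Multiple;
    else if (tag == "counted") d.memory = Memory::Counted;
    else if (tag == "gc") d.memory = Memory::Gc;
    else if (tag == "unmanaged") d.memory = Memory::Unmanaged;
    else if (tag == "bounded") d.bounded = true;
    else if (tag == "iterator") d.iterator = true;
    else if (tag == "persistent") d.persistent = true;
    else if (tag == "cached") d.cached = true;
    else return false;
    return true;
}
```

Feeding the tags of the example <descriptor> above through applyTag gives a record that is sequential, bounded, reference counted, iterable, and cached, but not persistent.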

Device I/O

Often, it is useful to have the ability to track functions which depend on particular files or I/O devices. This will often identify the functions which have direct access to the operating system. So the following would label a function which performed I/O on the files /etc/hosts and /etc/motd using OS-specific routines, e.g. ``<stdio.h>'', ``<iostream.h>'':

<iodev> <level><os>
   <item> /etc/hosts
   <item> /etc/motd
</iodev>

It is also useful to be able to identify functions which access files using a particular abstraction, e.g. AipsIO, because although these functions do not depend directly on operating system calls, they are tied to particular files. So for example a function that performs operations on the file /usr/local/var/aipsppdb using AipsIO would be labeled:

<iodev> <level> AipsIO
   <item> /usr/local/var/aipsppdb
</iodev>

In this way, the dependencies between AIPS++ and the environment in which it operates can be tracked.

Exceptional Conditions

It is important to know the exceptions which can be thrown by a given class or function. This information gives the user of a function the list of exceptions which he may be required to catch. This information can be presented as follows:

<thrown>
  <item> AllocError
  <item> ArrayNDimError
</thrown>

These thrown exceptions should be specified at the member functions which throw the exceptions. However, the <thrown> descriptor could also be used in the comment element for a class. At some time in the future, the extractor will hopefully be able to generate call trees, and thus pick up a complete list of all of the exceptions thrown by a given function automatically. However, this ability depends on having a fully functional C++ parser available.

Extractor Commands

There are extractor commands which give the user control over whether and how comments are extracted. All of these commands have the form //* where the ``//'' is the beginning of a C++ comment, and the ``*'' is one or more consecutive non-space command characters.

Non-processable Comment

Often one will want to specify a comment which only belongs in the source file, and should not be extracted. This can be accomplished as follows:

//#   This comment will not be extracted
//#Neither will this one
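
A minimal sketch of how an extractor might honor this command is shown below. The function name dropPrivateComments is hypothetical, and the sketch assumes that leading whitespace before the ``//#'' is allowed.

```cpp
#include <string>
#include <vector>

// Return the input lines with every "//#" comment line removed.
// Assumption: the command may be preceded by spaces or tabs.
std::vector<std::string> dropPrivateComments(const std::vector<std::string>& lines) {
    std::vector<std::string> kept;
    for (const std::string& line : lines) {
        const std::string::size_type start = line.find_first_not_of(" \t");
        const bool isPrivate =
            start != std::string::npos && line.compare(start, 3, "//#") == 0;
        if (!isPrivate) kept.push_back(line);
    }
    return kept;
}
```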

Group Command

It is useful to provide a comment and specify that this comment applies to a group of language elements. This prevents the comment inconsistencies which can result from duplication of comments. This can be achieved as follows:

// Generally use of this should be shunned, except to use a FORTRAN 
// routine or something similar. Because you can't know the state of 
// the underlying implementation.
//+grp
T *getStorage(Bool &deleteIt);
const T *getStorage(Bool &deleteIt) const;
void putStorage(T *storage, Bool deleteAndCopy);

// <warn> An added problem with freeStorage is that it ...
void freeStorage(const T *storage, Bool deleteIt) const;
//-grp

The ``//+grp'' is the start group command. This command tells the extractor that the comment it just encountered should be used as the ``base'' comment for language elements until an end group command, ``//-grp'', is encountered. Any comments that are introduced within the group will be added to the end of the base comment for the appropriate language element(s). Group commands can be nested as long as the start and end group commands are balanced.
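
The group mechanism can be sketched with a stack of base comments, one per open group. The names Documented and applyGroups are hypothetical. The sketch assumes one declaration per line, commands and comments starting in column one, and that nested groups concatenate their base comments from outermost to innermost; the text above does not spell out the last point.

```cpp
#include <string>
#include <vector>

// Hypothetical record: a language element and its effective comment.
struct Documented {
    std::string element;
    std::string comment;
};

std::vector<Documented> applyGroups(const std::vector<std::string>& lines) {
    std::vector<Documented> docs;
    std::vector<std::string> bases;  // base comment of each open group
    std::string pending;             // comment text seen since the last element
    for (const std::string& line : lines) {
        if (line == "//+grp") {
            // The comment just read becomes the base for a new group.
            bases.push_back(pending);
            pending.clear();
        } else if (line == "//-grp") {
            if (!bases.empty()) bases.pop_back();  // ignore unbalanced ends
            pending.clear();
        } else if (line.rfind("//", 0) == 0) {
            std::string body = line.substr(2);
            if (!body.empty() && body[0] == ' ') body.erase(0, 1);
            pending += body + "\n";
        } else if (!line.empty()) {
            // Language element: base comments first, then its own comment.
            std::string comment;
            for (const std::string& base : bases) comment += base;
            comment += pending;
            if (!comment.empty()) docs.push_back({line, comment});
            pending.clear();
        }
    }
    return docs;
}
```

For the example above, each getStorage and putStorage declaration receives the base comment alone, while freeStorage receives the base comment followed by its own <warn> text.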

Literal Command

A comment element, i.e. the comment preceding a language element, is typically a division, <div>. However, writing the tags out in source files results in a great deal of clutter in comments which would otherwise be quite readable. As a result, the extractor will attempt to add the <div> tags itself. The user, however, can prevent the extractor from adding these by using the literal extractor command. The extractor will not try to spruce up documentation between the start literal command, ``//+lit'', and the end literal command, ``//-lit''. The start literal command should occur at the beginning of the comment block, and the end literal command should occur at the end. The literal command does not extend over multiple comment elements. The following example demonstrates the use of these commands.

//+lit
// <category lib=aips sect=io>
// <div> <title> Problems with this implementation </title>
//   The body of the section.
// </div>
//-lit

These commands prevent the extractor from adding any user level SGML commands. Typically all that the extractor would add is <div>s around a comment element. Without the literal commands, this automatic addition allows the user to specify:

//
// This computes the area of the circle.
//
float circle::area() {
  return(C::pi * radius() * radius());}

instead of:

// <div>
// This computes the area of the circle.
// </div>
float circle::area() {
  return(C::pi * radius() * radius());}
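
The behavior described in this section can be sketched as follows. The name finishComment is hypothetical; the sketch assumes that the literal commands occupy the first and last lines of the comment block, as required above, and it leaves out the paragraph handling discussed earlier.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Produce the SGML text for one comment element. A block bracketed by
// "//+lit" and "//-lit" is passed through as written; any other block
// is wrapped in <div> ... </div> and its blank comment lines are dropped.
std::string finishComment(const std::vector<std::string>& commentLines) {
    // Remove the leading "//" and at most one following space.
    auto strip = [](const std::string& line) {
        std::string body = line.rfind("//", 0) == 0 ? line.substr(2) : line;
        if (!body.empty() && body[0] == ' ') body.erase(0, 1);
        return body;
    };
    const bool literal = commentLines.size() >= 2 &&
                         commentLines.front() == "//+lit" &&
                         commentLines.back() == "//-lit";
    std::string out;
    if (literal) {
        // Copy everything between the two commands untouched.
        for (std::size_t i = 1; i + 1 < commentLines.size(); ++i)
            out += strip(commentLines[i]) + "\n";
        return out;
    }
    out = "<div>\n";
    for (const std::string& line : commentLines) {
        const std::string body = strip(line);
        if (body.find_first_not_of(" \t") != std::string::npos) out += body + "\n";
    }
    out += "</div>\n";
    return out;
}
```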
