doctools::toc::parse - Parsing text in doctoc format
package require doctools::toc::parse ?0.1?
package require Tcl 8.4
package require doctools::toc::structure
package require doctools::msgcat
package require doctools::tcl::parse
package require fileutil
package require logger
package require snit
package require struct::list
package require struct::stack
::doctools::toc::parse text text
::doctools::toc::parse file path
::doctools::toc::parse includes
::doctools::toc::parse include add path
::doctools::toc::parse include remove
path
::doctools::toc::parse include clear
::doctools::toc::parse vars
::doctools::toc::parse var set name
value
::doctools::toc::parse var unset name
::doctools::toc::parse var clear
?pattern?
This package provides commands to parse text written in the
doctoc markup language and convert it into the canonical
serialization of the table of contents encoded in the text. See the section
ToC serialization format for specification of their format.
This is an internal package of doctools, for use by the higher
level packages handling doctoc documents.
- ::doctools::toc::parse text text
- The command takes the string contained in text and parses it under
the assumption that it contains a document written using the doctoc
markup language. An error is thrown if this assumption is found to be
false. The format of these errors is described in section Parse
errors.
When successful the command returns the canonical
serialization of the table of contents which was encoded in the text.
See the section ToC serialization format for specification of
that format.
- ::doctools::toc::parse file path
- The same as text, except that the text to parse is read from the
file specified by path.
- ::doctools::toc::parse includes
- This method returns the current list of search paths used when looking for
include files.
- ::doctools::toc::parse include add path
- This method adds the path to the list of paths searched when
looking for an include file. The call is ignored if the path is already in
the list of paths. The method returns the empty string as its result.
- ::doctools::toc::parse include remove path
- This method removes the path from the list of paths searched when
looking for an include file. The call is ignored if the path is not
contained in the list of paths. The method returns the empty string as its
result.
- ::doctools::toc::parse include clear
- This method clears the list of search paths for include files.
- ::doctools::toc::parse vars
- This method returns a dictionary containing the current set of predefined
variables known to the vset markup command during processing.
- ::doctools::toc::parse var set name value
- This method adds the variable name to the set of predefined
variables known to the vset markup command during processing, and
gives it the specified value. The method returns the empty string
as its result.
- ::doctools::toc::parse var unset name
- This method removes the variable name from the set of predefined
variables known to the vset markup command during processing. The
method returns the empty string as its result.
- ::doctools::toc::parse var clear ?pattern?
- This method removes all variables matching the pattern from the set
of predefined variables known to the vset markup command during
processing. The method returns the empty string as its result.
The pattern matching is done with string match, and the
default pattern used when none is specified, is *.
The format of the parse error messages thrown when encountering
violations of the doctoc markup syntax is human readable and not
intended for processing by machines. As such it is not documented.
However, the errorCode attached to the message is
machine-readable and has the following format:
- [1]
- The error code will be a list, each element describing a single error
found in the input. The list has at least one element, possibly more.
- [2]
- Each error element will be a list containing six strings describing an
error in detail. The strings will be
- [1]
- The path of the file the error occured in. This may be empty.
- [2]
- The range of the token the error was found at. This range is a two-element
list containing the offset of the first and last character in the range,
counted from the beginning of the input (file). Offsets are counted from
zero.
- [3]
- The line the first character after the error is on. Lines are counted from
one.
- [4]
- The column the first character after the error is at. Columns are counted
from zero.
- [5]
- The message code of the error. This value can be used as argument to
msgcat::mc to obtain a localized error message, assuming that the
application had a suitable call of doctools::msgcat::init to
initialize the necessary message catalogs (See package
doctools::msgcat).
- [6]
- A list of details for the error, like the markup command involved. In the
case of message code doctoc/include/syntax this value is the set of
errors found in the included file, using the format described here.
The doctoc format for tables of contents, also called the
doctoc markup language, is too large to be covered in single section.
The interested reader should start with the document
- [1]
- doctoc language introduction
and then proceed from there to the formal specifications, i.e. the
documents
- [1]
- doctoc language syntax and
- [2]
- doctoc language command reference.
to get a thorough understanding of the language.
Here we specify the format used by the doctools v2 packages to
serialize tables of contents as immutable values for transport, comparison,
etc.
We distinguish between regular and canonical
serializations. While a table of contents may have more than one regular
serialization only exactly one of them will be canonical.
- regular
serialization
- [1]
- The serialization of any table of contents is a nested Tcl
dictionary.
- [2]
- This dictionary holds a single key, doctools::toc, and its value.
This value holds the contents of the table of contents.
- [3]
- The contents of the table of contents are a Tcl dictionary holding the
title of the table of contents, a label, and its elements. The relevant
keys and their values are
- title
- The value is a string containing the title of the table of contents.
- label
- The value is a string containing a label for the table of contents.
- items
- The value is a Tcl list holding the elements of the table, in the order
they are to be shown.
Each element is a Tcl list holding the type of the item, and
its description, in this order. An alternative description would be that
it is a Tcl dictionary holding a single key, the item type, mapped to
the item description.
The two legal item types and their descriptions are
- reference
- This item describes a single entry in the table of contents, referencing a
single document. To this end its value is a Tcl dictionary containing an
id for the referenced document, a label, and a longer textual description
which can be associated with the entry. The relevant keys and their values
are
- id
- The value is a string containing the id of the document associated with
the entry.
- label
- The value is a string containing a label for this entry. This string also
identifies the entry, and no two entries (references and divisions) in the
containing list are allowed to have the same label.
- desc
- The value is a string containing a longer description for this entry.
- division
- This item describes a group of entries in the table of contents, inducing
a hierarchy of entries. To this end its value is a Tcl dictionary
containing a label for the group, an optional id to a document for the
whole group, and the list of entries in the group. The relevant keys and
their values are
- id
- The value is a string containing the id of the document associated with
the whole group. This key is optional.
- label
- The value is a string containing a label for the group. This string also
identifies the entry, and no two entries (references and divisions) in the
containing list are allowed to have the same label.
- items
- The value is a Tcl list holding the elements of the group, in the order
they are to be shown. This list has the same structure as the value for
the keyword items used to describe the whole table of contents, see
above. This closes the recusrive definition of the structure, with
divisions holding the same type of elements as the whole table of
contents, including other divisions.
- canonical
serialization
- The canonical serialization of a table of contents has the format as
specified in the previous item, and then additionally satisfies the
constraints below, which make it unique among all the possible
serializations of this table of contents.
- [1]
- The keys found in all the nested Tcl dictionaries are sorted in ascending
dictionary order, as generated by Tcl's builtin command lsort
-increasing -dict.
This document, and the package it describes, will undoubtedly
contain bugs and other problems. Please report such in the category
doctools of the Tcllib SF Trackers
[http://sourceforge.net/tracker/?group_id=12883]. Please also report any
ideas for enhancements you may have for either package and/or
documentation.
doctoc, doctools, lexer, parser
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>