treeql - Query tree objects
package require Tcl 8.2
package require snit
package require struct::list
package require struct::set
package require treeql ?1.3.1?
treeql objectname -tree tree
?-query query? ?-nodes nodes?
?args...?
qo query args...
qo result
qo discard
This package provides objects which can be used to query and
transform tree objects following the API of tree objects created by the
package struct::tree.
The tree query and manipulation language used here, TreeQL, is
inspired by Cost (See section References for more information).
treeql, the package, is a fairly thin query facility over
tree-structured data types. It implements an ordered set of nodes (really a
list) which are generated and filtered through the application of TreeQL
operators to each node in turn.
The command treeql is a snit::type which implements
the Treeql Query Language. This means that it follows the API for class
commands as specified by the package snit. Its general syntax is
- treeql
objectname -tree tree ?-query query?
?-nodes nodes? ?args...?
- The command creates a new tree query object and returns the fully
qualified name of the object command as its result. The API the returned
command is following is described in the section TreeQL OBJECT API
Each query object is associated with a single tree
object. This is the object all queries will be run against.
If the option -nodes was specified then its argument is
treated as a list of nodes. This list is used to initialize the node
set. It defaults to the empty list.
If the option -query was specified then its argument
will be interpreted as an object, the parent query of this query.
It defaults to the object itself. All queries will be interpreted in the
environment of this object.
Any arguments coming after the options are treated as a query
and run immediately, after the node set has been initialized.
This uses the same syntax for the query as the method query.
The operations of the TreeQL available for this are explained
in the section about The Tree Query Language. This section also
explains the term node set used above.
As treeql has been implemented in snit all the
standard methods of snit-based classes are available to the user and
therefore not listed here. Please read the documentation for snit for
what they are and what functionality they provide
The methods provided by the package treeql itself are
listed and explained below.
- qo query
args...
- This method interprets its arguments as a series of TreeQL operators and
interpretes them from the left to right (i.e. first to last). Note that
the first operator uses the node set currently known to the object
to perform its actions. In other words, the node set is not
cleared, or modified in other ways, before the query is run. This allows
the user to run several queries one after the other and have each use the
results of the last. Any initialization has to be done by any query
itself, using TreeQL operators. The result of the method is the node
set after the last operator of the query has been executed.
Note that uncaught errors will leave the node
set of the object in an intermediate state, per the TreeQL operators
which were executed successfully before the error occurred.
The above means in detail that:
- [1]
- The first argument is interpreted as the name of a query operator, the
number of arguments required by that operator is then determined, and
taken from the immediately following arguments.
Because of this operators cannot have optional arguments, all
arguments have to be present as defined. Failure to do this will, at
least, confuse the query interpreter, but more likely cause errors.
- [2]
- The operator is applied to the current node set, yielding a new node set,
and/or manipulating the tree object the query object is connected to.
- [3]
- The arguments used (i.e. operator name and arguments) are removed from the
list of method arguments, and then the whole process is repeated from step
[1], until the list of arguments is empty or an error occurred.
# q is the query object.
q query root children get data
# The above query
# - Resets the node set to the root node - root
# - Adds the children of root to the set - children
# - Replaces the node set with the - get data
# values for the attribute 'data',
# for all nodes in the set which
# have such an attribute.
# - And returns this information.
# Below we can see the same query, but rewritten
# to show the structure as it is seen by the query
# interpreter.
q query \\
root \\
children \\
get data
The operators of the TreeQL language available for this are
explained in the section about The Tree Query Language. This section
also explains the term node set used above.
- qo
result
- This method returns a list containing the current node set.
- qo
discard
- This method returns the current node set (like method result), but
also destroys the query object (qo). This is useful when
constructing and using sub-queries (%AUTO% objects immediately destroyed
after use).
This and the following sections specify the Tree Query Language
used by the query objects of this package in detail.
First we explain the general concepts underneath the language
which are required to comprehend it. This is followed by the specifications
for all the available query operators. They fall into eight categories, and
each category has its own section.
- [1]
- TreeQL Concepts
- [2]
- Structural generators
- [3]
- Attribute Filters
- [4]
- Attribute Mutators
- [5]
- Attribute String Accessors
- [6]
- Sub-queries
- [7]
- Node Set Operators
- [8]
- Node Set Iterators
- [9]
- Typed node support
The main concept which has to be understood is that of the node
set. Each query object maintains exactly one such node set, and
essentially all operators use it and input argument and for their result.
This structure simply contains the handles of all nodes which are currently
of interest to the query object. To name it a set is a bit of a
misnomer, because
- [1]
- A node (handle) can occur in the structure more than once, and
- [2]
- the order of nodes in the structure is important as well. Whenever an
operator processes all nodes in the node set it will do so in the order
they occur in the structure.
Regarding the possible multiple occurrence of a node, consider a
node set containing two nodes A and B, both having node P as their immediate
parent. Application of the TreeQL operator "parent" will then add
P to the new node set twice, once per node it was parent of. I.e. the new
node set will then be {P P}.
All tree-structural operators locate nodes in the tree based on a
structural relation ship to the nodes currently in the set and then replace
the current node set with the set of nodes found Nodes which fulfill such a
relationship multiple times are added to the result as often as they fulfill
the relationship.
It is important to note that the found nodes are collected in a
separate storage area while processing the node set, and are added to (or
replacing) the current node set only after the current node set has been
processed completely. In other words, the new nodes are not processed
by the operator as well and do not affect the iteration.
When describing an operator the variable N will be used to
refer to any node in the node set.
- ancestors
- Replaces the current node set with the ancestors for all nodes N in
the node set, should N have a parent. In other words, nodes without
a parent do not contribute to the new node set. In other words, uses all
nodes on the path from node N to root, in this order (root last),
for all nodes N in the node set. This includes the root, but not
the node itself.
- rootpath
- Replaces the current node set with the ancestors for all nodes N in
the node set, should N have a parent. In other words, nodes without
a parent do not contribute to the new node set. In contrast to the
operator ancestors the nodes are added in reverse order however,
i.e. the root node first.
- parent
- Replaces the current node set with the parent of node N, for all
nodes N in the node set, should N have a parent. In other
words, nodes without a parent do not contribute to the new node set.
- children
- Replaces the current node set with the immediate children of node
N, for all nodes N in the node set, should N have
children. In other words, nodes without children do not contribute to the
new node set.
- left
- Replaces the current node set with the previous/left sibling for all nodes
N in the node set, should N have siblings to the left. In
other words, nodes without left siblings do not contribute to the new node
set.
- right
- Replaces the current node set with the next/right sibling for all nodes
N in the node set, should N have siblings to the right. In
other words, nodes without right siblings do not contribute to the new
node set.
- prev
- Replaces the current node set with all previous/left siblings of node
N, for all nodes N in the node set, should N have
siblings to the left. In other words, nodes without left siblings are
ignored. The left sibling adjacent to the node is added first, and the
leftmost sibling last (reverse tree order).
- esib
- Replaces the current node set with all previous/left siblings of node
N, for all nodes N in the node set, should N have
siblings to the left. In other words, nodes without left siblings are
ignored. The leftmost sibling is added first, and the left sibling
adjacent to the node last (tree order).
The method name is a shorthand for Earlier SIBling.
- next
- Replaces the current node set with all next/right siblings of node
N, for all nodes N in the node set, should N have
siblings to the right. In other words, nodes without right siblings do not
contribute to the new node set. The right sibling adjacent to the node is
added first, and the rightmost sibling last (tree order).
- root
- Replaces the current node set with a node set containing a single node,
the root of the tree.
- tree
- Replaces the current node set with a node set containing all nodes found
in the tree. The nodes are added in pre-order (parent first, then
children, the latter from left to right, first to last).
- descendants
- Replaces the current node set with the nodes in all subtrees rooted at
node N, for all nodes N in the node set, should N
have children. In other words, nodes without children do not contribute to
the new node set.
This is like the operator children, but covers the
children of children as well, i.e. all the proper descendants.
"Rooted at N" means that N itself is not added
to the new set, which is also implied by proper descendants.
- subtree
- Like operator descendants, but includes the node N. In other
words:
Replaces the current node set with the nodes of the subtree of
node N, for all nodes N in the node set, should N
have children. In other words, nodes without children do not contribute
to the new node set. I.e this is like the operator children, but
covers the children of children, etc. as well. "Of N"
means that N itself is added to the new set.
- forward
- Replaces the current node set with the nodes in the subtrees rooted at the
right siblings of node N, for all nodes N in the node set,
should N have right siblings, and they children. In other words,
nodes without right siblings, and them without children are ignored.
This is equivalent to the operator sequence
next descendants
- later
- This is an alias for the operator forward.
- backward
- Replaces the current node set with the nodes in the flattened previous
subtrees, in reverse tree order.
This is nearly equivalent to the operator sequence
prev descendants
The only difference is that this uses the nodes in reverse order.
- earlier
- Replaces the current node set with the nodes in the flattened previous
subtrees, in tree order.
This is equivalent to the operator sequence
prev subtree
These operators filter the node set by reference to attributes of
nodes and their properties. Filter means that all nodes not fulfilling the
criteria are removed from the node set. In other words, the node set is
replaced by the set of nodes fulfilling the filter criteria.
- hasatt
attr
- Reduces the node set to nodes which have an attribute named
attr.
- withatt
attr value
- Reduces the node set to nodes which have an attribute named attr,
and where the value of that attribute is equal to value (The
"==" operator is string equal -nocase).
- withatt!
attr val
- This is the same as withatt, but all nodes in the node set have to
have the attribute, and the "==" operator is string
equal, i.e. no -nocase. The operator will fail with an error if
they don't have the attribute.
- attof attr
vals
- Reduces the node set to nodes which which have an attribute named
attr and where the value of that attribute is contained in the list
vals of legal values. The contained-in operator used here does glob
matching (using the attribute value as pattern) and ignores the case of
the attribute value, but not for the elements of vals.
- attmatch
attr match
- Same as withatt, but string match is used as the
"==" operator, and match is the pattern checked for.
Note that match is a interpreted as a partial
argument list for string match. This means that it is
interpreted as a list containing the pattern, and the pattern element
can be preceded by options understand by string match, like
-nocase. This is especially important should the pattern contain
spaces. It has to be wrapped into a list for correct interpretation by
this operator
These operators change node attributes within the underlying tree.
In other words, all these operators have side effects.
- set attr
val
- Sets the attribute attr to the value val, for all nodes
N in the node set. The operator will fail if a node does not have
an attribute named attr. The tree will be left in a partially
modified state.
- unset
attr
- Unsets the attribute attr, for all nodes N in the node set.
The operator will fail if a node does not have an attribute named
attr. The tree will be left in a partially modified state.
These operators retrieve the values of node attributes from the
underlying tree. The collected results are stored in the node set, but are
not actually nodes.
In other words, they redefine the semantics of the node set stored
by the query object to contain non-node data after their completion.
The query interpreter will terminate after it has finished
processing one of these operators, silently discarding any later query
elements. It also means that our talk about maintenance of a node set is not
quite true. It is a node set while the interpreter is processing commands,
but can be left as an attribute value set at the end of query
processing.
- string op
attr
- Applies the string operator op to the attribute named attr,
for all nodes N in the node set, collects the results of that
application and places them into the node set.
The operator will fail if a node does not have an attribute
named attr.
The argument op is interpreted as partial argument list
for the builtin command string. Its first word has to be any of
the sub-commands understood by string. This has to be followed by
all arguments required for the subcommand, except the last. that last
argument is supplied by the attribute value.
- get
pattern
- For all nodes N in the node set it determines all their attributes
with names matching the glob pattern, then the values of these
attributes, at last it replaces the node set with the list of these
attribute values.
- attlist
- This is a convenience definition for the operator getvals *. In
other words, it replaces the node set with a list of the attribute values
for all attributes for all nodes N in the node set.
- attrs
glob
- Replaces the current node set with a list of attribute lists, one
attribute list per for all nodes N in the node set.
- attval
attname
- Reduces the current node set with the operator hasatt, and then
replaces it with a list containing the values of the attribute named
attname for all nodes N in the node set.
Sub-queries yield node sets which are then used to augment, reduce
or replace the current node set.
- andq
query
- Replaces the node set with the set-intersection of the node set generated
by the sub-query query and itself.
The execution of the sub-query uses the current node set as
its own initial node set.
- orq query
- Replaces the node set with the set-union of the node set generated by the
sub-query query and itself. Duplicate nodes are removed.
The execution of the sub-query uses the current node set as
its own initial node set.
- notq
query
- Replaces the node set with the set of nodes generated by the sub-query
query which are also not in the current node set. In other word the
set difference of itself and the node set generated by the sub-query.
The execution of the sub-query uses the current node set as
its own initial node set.
These operators change the node set directly, without referring to
the tree.
- unique
- Removes duplicate nodes from the node set, preserving order. In other
words, the earliest occurrence of a node handle is preserved, every other
occurrence is removed.
- select
- Replaces the current node set with a node set containing only the first
node from the current node set
- transform
query var body
- First it interprets the sub-query query, using the current node set
as its initial node set. Then it iterates over the result of that query,
binding the handle of each node to the variable named in var, and
executing the script body. The collected results of these
executions is made the new node set, replacing the current one.
The script body is executed in the context of the
caller.
- map var
body
- Iterates over the current node set, binding the handle of each node to the
variable named in var, and executing the script body. The
collected results of these executions is made the new node set, replacing
the current one.
The script body is executed in the context of the
caller.
- quote
val
- Appends the literal value val to the current node set.
- replace
val
- Replaces the current node set with the literal list value val.
- foreach
query var body
- Interprets the sub-query query, then performs the equivalent of
operator over on the nodes in the node set created by that query.
The current node set is not changed, except through side effects from the
script body.
The script body is executed in the context of the
caller.
- with query
body
- Interprets the query, then runs the script body on the node
set generated by the query. At last it restores the current node set as it
was before the execution of the query.
The script body is executed in the context of the
caller.
- over var
body
- Executes the script body for each node in the node set, with the
variable named by var bound to the name of the current node. The
script body is executed in the context of the caller.
This is like the builtin foreach, with the node set as
the source of the list to iterate over.
The results of executing the body are ignored.
- delete
- Deletes all the nodes contained in the current node set from the
tree.
These filters and accessors assume the existence of an attribute
called @type, and are short-hand forms useful for cost-like tree
query, html tree editing, and so on.
- nodetype
- Returns the node type of nodes. Attribute string accessor. This is
equivalent to
get @type
- oftype
t
- Reduces the node set to nodes whose type is equal to t, with letter
case ignored.
- nottype
t
- Reduces the node set to nodes whose type is not equal to t, with
letter case ignored.
- oftypes
attrs
- Reduces set to nodes whose @type is an element in the list attrs of
types. The value of @type is used as a glob pattern, and letter case is
relevant.
- [1]
- COST [http://wiki.tcl.tk/COST] on the Tcler's Wiki.
- [2]
- TreeQL [http://wiki.tcl.tk/treeql] on the Tcler's Wiki. Discuss
this package there.
This document, and the package it describes, will undoubtedly
contain bugs and other problems. Please report such in the category
treeql of the Tcllib SF Trackers
[http://sourceforge.net/tracker/?group_id=12883]. Please also report any
ideas for enhancements you may have for either package and/or
documentation.
Cost, DOM, TreeQL, XPath, XSLT, structured queries, tree, tree
query language
Copyright (c) 2004 Colin McCormack <coldstore@users.sourceforge.net>
Copyright (c) 2004 Andreas Kupries <andreas_kupries@users.sourceforge.net>