grammar::me_vm(n) | Grammar operations and usage | grammar::me_vm(n) |
grammar::me_vm - Virtual machine for parsing token streams
Please go and read the document grammar::me_intro first for an overview of the various documents and their relations.
This document specifies a virtual machine for the controlled matching and parsing of token streams, creating an abstract syntax tree (short AST) reflecting the structure of the input. Special machine features are the caching and reuse of partial results, caching of the encountered input, and the ability to backtrack in both input and AST creation.
These features make the specified virtual machine especially useful to packrat parsers based on parsing expression grammars. It is however not restricted to this type of parser. Normal LL and LR parsers can be implemented with it as well.
The following sections will discuss first the abstract state kept by ME virtual machines, and then their instruction set.
A ME virtual machine manages the following state:
This information is used and modified by the instructions defined in the section TERMINAL MATCHING.
This information is implicitly used and modified by the instructions defined in the sections TERMINAL MATCHING and NONTERMINAL MATCHING, and can be directly queried and modified by the instructions defined in section INPUT LOCATION HANDLING.
This information is implicitly used and modified by the instructions defined in the sections TERMINAL MATCHING and NONTERMINAL MATCHING, and can be directly queried and modified by the instructions defined in section INPUT LOCATION HANDLING.
This information is influenced by the instructions defined in the sections TERMINAL MATCHING, NONTERMINAL MATCHING, and UNCONDITIONAL MATCHING. It is queried by the instructions defined in the section CONTROL FLOW.
This information is influenced by the instructions defined in the sections SEMANTIC VALUES, and AST STACK HANDLING.
This information is influenced by the instructions defined in the sections SEMANTIC VALUES, and AST STACK HANDLING.
This information is influenced by the instructions defined in the sections SEMANTIC VALUES, and AST STACK HANDLING.
Note that error information can be set even if the last attempt at matching input was successful. For example the *-operator (matching a sub-expression zero or more times) in a parsing expression grammar is always successful, even if it encounters a problem further in the input and has to backtrack. Such problems must not be forgotten when continuing to match.
This information is queried and influenced by the instructions defined in the sections TERMINAL MATCHING, NONTERMINAL MATCHING, and ERROR HANDLING.
This information is queried and influenced by the instructions defined in the sections TERMINAL MATCHING, NONTERMINAL MATCHING, and ERROR HANDLING.
This information is queried and influenced by the instructions defined in the section NONTERMINAL MATCHING.
The key location is where machine started the attempt to match the named nonterminal symbol, and the location in the value is where machine ended up after the attempt completed, independent of the success of the attempt.
This status is queried and influenced by the instructions defined in the section NONTERMINAL MATCHING.
With the machine state specified it is now possible to explain the instruction set of ME virtual machines. They are grouped roughly by the machine state they influence and/or query.
First the instructions to match tokens from the input stream, and by extension all terminal symbols.
These instructions are the only ones which may retrieve a new token from the input stream. This is a may and not a will because the instructions will a retrieve new token if, and only if the current location CL is at the head of the stream. If the machine has backtracked (see icl_rewind) the instructions will retrieve the token to compare against from the internal cache.
The sucess/failure of the instruction is remembered in the match status OK. In the case of failure the error status ER is set to the current location and the message message. In the case of success the error status ER is cleared, the new token is made the current token CT, and the new location is made the current location CL.
The argument message is a reference to the string to put into the error status ER, if such is needed.
In case of failure the error status ER is set to the current location CL and the message message, and the current location CL is moved one token backwards. Otherwise, i.e. upon success, the error status ER is cleared and the current location CL is not touched.
In case of failure the error status ER is set to the current location CL and the message message, and the current location CL is moved one token backwards. Otherwise, i.e. upon success, the error status ER is cleared and the current location CL is not touched.
In case of failure the error status ER is set to the current location CL and the message message, and the current location CL is moved one token backwards. Otherwise, i.e. upon success, the error status ER is cleared and the current location CL is not touched.
Currently the following classes are legal:
The instructions in this section handle the matching of nonterminal symbols. They query the nonterminal cache NC for saved information, and put such information into the cache.
The usage of the cache is a performance aid for backtracking parsers, allowing them to avoid an expensive rematch of complex nonterminal symbols if they have been encountered before.
If no information was found the instruction will continue execution at the next instruction.
Together with icf_ntcall it is possible to generate code for memoized and non-memoized matching of nonterminal symbols, either as subroutine calls, or inlined in the caller.
It is expected to be called at the end of matching a nonterminal symbol, with nt the name of the nonterminal symbol the code was working on. This allows the instruction inc_restore to check for and retrieve the data, should we have to match this nonterminal symbol at the same location again, during backtracking.
The next matching icf_ntreturn will cause the execution to continue at the instruction coming after the call.
The instructions in this section are the remaining match operators. They change the match status OK directly and unconditionally.
The instructions in this section implement both conditional and unconditional control flow. The conditional jumps query the match status OK.
The instructions in this section are for backtracking, they manipulate the current location CL of the machine state. They allow a user of the machine to query and save locations in the input, and to rewind the current location CL to saved locations, making them one of the components enabling the implementation of backtracking parsers.
The instructions in this section provide read and write access to the error status ER of the machine.
The merge is performed as described below:
If one of the two error states is empty the other is chosen. If neither error state is empty, and refering to different locations, then the error state with the location further in the input is chosen. If both error states refer to the same location their messages are merged (with removing duplicates).
The instructions in this section manipulate the semantic value SV.
This instruction should be executed if, and only if the match status OK indicates a success. In the case of a failure isv_clear should be called.
This instruction should be executed if, and only if the match status OK indicates a success. In the case of a failure isv_clear should be called.
All entries on the AST stack AS above the marker found in the top entry of the AST Marker stack MS become children of the new node, with the entry at the stack top becoming the rightmost child. If the AST Marker stack MS is empty the whole stack is used. The AST marker stack MS is left unchanged.
This instruction should be executed if, and only if the match status OK indicates a success. In the case of a failure isv_clear should be called.
The instructions in this section manipulate the AST stack AS, and the AST Marker stack MS.
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category grammar_me of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.
grammar, parsing, virtual machine
Grammars and finite automata
Copyright (c) 2005 Andreas Kupries <andreas_kupries@users.sourceforge.net>
0.1 | grammar_me |