pt::peg::from::peg(n) | Parser Tools | pt::peg::from::peg(n) |
pt::peg::from::peg - PEG Conversion. Read PEG format
package require Tcl 8.5
package require pt::peg::from::peg ?1?
pt::peg::from::peg convert text
Are you lost ? Do you have trouble understanding this document ? In that case please read the overview provided by the Introduction to Parser Tools. This document is the entrypoint to the whole system the current package is a part of.
This package implements the converter from PEG markup to parsing expression grammars.
It resides in the Import section of the Core Layer of Parser Tools, and can be used either directly with the other packages of this layer, or indirectly through the import manager provided by pt::peg::import. The latter is intented for use in untrusted environments and done through the corresponding import plugin pt::peg::import::peg sitting between converter and import manager.
IMAGE: arch_core_iplugins
The API provided by this package satisfies the specification of the Converter API found in the Parser Tools Import API specification.
peg, a language for the specification of parsing expression grammars is meant to be human readable, and writable as well, yet strict enough to allow its processing by machine. Like any computer language. It was defined to make writing the specification of a grammar easy, something the other formats found in the Parser Tools do not lend themselves too.
It is formally specified by the grammar shown below, written in itself. For a tutorial / introduction to the language please go and read the PEG Language Tutorial.
PEG pe-grammar-for-peg (Grammar) # -------------------------------------------------------------------- # Syntactical constructs Grammar <- WHITESPACE Header Definition* Final EOF ; Header <- PEG Identifier StartExpr ; Definition <- Attribute? Identifier IS Expression SEMICOLON ; Attribute <- (VOID / LEAF) COLON ; Expression <- Sequence (SLASH Sequence)* ; Sequence <- Prefix+ ; Prefix <- (AND / NOT)? Suffix ; Suffix <- Primary (QUESTION / STAR / PLUS)? ; Primary <- ALNUM / ALPHA / ASCII / CONTROL / DDIGIT / DIGIT / GRAPH / LOWER / PRINTABLE / PUNCT / SPACE / UPPER / WORDCHAR / XDIGIT / Identifier / OPEN Expression CLOSE / Literal / Class / DOT ; Literal <- APOSTROPH (!APOSTROPH Char)* APOSTROPH WHITESPACE / DAPOSTROPH (!DAPOSTROPH Char)* DAPOSTROPH WHITESPACE ; Class <- OPENB (!CLOSEB Range)* CLOSEB WHITESPACE ; Range <- Char TO Char / Char ; StartExpr <- OPEN Expression CLOSE ; void: Final <- END SEMICOLON WHITESPACE ; # -------------------------------------------------------------------- # Lexing constructs Identifier <- Ident WHITESPACE ; leaf: Ident <- ('_' / ':' / <alpha>) ('_' / ':' / <alnum>)* ; Char <- CharSpecial / CharOctalFull / CharOctalPart / CharUnicode / CharUnescaped ; leaf: CharSpecial <- "\\" [nrt'"\[\]\\] ; leaf: CharOctalFull <- "\\" [0-2][0-7][0-7] ; leaf: CharOctalPart <- "\\" [0-7][0-7]? ; leaf: CharUnicode <- "\\" 'u' HexDigit (HexDigit (HexDigit HexDigit?)?)? ; leaf: CharUnescaped <- !"\\" . ; void: HexDigit <- [0-9a-fA-F] ; void: TO <- '-' ; void: OPENB <- "[" ; void: CLOSEB <- "]" ; void: APOSTROPH <- "'" ; void: DAPOSTROPH <- '"' ; void: PEG <- "PEG" WHITESPACE ; void: IS <- "<-" WHITESPACE ; leaf: VOID <- "void" WHITESPACE ; # Implies that definition has no semantic value. leaf: LEAF <- "leaf" WHITESPACE ; # Implies that definition has no terminals. void: END <- "END" WHITESPACE ; void: SEMICOLON <- ";" WHITESPACE ; void: COLON <- ":" WHITESPACE ; void: SLASH <- "/" WHITESPACE ; leaf: AND <- "&" WHITESPACE ; leaf: NOT <- "!" WHITESPACE ; leaf: QUESTION <- "?" WHITESPACE ; leaf: STAR <- "*" WHITESPACE ; leaf: PLUS <- "+" WHITESPACE ; void: OPEN <- "(" WHITESPACE ; void: CLOSE <- ")" WHITESPACE ; leaf: DOT <- "." WHITESPACE ; leaf: ALNUM <- "<alnum>" WHITESPACE ; leaf: ALPHA <- "<alpha>" WHITESPACE ; leaf: ASCII <- "<ascii>" WHITESPACE ; leaf: CONTROL <- "<control>" WHITESPACE ; leaf: DDIGIT <- "<ddigit>" WHITESPACE ; leaf: DIGIT <- "<digit>" WHITESPACE ; leaf: GRAPH <- "<graph>" WHITESPACE ; leaf: LOWER <- "<lower>" WHITESPACE ; leaf: PRINTABLE <- "<print>" WHITESPACE ; leaf: PUNCT <- "<punct>" WHITESPACE ; leaf: SPACE <- "<space>" WHITESPACE ; leaf: UPPER <- "<upper>" WHITESPACE ; leaf: WORDCHAR <- "<wordchar>" WHITESPACE ; leaf: XDIGIT <- "<xdigit>" WHITESPACE ; void: WHITESPACE <- (" " / "\t" / EOL / COMMENT)* ; void: COMMENT <- '#' (!EOL .)* EOL ; void: EOL <- "\n\r" / "\n" / "\r" ; void: EOF <- !. ; # -------------------------------------------------------------------- END;
Our example specifies the grammar for a basic 4-operation calculator.
PEG calculator (Expression) Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ; Sign <- '-' / '+' ; Number <- Sign? Digit+ ; Expression <- '(' Expression ')' / (Factor (MulOp Factor)*) ; MulOp <- '*' / '/' ; Factor <- Term (AddOp Term)* ; AddOp <- '+'/'-' ; Term <- Number ; END;
Using higher-level features of the notation, i.e. the character classes (predefined and custom), this example can be rewritten as
PEG calculator (Expression) Sign <- [-+] ; Number <- Sign? <ddigit>+ ; Expression <- '(' Expression ')' / (Factor (MulOp Factor)*) ; MulOp <- [*/] ; Factor <- Term (AddOp Term)* ; AddOp <- [-+] ; Term <- Number ; END;
Here we specify the format used by the Parser Tools to serialize Parsing Expression Grammars as immutable values for transport, comparison, etc.
We distinguish between regular and canonical serializations. While a PEG may have more than one regular serialization only exactly one of them will be canonical.
Assuming the following PEG for simple mathematical expressions
PEG calculator (Expression) Digit <- '0'/'1'/'2'/'3'/'4'/'5'/'6'/'7'/'8'/'9' ; Sign <- '-' / '+' ; Number <- Sign? Digit+ ; Expression <- '(' Expression ')' / (Factor (MulOp Factor)*) ; MulOp <- '*' / '/' ; Factor <- Term (AddOp Term)* ; AddOp <- '+'/'-' ; Term <- Number ; END;
then its canonical serialization (except for whitespace) is
pt::grammar::peg { rules { AddOp {is {/ {t -} {t +}} mode value} Digit {is {/ {t 0} {t 1} {t 2} {t 3} {t 4} {t 5} {t 6} {t 7} {t 8} {t 9}} mode value} Expression {is {/ {x {t (} {n Expression} {t )}} {x {n Factor} {* {x {n MulOp} {n Factor}}}}} mode value} Factor {is {x {n Term} {* {x {n AddOp} {n Term}}}} mode value} MulOp {is {/ {t *} {t /}} mode value} Number {is {x {? {n Sign}} {+ {n Digit}}} mode value} Sign {is {/ {t -} {t +}} mode value} Term {is {n Number} mode value} } start {n Expression} }
Here we specify the format used by the Parser Tools to serialize Parsing Expressions as immutable values for transport, comparison, etc.
We distinguish between regular and canonical serializations. While a parsing expression may have more than one regular serialization only exactly one of them will be canonical.
Assuming the parsing expression shown on the right-hand side of the rule
Expression <- '(' Expression ')' / Factor (MulOp Factor)*
then its canonical serialization (except for whitespace) is
{/ {x {t (} {n Expression} {t )}} {x {n Factor} {* {x {n MulOp} {n Factor}}}}}
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category pt of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.
EBNF, LL(k), PEG, TDPL, context-free languages, conversion, expression, format conversion, grammar, matching, parser, parsing expression, parsing expression grammar, push down automaton, recursive descent, serialization, state, top-down parsing languages, transducer
Parsing and Grammars
Copyright (c) 2009 Andreas Kupries <andreas_kupries@users.sourceforge.net>
1 | pt |