Extended PEG Syntax
As we saw in PEG Basics and Declaring a Grammar, Pegged implements the entire PEG syntax, exactly as it was defined by its author.
Still, I felt the need to extend this syntax a little. At the time, semantic actions were not yet implemented in Pegged. Now that they are, these extensions are no longer strictly necessary, but they remain useful shortcuts.
The first extensions act on the result of a parsing expression. Given a parsing expression e:
- :e will drop e's captures. And, due to the way sequences are implemented in Pegged, the parent expression will forget e's result (that's deliberate). This allows one to write:
mixin(grammar("
JSON <- :'{' (Pair (:',' Pair)*)? :'}'
Pair <- String :':' Value
# Rest of JSON grammar ...
"));
In the first rule, note the colon before the curly-brace literals and before the comma. It means that when called on {"Hello":42, "World!":0}, the JSON parse tree will contain only the interesting parts, not the syntactic signs needed to structure the JSON grammar:
ParseTree("JSON",
ParseTree("Pair",
ParseTree("String", ...)
ParseTree("Number", ...)),
ParseTree("Pair",
ParseTree("String", ...)
ParseTree("Number", ...))
)
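As a rough mental model of why a dropped expression vanishes from its parent, here is a toy sketch in plain D. It does not use Pegged: the `Result` struct and the `drop` and `sequence` helpers are hypothetical names invented for this illustration, and the real implementation may differ.

```d
import std.stdio : writeln;

// Toy stand-in for a parse result; NOT Pegged's real type.
struct Result
{
    bool success;
    string[] captures;
}

// Hypothetical helper modelling `:e`: keep the success flag,
// throw the captures away.
Result drop(Result r)
{
    return Result(r.success, null);
}

// Hypothetical helper modelling a sequence: it concatenates the
// captures of its parts, so a dropped part contributes nothing.
Result sequence(Result[] parts...)
{
    Result total = Result(true, null);
    foreach (p; parts)
    {
        if (!p.success)
            return Result(false, null);
        total.captures ~= p.captures;
    }
    return total;
}

void main()
{
    // Models  Pair <- String :':' Value  on the input "Hello":42
    auto pair = sequence(
        Result(true, ["Hello"]),
        drop(Result(true, [":"])),
        Result(true, ["42"]));
    assert(pair.captures == ["Hello", "42"]);
    writeln(pair.captures); // prints ["Hello", "42"]
}
```

The colon still has to match for the sequence to succeed; it just leaves no trace in the result.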
The ~ (tilde) operator concatenates an expression's captures into one string. I chose it for its proximity to the equivalent D operator. It's useful when an expression would otherwise return a long list of individual parses, whereas you're interested only in the global result:
mixin(grammar("
# See the ':' before DoubleQuote
# And the '~' before (Char*)
String <- :DoubleQuote ~(Char*) :DoubleQuote
"));
Without the tilde operator, using String on a string would return a list of Char results. With the tilde, you get the string content, which is most probably what you want:
auto p = String.parse(q{"Hello World!"});
assert(p.capture == ["Hello World!"]);
// without tilde: p.capture == ["H", "e", "l", "l", "o", " ", "W", ...]
The same goes for number-recognizers:
Number <- ~(Digit+)
Digit <- [0-9]
auto n = Number.parse("1234");
assert(n.capture == ["1234"]);
// without tilde: n.capture == ["1", "2", "3", "4"]
Internally, it's used by Identifier and QualifiedIdentifier.
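The effect of the tilde on a capture list can be pictured with a few lines of plain D. This is only a sketch of the idea, not Pegged's code: `fuse` is a hypothetical helper name, and the capture list is written by hand rather than produced by a parser.

```d
import std.array : join;

// Hypothetical helper modelling `~e`: many captures become one string.
string[] fuse(string[] captures)
{
    return [captures.join];
}

void main()
{
    // D's own `~` concatenates arrays and strings, hence the choice of symbol.
    assert(("12" ~ "34") == "1234");

    auto digits = ["1", "2", "3", "4"]; // what Digit+ alone would capture
    assert(fuse(digits) == ["1234"]);   // what ~(Digit+) would capture
}
```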
The =name (equal) operator is used to name a particular capture. It's described in Named Captures, but here is the idea:
Email <- QualifiedIdentifier=name :'@' QualifiedIdentifier=domain
enum p = Email.parse("John.Doe@example.org");
assert(p.namedCaptures["name"] == "John.Doe");
assert(p.namedCaptures["domain"] == "example.org");
Semantic actions are enclosed in curly braces and put behind the expression they act upon:
XMLNode <- OpeningTag {OpAction} (Text / Node)* ClosingTag {CloseAction}
You can use any delegate from Output to Output as a semantic action. See Semantic Actions.
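To make the "delegate from Output to Output" shape concrete, here is a small sketch in plain D. It assumes a simplified `Output` struct with only two fields (the real type has more), and `TagChecker`, `opAction` and `closeAction` are hypothetical names chosen to echo the XMLNode rule above. They are not part of Pegged.

```d
// Toy stand-in for Pegged's Output type; the real one has more fields.
struct Output
{
    bool success;
    string[] capture;
}

// Hypothetical state holder: its methods become delegates once
// their address is taken on an instance.
struct TagChecker
{
    string[] open; // tags opened so far

    // Remember the tag that was just opened.
    Output opAction(Output o)
    {
        open ~= o.capture[0];
        return o;
    }

    // Fail unless the closing tag matches the last opened one.
    Output closeAction(Output o)
    {
        if (open.length > 0 && open[$ - 1] == o.capture[0])
            open = open[0 .. $ - 1];
        else
            o.success = false;
        return o;
    }
}

void main()
{
    TagChecker checker;
    Output delegate(Output) onOpen = &checker.opAction;
    Output delegate(Output) onClose = &checker.closeAction;

    onOpen(Output(true, ["a"]));
    assert(onClose(Output(true, ["a"])).success);  // matching tag
    assert(!onClose(Output(true, ["b"])).success); // nothing left open
}
```

An action can therefore inspect or modify a result, and even turn a successful parse into a failure.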
Pegged has other extensions, such as @ or ^, but these are in flux right now and I'll wait for the design to stabilize before documenting them.
All the previously described extensions act upon expressions. When you want an operator to act upon an entire rule, you can enclose the rule's whole expression in parentheses:
Rule <- ~(complicated expression I want to fuse)
This need is common enough for Pegged to provide a shortcut: put the operator in the arrow:
- <~ (squiggly arrow) concatenates the captures on the right-hand side of the arrow.
- <: (colon arrow) drops the entire rule result (useful to ignore comments, for example).
- <{Action} associates an action with a rule.
For example:
Number <~ Digit+
Digit <- [0-9]
Comment <: "/*" (Comment / Text)* "*/"
Text <~ (!("/*"/"*/") .)*
That makes the Number expression a bit more readable (if you use a font that clearly distinguishes between ~ and -, which GitHub does not really do...)
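The relation between the arrow forms and the parenthesized forms can be spelled out as a plain text rewrite. The sketch below uses only the D standard library. `desugar` is a hypothetical helper written for this page, not something Pegged exposes. It handles only the <~ and <: arrows, and it naively takes the first occurrence of the arrow in the line.

```d
import std.algorithm.searching : findSplit;
import std.string : strip;

// Hypothetical helper: rewrites `Name <~ e` as `Name <- ~(e)` and
// `Name <: e` as `Name <- :(e)`; any other rule is returned unchanged.
string desugar(string rule)
{
    foreach (op; ["~", ":"])
    {
        auto parts = rule.findSplit("<" ~ op);
        if (parts[1].length) // the arrow was found
            return parts[0].strip ~ " <- " ~ op ~ "(" ~ parts[2].strip ~ ")";
    }
    return rule;
}

void main()
{
    assert(desugar("Number <~ Digit+") == "Number <- ~(Digit+)");
    assert(desugar("Digit <- [0-9]") == "Digit <- [0-9]");
}
```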
Next lesson: Parametrized Rules