At the end of part 1 I had a parser that will parse a PEG, but isn't that useful because its output is simply a recursive data structure of the matched elements.
The next step is to add the ability to process each matching production rule. I do this by associating a function with each rule, the function takes two arguments - the matching string and the parsed elements - and returns a parse result. The parse result type is extended to allow passing an arbitrary type as a parse result...
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
type ParseResult<'a> = | |
| Parsed of 'a | |
| TerminalSymbol of string | |
| Production of ParseResult<'a> list | |
| EmptyMatch | |
| Unmatched |
This adds the
Parsed of 'a
type to the discriminated union.
Because each grammar element now requires three pieces of information (name, expression, function) I define a record type to make handling this easier. The function declaration of
parseExpression
changes as expected.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
type GrammarRule<'a>(prod:Expression, func:(string -> ParseResult<'a> -> ParseResult<'a>)) = | |
member this.Prod = prod | |
member this.Func = func | |
let parseExpression (grammar:Map<string,GrammarRule<'a>>) start (input:string) = |
The changes needed to parseExpression are small...
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| NonTerminal x -> | |
let rule = grammar.[x] | |
match parse offset rule.Prod with | |
| (Unmatched, _) as y -> y | |
| (parsed, endOffset) -> (rule.Func input.[offset..endOffset - 1] parsed, endOffset) | |
// ... | |
parse 0 <| NonTerminal start |
The original pattern match for
NonTerminal
simply recursively called parse
with the definition for the NonTerminal
name, now the same recursive call is made but, if the result is a successful match, the function associated with the match is called and the result of that function is passed as the result of the match.Grammar example - 1
Taking the same grammar I used in part 1, I now need to add a function to each rule.
In this example, I am evaluating the expression as it is being parsed, this is the simplest approach, later on I will show a two stage approach where the expression is converted to an AST that can be evaluated later.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// bool <- "true" / "false" | |
let boolRule = GrammarRule<bool>(Choice [Terminal "true"; Terminal "false"], (fun s _ -> Parsed (s = "true"))) | |
// paren <- "(" expr ")" | |
let parseParen _ = function | |
| Production [TerminalSymbol "("; x; TerminalSymbol ")"] -> x | |
| x -> unexpected x | |
let parenRule = GrammarRule<bool>(Sequence [Terminal "("; NonTerminal "expr"; Terminal ")"], parseParen) | |
// not <- "!" atom | |
let parseNot _ = function | |
| Production [TerminalSymbol "!"; Parsed x] -> Parsed <| not x | |
| x -> unexpected x | |
let notRule = GrammarRule<bool>(Sequence [Terminal "!"; NonTerminal "atom"], parseNot) | |
// atom <- bool / paren / not | |
let atomRule = GrammarRule<bool>(Choice [NonTerminal "bool"; NonTerminal "paren"; NonTerminal "not"], (fun _ x -> x)) | |
// and <- atom ("&" and)? | |
// or <- and ("|" or)? | |
let parseBin _ = function | |
| Production [x; EmptyMatch] -> x | |
| Production [Parsed x; Production [TerminalSymbol "&"; Parsed y]] -> Parsed (x && y) | |
| Production [Parsed x; Production [TerminalSymbol "|"; Parsed y]] -> Parsed (x || y) | |
| x -> unexpected x | |
let andRule = GrammarRule<bool>(Sequence [NonTerminal "atom"; Optional (Sequence [Terminal "&"; NonTerminal "and"])], parseBin) | |
let orRule = GrammarRule<bool>(Sequence [NonTerminal "and"; Optional (Sequence [Terminal "|"; NonTerminal "or"])], parseBin) | |
// expr <- or | |
let exprRule = GrammarRule<bool>(NonTerminal "or", (fun _ x -> x)) | |
// start <- expr <epsilon> | |
let parseStart _ = function | |
| Production [x; EmptyMatch] -> x | |
| x -> unexpected x | |
let startRule = GrammarRule<bool>(Sequence [NonTerminal "expr"; Epsilon], parseStart) |
Hopefully each of these is self explanatory, the same function can be used to parse both
&
and |
because they have equivalent structure; the first pattern in parseBin
is taken when the Optional
second element of the Sequence
is not matched.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
let g = Map.ofList [("bool", boolRule); ("paren", parenRule); ("not", notRule); ("atom", atomRule); | |
("and", andRule); ("or", orRule); ("expr", exprRule); ("start", startRule)] |
This code can now evaluate simple boolean expressions...
> parseExpression g "start" "true&false";;
val it : ParseResult<bool> * int = (Parsed false, 10)
> parseExpression g "start" "true|true&false";;
val it : ParseResult<bool> * int = (Parsed true, 15)
> parseExpression g "start" "(true|true)&false";;
val it : ParseResult<bool> * int = (Parsed false, 17)
Grammar example - 2
In this example the same grammar (with some minor changes, such as allowing whitespace) demonstrates parsing into an AST (the
BooleanExpr
type) which is then evaluated in a second stage.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
type BooleanExpr = | |
| Atom of bool | |
| Binary of BooleanExpr * string * BooleanExpr | |
| Not of BooleanExpr | |
let rec evalBool = function | |
| Atom x -> x | |
| Binary (left, "&", right) -> if evalBool left then evalBool right else false | |
| Binary (left, "|", right) -> if evalBool left then true else evalBool right | |
| Binary (_, x, _) -> failwith <| "Unexpected binary operator: " + x | |
| Not x -> not <| evalBool x | |
let boolGrammar = | |
// bool <- "true" / "false" | |
let boolRule = GrammarRule<BooleanExpr>(Choice [Terminal "true"; Terminal "false"], (fun s _ -> Parsed (Atom (s = "true")))) | |
// space <- " "* | |
let spaceRule = GrammarRule<BooleanExpr>(ZeroOrMore (Terminal " "), (fun _ _ -> EmptyMatch)) | |
// paren <- "(" space expr space ")" | |
let boolParen _ = function | |
| Production [TerminalSymbol "("; EmptyMatch; x; EmptyMatch; TerminalSymbol ")"] -> x | |
| x -> unexpected x | |
let parenRule = GrammarRule<BooleanExpr>(Sequence [Terminal "("; NonTerminal "space"; NonTerminal "expr"; NonTerminal "space"; Terminal ")"], boolParen) | |
// not <- "!" atom | |
let boolNot _ = function | |
| Production [TerminalSymbol "!"; Parsed x] -> Parsed (Not x) | |
| x -> unexpected x | |
let notRule = GrammarRule<BooleanExpr>(Sequence [Terminal "!"; NonTerminal "atom"], boolNot) | |
// atom <- "bool" / "paren" / "not" | |
let atomRule = GrammarRule<BooleanExpr>(Choice [NonTerminal "bool"; NonTerminal "paren"; NonTerminal "not"], (fun _ x -> x)) | |
// or <- atom space ("|" space or)? | |
// and <- or space ("&" space and)? | |
let boolBinary _ = function | |
| Production [x; EmptyMatch; EmptyMatch] -> x | |
| Production [Parsed x; EmptyMatch; Production [TerminalSymbol op; EmptyMatch; Parsed y]] -> Parsed (Binary (x, op, y)) | |
| x -> unexpected x | |
let orRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "atom"; NonTerminal "space"; Optional (Sequence [Terminal "|"; NonTerminal "space"; NonTerminal "or"])], boolBinary) | |
let andRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "or"; NonTerminal "space"; Optional (Sequence [Terminal "&"; NonTerminal "space"; NonTerminal "and"])], boolBinary) | |
// expr <- and | |
let exprRule = GrammarRule<BooleanExpr>(NonTerminal "and", (fun _ x -> x)) | |
// start <- space expr space <epsilon> | |
let boolStart _ = function | |
| Production [EmptyMatch; x; EmptyMatch; EmptyMatch] -> x | |
| x -> unexpected x | |
let startRule = GrammarRule<BooleanExpr>(Sequence [NonTerminal "space"; NonTerminal "expr"; NonTerminal "space"; Epsilon], boolStart) | |
Map.ofList [("bool", boolRule); ("space", spaceRule); ("paren", parenRule); ("not", notRule); | |
("atom", atomRule); ("or", orRule); ("and", andRule); ("expr", exprRule); ("start", startRule)] | |
let parseAndEvalBool expr = | |
match parseExpression boolGrammar "start" expr with | |
| (Parsed b, _) -> printfn "%s = %A" expr (evalBool b) | |
| x -> unexpected x |
The code for this artice is on github.
In part 3 I make things a little more practical, expressing the grammar as literal objects is very clumsy - I build a parser that parses textual PEG expressions...
It’s arduous to search out educated people on this subject, but you sound like you understand what you’re talking about! Thanks online casino slots
ReplyDelete