Saturday, February 23, 2013

Joined-up Engineering

There are many well known differences between American and British English, but it can be the less well known that make me stop and think.
Where Americans will refer to 'print(ed)' and 'cursive' handwriting, the British are much more likely to use the word 'joined-up' instead of 'cursive' 1
In the late 90's the phrase Joined-up Government was used by the Blair administration to promote the idea that government departments should work together. How well this can work for government departments is a matter of debate, but I feel that the idea has serious applicability to software engineering.

The Analogy

If we consider the different tasks, or types of task, in a software project to be the letters, are we 'printing' and performing the tasks in isolation, or performing 'joined-up engineering' and allowing the tasks to naturally flow into and influence their neighbors?
Non-trivial projects have more than one engineer, different tasks and roles are filled by different people with different abilities and specializations — large projects have more than one team.
To me, joined-up engineering emphasizes the importance of that everyone is working on the same project, that there is a line of work or responsibility that flows cleanly through each task/engineer/role/team, rather than the hard boundaries that produce problems like: fiefdoms; impedance mismatch; feature and code ownership arguments; &c.

Stretching the Analogy

  • While the general principal of joined-up writing is that the pen never leaves the page, there are well known exceptions, we "dot the i's and cross the t's" (hopefully not forgetting j's (or accents)), x is usually written by lifting the pen &emdash; knowing when not to apply a principal is important.
  • Speaking for myself, I can probably print faster than I can write joined-up, but my joined up writing is (or has the potential to be) far more attractive, and with practice probably faster.
  • Bureaucracies prefer print — as anyone who has filled in official paperwork knows — in the most exaggerated cases each letter must be within its own box, this is purely for the benefit of the bureaucracy.

1 The British politician Jonathan Aitken, while serving a prison sentence for perjury, was given the nickname "Joino" by fellow inmates for his ability(!) to use joined-up handwriting.

Saturday, February 2, 2013

Tiny Basic using PEG and F#

Prompted by +Mike Begley I just used the peg parser I showed in previous blog posts to make a simple implementation of Tiny Basic
This allows simple Tiny Basic code to run, !load and !save do the obvious things

Friday, February 1, 2013

Parsing Expression Grammar - part 3

The PEG parser at the end of part 2 will parse, and call out to code when a rule is parsed, but isn't particularly easy to use, most particularly because the grammar has to be expressed as F# objects. In this post I'll extend the parser to be able to parse PEG from a text file, and to generate F# code for a parser described by that PEG.
There are only two (big) things missing from the code needed to do this, a parser for PEG - which I will write as (you guessed it) a PEG, and the code generator for the parser.

Parsing PEG

Parsing the PEG requires a complete grammar.

Parser changes

I have added a few new terminal types to the Expression type defined in part 1, these allow a grammar to specify a terminal symbol match on one of a selection of characters, to match a character (strictly a UTF-16 character) based on a Unicode category, and to match any character, These make specifying real world grammars (including that for PEG itself) very much easier.
Because I have added terminal types that operate on a per character basis, I've added a simple utility method that gets the next single character from the input, if any. This is this only place in the code where surrogate pairs are specifically handled.
I have extended parseExpression in the obvious way to parse the new terminal expressions.

Parsing PEG

I now have enough pieces to be able to create a PEG parser for PEG, this is in the file pegOfPeg.fs, an example of one of the grammar rules is...
The rest of the files is much the same, with a parse method for each rule in the grammar and a corresponding F# declaration.

Code Generation

The main method for code generation is a recursive function which matches an Expression argument. For the simplest expression types, those that don't require recursively parsing one or more expressions, the code generation consists of writing code that calls out to the function that performs the parsing for that terminal type (for the terminals), matching the end of the input (for the <epsilon> rule), and calls a rule matching function (whose generation I will show below) - for NonTerminal rules. The functions ibprintf and ibprintfn are used to print an indented string to a StringBuilder.
To match Sequence and Choice expressions, the code generator generates a list of lambdas, each of which is recursively generated by the codeGen function, one for each sub-expression of the Sequence or Choice. This list of lambdas is then past to the utility function matchSequence or matchChoice as appropriate.
For the expressions which wrap a single expression (Optional, ZeroOrMore, OneOrMore, And, and Not) a similar approach generates a lambda for the inner expression and passes it to the appropriate utility function.
When matching a top-level grammar rule the code generator creates a function; there are two possible code paths, for rules where there is no code defined to be executed when the rule is matched (in which case the function simply recursively generates the code for the inner expression), and for rules where there is code to be executed on the successful match of the rule, where matching on the result of the inner expression has to be called to ensure that the code is not executed inappropriately. I make no effort to ensure that the code specified in the grammar is valid.
The code generator for the Rule expression doesn't generate a valid function, there is no let rec prefixed; this is because the rule matching methods have to be mutually recursive, so an outer function is used to call codeGen for rules - it is that function which is told whether this is the first rule or not.
The rest of the code generator is found in peg.fs and consists of bolier plate implementations of the functions used by the generated code and utility functions that make the rest of the code generation easier.

Left recursion

A PEG parser cannot work with a grammar that contains left recursion. Left recursion occurs when a non-terminal symbol in the left position in an expression recursively (either directly or indirectly) refers to the same rule. The left position is the first element in a Sequence (this is a simplification!), any element in a Choice, and for the expressions that have a single sub-expression, that sub-expression is always in the left-position. It can be difficult to determine which elements in a sequence are in the left position because a Non-Terminal that successfully matches with zero-width (for example an Optional or ZeroOrMore expression)
a <- a b     The simplest form of left recursion

a <- b c
b <- a / c   Indirect left recursion

a <- b c
b <- d a     Left recursive because d can match without consuming input
d <- e?      even though a is not in the obvious left position
The following code detects simple forms of left recursion, but doesn't find the third example above.

Putting it all together

The code in Program.fs puts this together into a program that takes an input file and writes an output, taking a grammar description, prefix and suffix from the input file. The format of the input file is as follows:

prefix written verbatim to the output file
(*%%
grammar rules
%%*)
suffix written verbatim to the output file
The section separators in the input file ((*%% and %%*)) are chosen to be similar to yacc and to allow the input file to be read as valid F#, which makes writing the prefix and suffix easier! The example on github implements a basic calculator program.
If the -x argument is passed to the parser generator, it will attempt to compile and execute the output file, this is done using the FSharpCodeProvider class from the F# power pack, Vladimir Ivanovskiy has a great article about Embedded Scripting using F#.
As usual, all the code for this is on github.