Wait, What?: 2013

Wednesday, October 30, 2013

Of the Just Shaping of Letters

I stumbled across a print copy of Dürer's Of the Just Shaping of Letters in a second-hand bookshop years ago. This slim book, an English translation of part of book three of Dürer's Applied Geometry from 1535¹, contains descriptions of how to draw uppercase Latin letters². Each letter is described as a series of geometric instructions and relationships within the containing square.

As far as I am aware, Dürer is not known as a typographer per se. These letter forms are presented as a description of how letters may be formed, rather than as a description of a specific designed typeface; they seem to be intended for an audience designing monumental stonework rather than cutting typefaces.

My interest is particularly piqued because the descriptions are close to being algorithms for each letter: the geometric instructions amount to a constraint system.

Earlier this summer I transcribed each letter into JavaScript, and over the last few days ported that to CoffeeScript (mostly just because, and what better thing to do on a Hawai’ian beach in between stand up paddle boarding and drinking daiquiris?). The original lacks descriptions for J, U, and W because they weren’t/aren’t Latin letters - but I added versions. Several of the letters have variants; in most cases I chose only one.

As an example, here is the original description of the letter ‘V’ — chosen to be a short description, but not too short!

V you shall thus make in its square: Bisect c. d. in the point e.; then set the point f. one-tenth of the whole line a. b. beyond a., and in like fashion g. to the hither side of b. Then draw the broad limb of your letter downwards from f. to e. and sharpen it; & thence draw upwards your slender limb to g.; and at the top produce it in either direction, as you did before at the bottom of A; just as you see it shown below.

The CoffeeScript is not quite as pithy, but is also doing a little more housekeeping...

  drawV: (id, variant, proportions) ->
    outline = new OutlineDrawer sys.ctx

    # Construction lines: the four sides of the containing square
    outline.drawLine @labels.a, @labels.b
    outline.drawLine @labels.c, @labels.d
    outline.drawLine @labels.a, @labels.c
    outline.drawLine @labels.b, @labels.d

    # e bisects c-d; f and g lie on a-b, offset from a and b by the serif proportion
    @labels.e = midPoint @labels.c, @labels.d
    @labels.f = @labels.a.towards @labels.b, @serif
    @labels.g = @labels.b.towards @labels.a, @serif

    # Broad left limb: its right-hand edge, plus the extent of its right serif
    ll_tr = @labels.f.towards @labels.b, @wide
    ll_trr = ll_tr.towards @labels.b, @serif
    ll_br = @labels.e.towards @labels.d, @wide

    outline.drawLine @labels.f, @labels.e
    outline.drawLine ll_tr, ll_br
    # Circles used for the serifs on either side of the left limb
    llls = outline.drawTouchingCircle @labels.e, @labels.f, @labels.a
    llrs = outline.drawTouchingCircle ll_trr, ll_tr, ll_br

    # Slender right limb: its left-hand edge, plus the extent of its left serif
    rl_tl = @labels.g.towards @labels.a, @narrow
    rl_tll = rl_tl.towards @labels.a, @serif
    rl_bl = @labels.e.towards @labels.c, @narrow

    outline.drawLine rl_tl, rl_bl
    outline.drawLine @labels.g, @labels.e
    # Circles used for the serifs on either side of the right limb
    rlls = outline.drawTouchingCircle rl_bl, rl_tl, rl_tll
    rlrs = outline.drawTouchingCircle @labels.b, @labels.g, @labels.e

    # Fill the glyph: start at e, go around the left limb, across the point
    # where the inner edges of the two limbs meet, then around the right limb
    render = new FillShape sys.ctx

    render.moveTo @labels.e
    render.addArc llls, true
    render.addArc llrs, true
    render.lineTo (intersect ll_tr, ll_br, rl_tl, rl_bl)
    render.addArc rlls, true
    render.addArc rlrs, true

    render.closeAndFill()
    outline.labelPoints @labels
    return

Generated letter 'V'

wide, serif, and narrow are proportions of the square. OutlineDrawer and FillShape are classes: the first draws the construction lines that are used for the constraints, and the second renders the glyph.
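The helpers that drawV relies on (midPoint, towards, intersect) are not shown in the post. As a rough idea of what such primitives might look like, here is a small Python sketch. The Point class, the function names, the reading of towards as "move a given distance along the line to another point", and the value 0.1 for the serif proportion are all assumptions made for illustration, not the code from the repository.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def towards(self, other, distance):
        """Return the point `distance` away from self along the line to `other`."""
        dx, dy = other.x - self.x, other.y - self.y
        length = math.hypot(dx, dy)
        if length == 0:
            return self
        return Point(self.x + dx * distance / length,
                     self.y + dy * distance / length)


def mid_point(p, q):
    """Bisect the segment p-q."""
    return Point((p.x + q.x) / 2, (p.y + q.y) / 2)


def intersect(p1, p2, p3, p4):
    """Intersection of the infinite lines p1-p2 and p3-p4, or None if parallel."""
    d = (p1.x - p2.x) * (p3.y - p4.y) - (p1.y - p2.y) * (p3.x - p4.x)
    if d == 0:
        return None
    a = p1.x * p2.y - p1.y * p2.x
    b = p3.x * p4.y - p3.y * p4.x
    return Point((a * (p3.x - p4.x) - (p1.x - p2.x) * b) / d,
                 (a * (p3.y - p4.y) - (p1.y - p2.y) * b) / d)


# The labelled points of the 'V' in a unit square (y grows downwards, as on a canvas).
a, b = Point(0, 0), Point(1, 0)
c, d = Point(0, 1), Point(1, 1)
serif = 0.1                      # assumed proportion of the square
e = mid_point(c, d)              # "Bisect c. d. in the point e."
f = a.towards(b, serif)
g = b.towards(a, serif)
```

With points like these, each sentence of Dürer's description maps onto one or two calls, which is what makes the descriptions read like algorithms.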

As usual the code for this can be found on GitHub and a demo page can be found at github.io

1 Fortunately now available on Project Gutenberg.
2 There is also discussion and examples of lower case letters, but not ones that would match the uppercase Latin letters.

Saturday, February 23, 2013

Joined-up Engineering

There are many well known differences between American and British English, but it can be the less well known that make me stop and think.
Where Americans will refer to 'print(ed)' and 'cursive' handwriting, the British are much more likely to use the word 'joined-up' instead of 'cursive'.¹
In the late 1990s the phrase Joined-up Government was used by the Blair administration to promote the idea that government departments should work together. How well this can work for government departments is a matter of debate, but I feel that the idea has serious applicability to software engineering.

The Analogy

If we consider the different tasks, or types of task, in a software project to be the letters, are we 'printing' and performing the tasks in isolation, or performing 'joined-up engineering' and allowing the tasks to naturally flow into and influence their neighbors?
Non-trivial projects have more than one engineer; different tasks and roles are filled by different people with different abilities and specializations, and large projects have more than one team.
To me, joined-up engineering emphasizes that everyone is working on the same project, and that there is a line of work or responsibility that flows cleanly through each task/engineer/role/team, rather than the hard boundaries that produce problems like fiefdoms, impedance mismatch, and arguments over feature and code ownership.

Stretching the Analogy

While the general principle of joined-up writing is that the pen never leaves the page, there are well known exceptions: we "dot the i's and cross the t's" (hopefully not forgetting j's (or accents)), and x is usually written by lifting the pen. Knowing when not to apply a principle is important.
Speaking for myself, I can probably print faster than I can write joined-up, but my joined-up writing is (or has the potential to be) far more attractive, and with practice probably faster.
Bureaucracies prefer print, as anyone who has filled in official paperwork knows. In the most exaggerated cases each letter must be within its own box; this is purely for the benefit of the bureaucracy.

1 The British politician Jonathan Aitken, while serving a prison sentence for perjury, was given the nickname "Joino" by fellow inmates for his ability(!) to use joined-up handwriting.

Saturday, February 2, 2013

Tiny Basic using PEG and F#

Prompted by Mike Begley, I just used the PEG parser I showed in previous blog posts to make a simple implementation of Tiny Basic.

This allows simple Tiny Basic code to run; !load and !save do the obvious things.

Friday, February 1, 2013

Parsing Expression Grammar - part 3

The PEG parser at the end of part 2 will parse, and call out to code when a rule is parsed, but it isn't particularly easy to use, mostly because the grammar has to be expressed as F# objects. In this post I'll extend the parser to be able to parse PEG from a text file, and to generate F# code for a parser described by that PEG.

There are only two (big) things missing from the code needed to do this: a parser for PEG, which I will write as (you guessed it) a PEG, and the code generator for the parser.

Parsing PEG

Parsing the PEG requires a complete grammar.

Parser changes

I have added a few new terminal types to the Expression type defined in part 1. These allow a grammar to specify a terminal symbol that matches one of a selection of characters, that matches a character (strictly a UTF-16 character) based on a Unicode category, or that matches any character. These make specifying real world grammars (including that for PEG itself) very much easier.

Because I have added terminal types that operate on a per character basis, I've added a simple utility method that gets the next single character from the input, if any. This is the only place in the code where surrogate pairs are specifically handled.
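To make the surrogate pair handling concrete, here is a minimal Python sketch of a "next character" helper that works on UTF-16 code units, which is how .NET strings are stored. It is not the F# from the post: the names utf16_units and next_char, and the choice to return the character together with the number of code units consumed, are assumptions for illustration.

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def next_char(units, offset):
    """Return (character, units consumed) at offset, or None at end of input."""
    if offset >= len(units):
        return None
    first = units[offset]
    # A high surrogate followed by a low surrogate encodes a single character.
    if 0xD800 <= first <= 0xDBFF and offset + 1 < len(units):
        second = units[offset + 1]
        if 0xDC00 <= second <= 0xDFFF:
            code_point = 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00)
            return chr(code_point), 2
    # Anything else (including an unpaired surrogate) is taken as one unit.
    return chr(first), 1


units = utf16_units("a\U0001F600b")   # 'a', one character outside the BMP, 'b'
```

Keeping this logic in one helper means the per-character terminals (character sets, Unicode categories, any character) can all advance by the returned width without knowing about surrogates themselves.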

I have extended parseExpression in the obvious way to parse the new terminal expressions.

Parsing PEG

I now have enough pieces to create a PEG parser for PEG; this is in the file pegOfPeg.fs. An example of one of the grammar rules is...

The rest of the file is much the same, with a parse method for each rule in the grammar and a corresponding F# declaration.

Code Generation

The main method for code generation is a recursive function that matches on an Expression argument. For the simplest expression types, those that don't require recursively parsing one or more sub-expressions, the generated code calls the function that performs the parsing for that terminal type (for the terminals), matches the end of the input (for the <epsilon> rule), or calls a rule matching function (whose generation I will show below) for NonTerminal rules. The functions ibprintf and ibprintfn are used to print an indented string to a StringBuilder.

To match Sequence and Choice expressions, the code generator generates a list of lambdas, each of which is recursively generated by the codeGen function, one for each sub-expression of the Sequence or Choice. This list of lambdas is then passed to the utility function matchSequence or matchChoice as appropriate.

For the expressions which wrap a single expression (Optional, ZeroOrMore, OneOrMore, And, and Not) a similar approach generates a lambda for the inner expression and passes it to the appropriate utility function.
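The shape of this generator can be sketched in a few lines of Python. This is an illustration of the technique only, generating Python source rather than F#, and everything in it is assumed for the sketch: the tuple encoding of expressions, the names code_gen, rule_gen and the match_* runtime helpers, and the convention that a match returns the new position or None.

```python
def code_gen(expr):
    """Return Python source for an expression that matches expr at `pos`."""
    kind = expr[0]
    if kind == "terminal":
        return "match_terminal(text, pos, %r)" % expr[1]
    if kind == "nonterminal":
        return "match_%s(text, pos)" % expr[1]
    if kind in ("sequence", "choice"):
        # One lambda per sub-expression, handed to a runtime helper.
        lambdas = ", ".join("lambda pos: " + code_gen(sub) for sub in expr[1])
        return "match_%s(pos, [%s])" % (kind, lambdas)
    if kind == "optional":
        return "match_optional(pos, lambda pos: %s)" % code_gen(expr[1])
    raise ValueError("unknown expression kind: %r" % kind)


def rule_gen(name, expr):
    """Generate one function per grammar rule."""
    return "def match_%s(text, pos):\n    return %s\n" % (name, code_gen(expr))


# Boilerplate runtime used by the generated code.
def match_terminal(text, pos, literal):
    return pos + len(literal) if text.startswith(literal, pos) else None


def match_sequence(pos, parsers):
    for parser in parsers:
        pos = parser(pos)
        if pos is None:
            return None
    return pos


def match_choice(pos, parsers):
    for parser in parsers:
        end = parser(pos)
        if end is not None:
            return end
    return None


def match_optional(pos, parser):
    end = parser(pos)
    return pos if end is None else end


grammar = {
    "bool": ("choice", [("terminal", "true"), ("terminal", "false")]),
    "and": ("sequence", [("nonterminal", "bool"),
                         ("optional", ("sequence", [("terminal", "&"),
                                                    ("nonterminal", "and")]))]),
}
source = "\n".join(rule_gen(name, expr) for name, expr in grammar.items())

# Compile the generated source alongside the runtime helpers.
namespace = {"match_terminal": match_terminal, "match_sequence": match_sequence,
             "match_choice": match_choice, "match_optional": match_optional}
exec(source, namespace)
match_and = namespace["match_and"]
```

Calling match_and("true&false", 0) runs the generated functions; the rules can refer to each other freely because the generated names are only looked up when a rule is actually called.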

When matching a top-level grammar rule the code generator creates a function. There are two possible code paths. For rules where there is no code defined to be executed when the rule is matched, the function simply contains the recursively generated code for the inner expression. For rules where there is code to be executed on the successful match of the rule, the generated function first matches on the result of the inner expression, to ensure that the code is not executed inappropriately. I make no effort to ensure that the code specified in the grammar is valid.

The code generator for the Rule expression doesn't generate a complete function definition: no let rec is prefixed. This is because the rule matching methods have to be mutually recursive (in F# the first is introduced with let rec and the rest with and), so an outer function is used to call codeGen for rules, and it is that function which is told whether this is the first rule or not.

The rest of the code generator is found in peg.fs and consists of boilerplate implementations of the functions used by the generated code and utility functions that make the rest of the code generation easier.

Left recursion

A PEG parser cannot work with a grammar that contains left recursion. Left recursion occurs when a non-terminal symbol in the left position in an expression recursively (either directly or indirectly) refers to the same rule. The left position is the first element in a Sequence (this is a simplification!) and any element in a Choice; for the expressions that have a single sub-expression, that sub-expression is always in the left position. It can be difficult to determine which elements in a sequence are in the left position, because a NonTerminal that successfully matches with zero width (for example an Optional or ZeroOrMore expression) leaves the element that follows it in the left position as well.

a <- a b     The simplest form of left recursion

a <- b c
b <- a / c   Indirect left recursion

a <- b c
b <- d a     Left recursive because d can match without consuming input
d <- e?      even though a is not in the obvious left position

The following code detects simple forms of left recursion, but doesn't find the third example above.
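A check of this kind can be sketched in Python as follows. This is not the F# from the post, only a minimal illustration of the idea under the simplified definition of "left position" given above; the tuple encoding of expressions and the names left_symbols and is_left_recursive are assumptions. Like the check described, it deliberately ignores zero-width matches, so it misses the third example.

```python
def left_symbols(expr):
    """Yield the non-terminal names in the (simplified) left position of expr."""
    kind = expr[0]
    if kind == "nonterminal":
        yield expr[1]
    elif kind == "sequence":
        if expr[1]:
            yield from left_symbols(expr[1][0])   # first element only
    elif kind == "choice":
        for sub in expr[1]:
            yield from left_symbols(sub)          # every alternative
    elif kind in ("optional", "zero_or_more", "one_or_more", "and", "not"):
        yield from left_symbols(expr[1])
    # Terminals and the empty expression contribute nothing.


def is_left_recursive(grammar, rule):
    """True if `rule` can reach itself through left positions only."""
    seen = set()
    stack = list(left_symbols(grammar[rule]))
    while stack:
        name = stack.pop()
        if name == rule:
            return True
        if name in seen or name not in grammar:
            continue
        seen.add(name)
        stack.extend(left_symbols(grammar[name]))
    return False


def N(name):
    return ("nonterminal", name)


x = ("terminal", "x")
direct = {"a": ("sequence", [N("a"), N("b")]), "b": x}
indirect = {"a": ("sequence", [N("b"), N("c")]),
            "b": ("choice", [N("a"), N("c")]), "c": x}
hidden = {"a": ("sequence", [N("b"), N("c")]),
          "b": ("sequence", [N("d"), N("a")]),
          "d": ("optional", N("e")), "e": x, "c": x}
```

Catching the third grammar would need an extra pass that works out which expressions can match without consuming input, and then treats the element after such an expression as being in the left position too.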

Putting it all together

The code in Program.fs puts this together into a program that takes an input file and writes an output, taking a grammar description, prefix and suffix from the input file. The format of the input file is as follows:


prefix written verbatim to the output file
(*%%
grammar rules
%%*)
suffix written verbatim to the output file

The section separators in the input file ((*%% and %%*)) are chosen to be similar to yacc and to allow the input file to be read as valid F#, which makes writing the prefix and suffix easier! The example on github implements a basic calculator program.
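Splitting an input file of this shape is straightforward. A minimal Python sketch, assuming each separator appears exactly once and that the function name split_sections and the sample input are purely illustrative:

```python
def split_sections(source):
    """Split parser-generator input into (prefix, grammar, suffix)."""
    prefix, opener, rest = source.partition("(*%%")
    if not opener:
        raise ValueError("missing (*%% separator")
    grammar, closer, suffix = rest.partition("%%*)")
    if not closer:
        raise ValueError("missing %%*) separator")
    return prefix, grammar, suffix


# Hypothetical input file contents, for illustration only.
example = "module Calc\n(*%%\nstart <- expr\n%%*)\nlet main () = ()\n"
prefix, grammar, suffix = split_sections(example)
```

Because the separators open and close an F# block comment, the grammar section is simply ignored when the whole input file is read as F#.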

If the -x argument is passed to the parser generator, it will attempt to compile and execute the output file. This is done using the FSharpCodeProvider class from the F# PowerPack; Vladimir Ivanovskiy has a great article about Embedded Scripting using F#.

As usual, all the code for this is on github.

Sunday, January 20, 2013

Parsing Expression Grammar - part 2

At the end of part 1 I had a parser that will parse text against a PEG, but that isn't very useful because its output is simply a recursive data structure of the matched elements.

The next step is to add the ability to process each matching production rule. I do this by associating a function with each rule; the function takes two arguments (the matching string and the parsed elements) and returns a parse result. The parse result type is extended to allow passing an arbitrary type as a parse result...

This adds the Parsed of 'a type to the discriminated union.

Because each grammar element now requires three pieces of information (name, expression, function) I define a record type to make handling this easier. The function declaration of parseExpression changes as expected.

The changes needed to parseExpression are small...

The original pattern match for NonTerminal simply recursively called parse with the definition for the NonTerminal name, now the same recursive call is made but, if the result is a successful match, the function associated with the match is called and the result of that function is passed as the result of the match.

Grammar example - 1

Taking the same grammar I used in part 1, I now need to add a function to each rule.

In this example, I am evaluating the expression as it is being parsed. This is the simplest approach; later on I will show a two stage approach where the expression is converted to an AST that can be evaluated later.

Hopefully each of these is self-explanatory. The same function can be used to parse both & and | because they have equivalent structure; the first pattern in parseBin is taken when the Optional second element of the Sequence is not matched.

This code can now evaluate simple boolean expressions...


> parseExpression g "start" "true&false";;
val it : ParseResult<bool> * int = (Parsed false, 10)
> parseExpression g "start" "true|true&false";;
val it : ParseResult<bool> * int = (Parsed true, 15)
> parseExpression g "start" "(true|true)&false";;
val it : ParseResult<bool> * int = (Parsed false, 17)
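For readers who want to experiment without F#, the same idea can be sketched in Python: each rule carries a function that receives the matched text and the parsed elements, and its return value becomes the result of the match. This is an illustration rather than a port of the post's code; the tuple encoding, the Rule and Parsed classes, and the helper names are assumptions, and only the expression kinds this grammar needs are handled.

```python
from dataclasses import dataclass
from typing import Any, Callable

UNMATCHED = object()  # no match
EMPTY = object()      # successful zero-width match


@dataclass
class Parsed:
    value: Any


@dataclass
class Rule:
    name: str
    expr: tuple
    action: Callable[[str, Any], Any]  # (matched text, parsed elements) -> value


def parse(grammar, expr, text, pos):
    kind = expr[0]
    if kind == "terminal":
        lit = expr[1]
        return (lit, pos + len(lit)) if text.startswith(lit, pos) else (UNMATCHED, pos)
    if kind == "empty":  # matches only at the end of the input
        return (EMPTY, pos) if pos == len(text) else (UNMATCHED, pos)
    if kind == "nonterminal":
        rule = grammar[expr[1]]
        result, end = parse(grammar, rule.expr, text, pos)
        if result is UNMATCHED:
            return UNMATCHED, pos
        # On success, the rule's function decides what the match evaluates to.
        return Parsed(rule.action(text[pos:end], result)), end
    if kind == "sequence":
        results, cur = [], pos
        for sub in expr[1]:
            result, cur = parse(grammar, sub, text, cur)
            if result is UNMATCHED:
                return UNMATCHED, pos
            results.append(result)
        return results, cur
    if kind == "choice":
        for sub in expr[1]:
            result, end = parse(grammar, sub, text, pos)
            if result is not UNMATCHED:
                return result, end
        return UNMATCHED, pos
    if kind == "optional":
        result, end = parse(grammar, expr[1], text, pos)
        return (EMPTY, pos) if result is UNMATCHED else (result, end)
    raise ValueError("unknown expression kind: %r" % kind)


def parse_expression(grammar, start, text):
    return parse(grammar, ("nonterminal", start), text, 0)


def T(s): return ("terminal", s)
def N(s): return ("nonterminal", s)
def seq(*e): return ("sequence", list(e))
def alt(*e): return ("choice", list(e))
def opt(e): return ("optional", e)


def binary(op):
    """Shared function for 'and' and 'or': left operand, optional (operator, right)."""
    def action(_, parts):
        left, rest = parts[0].value, parts[1]
        return left if rest is EMPTY else op(left, rest[1].value)
    return action


rules = [
    Rule("start", seq(N("expr"), ("empty",)), lambda _, p: p[0].value),
    Rule("expr", N("or"), lambda _, p: p.value),
    Rule("or", seq(N("and"), opt(seq(T("|"), N("or")))), binary(lambda a, b: a or b)),
    Rule("and", seq(N("atom"), opt(seq(T("&"), N("and")))), binary(lambda a, b: a and b)),
    Rule("atom", alt(N("bool"), N("paren"), N("not")), lambda _, p: p.value),
    Rule("bool", alt(T("true"), T("false")), lambda s, _: s == "true"),
    Rule("not", seq(T("!"), N("expr")), lambda _, p: not p[1].value),
    Rule("paren", seq(T("("), N("expr"), T(")")), lambda _, p: p[1].value),
]
g = {rule.name: rule for rule in rules}
```

The binary helper plays the role described for parseBin: when the optional second element did not match it simply passes the left operand through.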

Grammar example - 2

In this example the same grammar (with some minor changes, such as allowing whitespace) demonstrates parsing into an AST (the BooleanExpr type) which is then evaluated in a second stage.

The code for this article is on github.

In part 3 I make things a little more practical. Expressing the grammar as literal objects is very clumsy, so I build a parser that parses textual PEG expressions...

Thursday, January 17, 2013

Parsing Expression Grammar - part 1

I'm not a great language and grammar theoretician, so I'm not going to go into great detail about what a Parsing Expression Grammar (PEG) is, but here are a few observations.

PEGs don't require a tokenizing stage - this can be a mixed blessing, but does mean that the grammar can be entirely expressed in one specification
Tools like yacc and bison create parsers from LALR grammars - this is a relatively complicated procedure and requires that the grammar either be very carefully constructed to contain no ambiguity, or to be annotated such that the parser always knows how to resolve that ambiguity
Grammars expressed as PEGs don't really have this problem (although you can't have left recursion - which I'll get to eventually)

I'm using F# for the code in this series of posts. I'm not a completely fluent F# programmer, so there may be some clumsy code in places, but it is a very succinct and clean language for this type of work.

A very simple introduction to PEG

A very minimalist implementation of a PEG only requires a few elements; I'll add some more later, both for completeness and convenience. A PEG expression is composed of:

Terminal symbol - which is commonly represented as text in quotation marks - "foo" would be a terminal symbol
A NonTerminal symbol - a word or identifier not in quotation marks - e.g. foo
Empty - matches the end of the input - represented with an epsilon
A Sequence - a list of expressions, each of which must be matched in order - represented as a space separated list
A Choice - a list of expressions, tested in order, the first match is chosen - represented as a list of expressions separated by /
Zero or More - an expression that can occur any number of times, including zero - represented as a * suffix
One or More - an expression that can occur any number of times, but must occur at least once - represented as a + suffix
Optional - an expression that may or may not match - represented as a ? suffix
And - an expression that must match but that isn't captured by the matching process - represented as a & prefix
Not - an expression that must not match and isn't captured by the matching process - represented as a ! prefix

First Implementation

While grammars are usually stored as text files, I'm going to start with the in memory representation of the PEG expressions.

This is a straightforward translation of the description of the elements into an F# discriminated union type. Before I dive into how to process this, I'll show some code that will print a PEG expression in a human readable form.

There is nothing particularly complex about this. The Sequence and Choice patterns use the forward function composition operator >> to pipeline converting the list of expressions into a list of strings (map), and then to join the elements using the appropriate separator (reduce).

To check that this works as expected I'll try a few expressions in F# interactive.

> printPeg <| Terminal "foo";;
"foo"
val it : unit = ()
> printPeg <| NonTerminal "foo";;
foo
val it : unit = ()
> printPeg <| Sequence [NonTerminal "atom"; Optional (Sequence [Terminal "*"; NonTerminal "product"])];;
atom "*" product?
val it : unit = ()

This doesn't quite work how I want it to: in the last expression it isn't clear what is optional, so I need to make printPeg a touch more complex...

This now perhaps errs on the cautious side, but adds parentheses where they are needed...

> printPeg <| Sequence [NonTerminal "atom"; Optional (Sequence [Terminal "*"; NonTerminal "product"])];;
atom (("*" product)?)
val it : unit = ()
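As a rough Python counterpart to the discriminated union and the parenthesising printer, here is a minimal sketch. It is not the post's F#: the class names, the decision to fold the suffix operators (*, +, ?) and prefix operators (&, !) into two generic classes, and the rule "wrap anything that is not a single symbol" are assumptions chosen to reproduce the output shown above.

```python
from dataclasses import dataclass
from typing import List


class Expression:
    """Base class for PEG expressions."""


@dataclass
class Terminal(Expression):
    text: str


@dataclass
class NonTerminal(Expression):
    name: str


@dataclass
class Epsilon(Expression):
    pass


@dataclass
class Sequence(Expression):
    items: List[Expression]


@dataclass
class Choice(Expression):
    items: List[Expression]


@dataclass
class Suffixed(Expression):
    """ZeroOrMore ('*'), OneOrMore ('+') or Optional ('?')."""
    inner: Expression
    op: str


@dataclass
class Prefixed(Expression):
    """And ('&') or Not ('!')."""
    op: str
    inner: Expression


def peg_to_string(expr):
    if isinstance(expr, Terminal):
        return '"%s"' % expr.text
    if isinstance(expr, NonTerminal):
        return expr.name
    if isinstance(expr, Epsilon):
        return "<epsilon>"
    if isinstance(expr, Sequence):
        return " ".join(wrapped(item) for item in expr.items)
    if isinstance(expr, Choice):
        return " / ".join(wrapped(item) for item in expr.items)
    if isinstance(expr, Suffixed):
        return wrapped(expr.inner) + expr.op
    if isinstance(expr, Prefixed):
        return expr.op + wrapped(expr.inner)
    raise TypeError("not a PEG expression: %r" % (expr,))


def wrapped(expr):
    """Parenthesise nested expressions unless they are a single symbol."""
    text = peg_to_string(expr)
    atomic = isinstance(expr, (Terminal, NonTerminal, Epsilon))
    return text if atomic else "(" + text + ")"


product = Sequence([NonTerminal("atom"),
                    Suffixed(Sequence([Terminal("*"), NonTerminal("product")]), "?")])
```

Here peg_to_string(product) gives atom (("*" product)?), the same cautious parenthesisation as the F# output above.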

When text is parsed against a grammar, there needs to be an internal representation of the parsed data. In part 2, when I add functions to grammar rules, this representation is the input to those functions.

TerminalSymbol - hopefully self explanatory
Production - represents a Sequence, ZeroOrMore, or OneOrMore
EmptyMatch - Used for non-capturing successful matches ZeroOrMore (with zero matches), Optional (not present), And, Not
Unmatched - No match found

Taking the parse function in parts.

First, the atomic expressions; these are all pretty straightforward.

Moving on to Sequence and Choice

The pattern for Sequence creates an F# seq over the list of sub-expressions for the sequence. When an expression doesn't match, the sequence is terminated with an Unmatched parse result, which signals that the Sequence as a whole didn't match; otherwise the results from the sub-expressions are packaged into a Production parse result.

The pattern for Choice enumerates over the sub-expressions until it finds the first one that matches; that result is returned.

The rest...

ZeroOrMore and OneOrMore are almost the same, and DRY suggests that I should break out the common code for these patterns. Both collect a sequence of contiguous matches for the sub-expression; they differ only in how they handle the empty collection.

Optional, And, and Not can all be pulled onto one line (depending on how wide your monitor is...) they simply filter the response from the recursive parsing.

parseExpression is then rounded out with a call to the inner function to start the recursive parsing in the context of the grammar and input string.
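Putting those pieces together, the whole parse function can be sketched in Python as below. Again this is an illustration under assumptions rather than the post's F#: expressions are tagged tuples, results are ("TerminalSymbol", text) and ("Production", [...]) tuples plus two marker objects, and the repetition helper stops on a zero-width match to avoid looping forever (a guard the original may or may not have).

```python
class Marker:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


EMPTY = Marker("EmptyMatch")     # non-capturing successful match
UNMATCHED = Marker("Unmatched")  # no match found


def parse_expression(grammar, start, text):
    def many(sub, pos):
        """Collect contiguous matches of sub; shared by ZeroOrMore and OneOrMore."""
        results = []
        while True:
            result, end = parse(sub, pos)
            if result is UNMATCHED or end == pos:
                return results, pos
            results.append(result)
            pos = end

    def parse(expr, pos):
        kind = expr[0]
        if kind == "terminal":
            lit = expr[1]
            if text.startswith(lit, pos):
                return ("TerminalSymbol", lit), pos + len(lit)
            return UNMATCHED, pos
        if kind == "nonterminal":
            return parse(grammar[expr[1]], pos)
        if kind == "empty":  # end of input
            return (EMPTY, pos) if pos == len(text) else (UNMATCHED, pos)
        if kind == "sequence":
            results, cur = [], pos
            for sub in expr[1]:
                result, cur = parse(sub, cur)
                if result is UNMATCHED:
                    return UNMATCHED, pos
                results.append(result)
            return ("Production", results), cur
        if kind == "choice":
            for sub in expr[1]:
                result, end = parse(sub, pos)
                if result is not UNMATCHED:
                    return result, end
            return UNMATCHED, pos
        if kind in ("zero_or_more", "one_or_more"):
            results, end = many(expr[1], pos)
            if results:
                return ("Production", results), end
            return (EMPTY, pos) if kind == "zero_or_more" else (UNMATCHED, pos)
        if kind == "optional":
            result, end = parse(expr[1], pos)
            return (EMPTY, pos) if result is UNMATCHED else (result, end)
        if kind == "and":
            result, _ = parse(expr[1], pos)
            return (UNMATCHED, pos) if result is UNMATCHED else (EMPTY, pos)
        if kind == "not":
            result, _ = parse(expr[1], pos)
            return (EMPTY, pos) if result is UNMATCHED else (UNMATCHED, pos)
        raise ValueError("unknown expression kind: %r" % kind)

    return parse(("nonterminal", start), 0)


def T(s): return ("terminal", s)
def N(s): return ("nonterminal", s)
def seq(*e): return ("sequence", list(e))
def alt(*e): return ("choice", list(e))
def opt(e): return ("optional", e)


# The boolean grammar used in this post, in the tuple encoding assumed above.
g = {
    "start": seq(N("expr"), ("empty",)),
    "expr": N("or"),
    "or": seq(N("and"), opt(seq(T("|"), N("or")))),
    "and": seq(N("atom"), opt(seq(T("&"), N("and")))),
    "atom": alt(N("bool"), N("paren"), N("not")),
    "bool": alt(T("true"), T("false")),
    "not": seq(T("!"), N("expr")),
    "paren": seq(T("("), N("expr"), T(")")),
}
result, consumed = parse_expression(g, "start", "true&true|false")
```

The result is a nested structure of productions and terminal symbols together with the number of characters consumed, which is all the parser produces at this stage.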

Trying it out

First here's a simple grammar for boolean arithmetic.

The printPeg function still works...


> Map.fold (fun _ k v -> printfn "%s -> %s" k <| pegToString v) () g;;
and -> atom (("&" and)?)
atom -> bool / paren / not
bool -> "true" / "false"
expr -> or
not -> "!" expr
or -> and (("|" or)?)
paren -> "(" expr ")"
start -> expr <epsilon>
val it : unit = ()

Map has sorted its elements, but the printed representations are as expected.


> parseExpression g "start" "true&true|false";;
val it : ParseResult * int =
  (Production
     [Production
        [Production
           [TerminalSymbol "true";
            Production
              [TerminalSymbol "&";
               Production [TerminalSymbol "true"; EmptyMatch]]];
         Production
           [TerminalSymbol "|";
            Production
              [Production [TerminalSymbol "false"; EmptyMatch]; EmptyMatch]]];
      EmptyMatch], 15)
> parseExpression g "start" "true|true&false";;
val it : ParseResult * int =
  (Production
     [Production
        [Production [TerminalSymbol "true"; EmptyMatch];
         Production
           [TerminalSymbol "|";
            Production
              [Production
                 [TerminalSymbol "true";
                  Production
                    [TerminalSymbol "&";
                     Production [TerminalSymbol "false"; EmptyMatch]]];
               EmptyMatch]]]; EmptyMatch], 15)
>

This shows how the precedence of the operators is defined explicitly in the grammar, rather than by annotating tokens or rules.

In part 2, I'll turn this into something usable.

The code for this article is on github.