The Able project is coming along slowly. The language spec is fairly well complete, and I am roughly 50% done with the grammar and parser.

The biggest challenges so far have been:

  • Handling left recursion in Arborist, the PEG parser combinator library that I’m using to implement the grammar and parser for the Able language. There have been a number of edge cases that I hadn’t expected, but they are now dealt with. See for further detail.
  • Writing well performing grammar rules that represent the various unicode character classes. I added a new concept to the PEG formalism - a mutually exclusive alternative, or MutexAlt for short, in which alternatives are delimited by the pipe operator, “|” and it is guaranteed that only one of the alternatives may match any given input. For example: Ascii <- “a” | “b” | “c” | …

    Internally, the MutexAlt is represented as a Set(String), and a substring of the document being parsed is deemed to match if it exists in the set. See for further detail.
  • Slow parsing performance has been a persistent issue, however, the biggest performance improvement came with the introduction of memoizing grammar rule applications at particular positions within the document being parsed. Prior to memoization, backtracking may result in the re-evaluation of the same grammar rule at the same position in the input document. With the introduction of memoization, grammar rules will only be applied once at any given position within the input document. The one caveat is in the presence of left-recursion. Within a left-recursive rule application, memoization is disabled for the duration of the seed-growth phase and then later re-enabled after the seed growth is complete. See the implementation of Apply#eval at for further detail.
  • Grammar writing is hard. It just is. It is more an art than a science. It is slow-going. It is definitely one of those things where a test suite is essential. Without a test suite, it is extremely likely that you will accidentally break the recognition of one aspect of your language while you are trying to recognize a different aspect of your language.