We've discussed before how important the formal grammar specification is to the Gutenberg project. Blocks exist because their textual form has a terse definition of how they should be written out, how they should be read in, and what each element of that text means.
But as we know, the real world is more constraining than our designs are in theory. That is, we have to make tradeoffs to make good ideas practical. In Gutenberg, those constraints and tradeoffs show up in the parser code that converts between the serialized representation of a post in post_content and the in-memory data structure that powers blocks.
The primary tradeoff in Gutenberg is performance vs. clarity. The official Gutenberg parser is written in a language designed to describe documents and document grammars. That language is intentionally limited: it makes the semantics of the representation primary, at the cost of producing an inefficient parser executable. We want a fast parser, but we also don't want to lose the clarity that our explicit and formal grammar specification provides.
How do we do this? We take care to compare competing parser implementations to ensure that they conform to the specification. In this document I want to lay out the roadmap, the plans, and the important considerations we need to take into account when building out this Gutenberg subsystem.
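To make the idea of comparing implementations concrete, here is a minimal sketch in Python. It is not the official Gutenberg parser or its test tooling. Both parsers below are toy stand-ins that only understand flat, well-formed blocks delimited by comments such as <!-- wp:name {attrs} --> and <!-- /wp:name -->. The names parse_reference, parse_scanner, first_mismatch, and CORPUS are all hypothetical. One parser favors clarity (a declarative pattern), the other is hand-written, and the check simply runs both over the same documents and reports the first disagreement.

```python
import json
import re

# Toy pattern for one flat block. Assumption: no nesting, well-formed input.
BLOCK_RE = re.compile(
    r"<!-- wp:(?P<name>[a-z][a-z0-9/-]*) (?:(?P<attrs>\{.*?\}) )?-->"
    r"(?P<inner>.*?)"
    r"<!-- /wp:(?P=name) -->",
    re.DOTALL,
)


def parse_reference(doc):
    """Clarity-first toy parser: the pattern reads like a tiny grammar."""
    return [
        {
            "name": m["name"],
            "attrs": json.loads(m["attrs"] or "{}"),
            "inner": m["inner"],
        }
        for m in BLOCK_RE.finditer(doc)
    ]


def parse_scanner(doc):
    """Hand-written toy parser: scans for delimiters with str.find."""
    opener = "<!-- wp:"
    blocks = []
    pos = 0
    while True:
        start = doc.find(opener, pos)
        if start == -1:
            return blocks
        header_end = doc.find("-->", start)
        if header_end == -1:
            return blocks
        header = doc[start + len(opener):header_end].strip()
        # Everything after the first space is treated as the JSON attributes.
        name, _, raw_attrs = header.partition(" ")
        closer = f"<!-- /wp:{name} -->"
        close_at = doc.find(closer, header_end)
        if close_at == -1:
            return blocks
        blocks.append(
            {
                "name": name,
                "attrs": json.loads(raw_attrs or "{}"),
                "inner": doc[header_end + len("-->"):close_at],
            }
        )
        pos = close_at + len(closer)


def first_mismatch(parsers, corpus):
    """Return the first document on which the parsers disagree, or None."""
    for doc in corpus:
        results = [parse(doc) for parse in parsers]
        if any(result != results[0] for result in results[1:]):
            return doc
    return None


# A small, made-up corpus of serialized posts.
CORPUS = [
    "<!-- wp:paragraph --><p>Hello</p><!-- /wp:paragraph -->",
    '<!-- wp:heading {"level":3} --><h3>Title</h3><!-- /wp:heading -->',
    "<!-- wp:a -->x<!-- /wp:a -->\n<!-- wp:b -->y<!-- /wp:b -->",
    "plain text, no blocks",
]

if __name__ == "__main__":
    print(first_mismatch([parse_reference, parse_scanner], CORPUS))
```

The useful part of this shape is that the corpus, not either parser, becomes the shared point of truth: a faster implementation can be swapped in as long as it produces the same structure as the specification-driven one on every document. A real comparison would need a much larger corpus, including nested and malformed input, which this sketch deliberately ignores.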