Haskell is not context free

October 10, 2012 ∞

(Part 2 of a series; see part 1.)

Haskell, like Python, uses indentation to indicate grammatical structure, so we should expect that it requires context-sensitive parsing.

Showing this is a little tricky because it’s not clear exactly what constitutes Haskell syntax. The Haskell 2010 Language Report “specifies” the indentation rules here and here, but it’s well known among Haskell insiders that no one implements the spec. For example, Augustsson said in 2009,

Implementing exactly Haskell’s rule for indentation is incredibly hard. In fact, no known Haskell compiler gets it right.

Think about it: the Haskell community, consisting largely of academic researchers concerned with mathematics, logics, theorem proving, pure functional programming, semantics-preserving compilation, type systems, and so on, is unable to implement their own specification. This is like a man wearing a tuxedo, walking around with toilet paper stuck to his shoe. For fifteen years! It would be funny if the same people weren’t convinced that monadic parser combinators solve all problems in parsing. As it is, I find it maddening.

Back to the point at hand. Since no one implements the spec, it doesn’t make sense to see whether the spec is context sensitive; that would be like flogging a dead horse, and, moreover, a dead horse that was never actually alive in the first place. However, the specification does provide a useful clue:

The layout rule matches only those open braces that it has inserted; an explicit open brace must be matched by an explicit close brace. Within these explicit open braces, no layout processing is performed for constructs outside the braces, even if a line is indented to the left of an earlier implicit open brace.

(Emphasis mine.)

Here we see that Haskell intends to use rules similar to Python, which ignores indentation within parentheses, brackets and braces. It was exactly this rule that forced parsing to be intertwined with context-sensitive lexing in Python.

Let’s assume for the moment that all Haskell compilers implement the same syntax (which is not the syntax of the Haskell 2010 Language Report), so we can look at a single implementation.

I’ll use GHC as my example. Here is the parser (which of course is not written using monadic parser combinators), and here is the lexer. Look closely at the lexer and you can see that it is maintaining a stack of lexical contexts that get pushed and popped at braces, and which interact with layout handling. In short, it uses the same strategy as Python’s official compiler, and by my previous argument, we can conclude that Haskell is not context free.