Many programming languages are essentially one-dimensional. The parser treats them simply as a linear sequence of tokens. A program could be written entirely on a single line, or with each token on a separate line, and the parser or compiler wouldn’t notice the difference. This set of languages includes Algol, Pascal, C, Rust and many others.
Some languages are 2-dimensional in a bad way. FORTRAN is probably the best example, though BASIC is similar. These languages (at least in their early forms) had strict requirements about what could go on one line and what needed to go on a separate line.
A few languages are exploring the middle ground. Go treats newlines like semicolons in certain cases (when the last token on a line is, for example, an identifier, a literal or a closing bracket), which can give a 2-dimensional feel but brings with it some rather ad-hoc rules. Of the languages I have looked at, Python probably makes the best attempt at 2-dimensional parsing. It allows newlines to terminate statements and uses indents to indicate the grouping of some language elements, particularly statement groups.
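One way to see Python's 2-dimensional parsing in action is to look at what its standard tokenize module produces. The sketch below is only an illustration (the SOURCE fragment and the helper layout_tokens are made up for this purpose). It keeps just the layout tokens: a NEWLINE wherever a statement ends, an INDENT where the indented body of the if begins, and a matching DEDENT where that body ends. By the time the parser sees the program, the indentation has become explicit tokens, much like the braces of a 1-dimensional language.

```python
import io
import tokenize

# Hypothetical fragment: newlines end statements, and the indented
# lines form the body of the "if".
SOURCE = """\
x = 1
if x:
    y = 2
    z = 3
print(y)
"""


def layout_tokens(source):
    """Return the names of the NEWLINE/INDENT/DEDENT tokens for source."""
    layout = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[t.type] for t in toks if t.type in layout]


print(layout_tokens(SOURCE))
```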
While I find a lot to like in Python, it seems imperfect. I particularly dislike using a backslash to indicate line continuation. You can avoid this in Python by wrapping the expression in brackets, since a newline inside brackets is ignored, but this feels like a weakness to me.
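Both continuation styles can be compared with the same tokenize module. In this small sketch (the two source strings and the helper newline_kinds are invented for illustration), each version splits one assignment across two lines. The backslash-newline pair is swallowed before any token is produced. The newline inside the brackets does appear, but as an NL token, which the parser ignores, rather than a statement-ending NEWLINE. Either way the result is one logical line, so the brackets work only because the parser has been told to stop looking at the layout.

```python
import io
import tokenize

# Hypothetical examples: one statement split across two lines, two ways.
WITH_BACKSLASH = "total = 1 + \\\n    2\n"
WITH_BRACKETS = "total = (1 +\n    2)\n"


def newline_kinds(source):
    """Return the NEWLINE/NL token names the tokenizer emits for source.

    NEWLINE ends a logical line (a statement); NL is a line break the
    parser ignores, such as one inside brackets.
    """
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[t.type] for t in toks
            if t.type in (tokenize.NEWLINE, tokenize.NL)]


for label, src in (("backslash", WITH_BACKSLASH), ("brackets", WITH_BRACKETS)):
    print(label, newline_kinds(src))
```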
As I wrote in a recent article for lwn.net:
The recognition of a line-break as being distinct from other kinds of white space seems to be a clear recognition that the two dimensional appearance of the code has relevance for parsing it. It is therefore a little surprising that we don’t see the line indent playing a bigger role in interpretation of code.
This note is the first part of a report on my attempt to translate my intuition about parsing two-dimensional layout into clear rules and concrete code. It deals with indents; a subsequent note will look at non-indenting line breaks.