Structured statements and expressions in Ocean

Now that I have my general parsing worked out and understand how I want to use indents and line breaks to give a two dimensional structure to my language, I need to think about the details of some of the elements of the language. The part of a language that we use the most is the part which does stuff: statements and expressions. So that is where I will start, though particularly with statements.

As has previously been hinted, blocks of statements will have two different syntaxes that the programmer can choose between, one Python-like and one C-like.

Block -> : StatementList
       | { StatementList }

This is a simplified version of the grammar, but the idea is that a StatementList can follow a colon, or be enclosed in braces. In the first case the StatementList must be terminated by a decrease in indent. That is, all of the StatementList must be on the same line as the colon, or indented with respect to the line containing the colon. When we find a line indented no further than the line with the colon, we know the StatementList is finished.
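To make the indent rule concrete, here is a rough sketch in Python (Ocean itself can't be run here). It is not how the Ocean parser works internally; the names `indent_of` and `block_after_colon` are made up for illustration, and it assumes spaces only and ignores statements placed on the same line as the colon.

```python
def indent_of(line):
    """Count leading spaces; tabs are not handled in this sketch."""
    return len(line) - len(line.lstrip(" "))


def block_after_colon(lines, start):
    """Return the lines of the block opened by the colon on lines[start].

    The block runs until the first non-blank line that is indented no
    further than the line holding the colon.
    """
    base = indent_of(lines[start])
    body = []
    for line in lines[start + 1:]:
        if line.strip() and indent_of(line) <= base:
            break  # indent has dropped back: the StatementList is finished
        body.append(line)
    # Trailing blank lines belong to no statement, so drop them.
    while body and not body[-1].strip():
        body.pop()
    return body
```

Given the lines of a small program, `block_after_colon(lines, n)` returns just the indented lines that follow line `n`, stopping at the next line that returns to the colon line's indent.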

In C, any expression can be used as a statement simply by appending a semicolon, and various things that you might think of as statements, such as assignment, are syntactically expressions. In GNU C (i.e. C as implemented by GCC) you can even convert arbitrary statements to expressions by enclosing them in “({” and “})”. This synergy between statements and expressions can be very powerful, but can also become ugly and confusing. It works well for small statements placed where expressions are expected, but quickly breaks down as statements grow.

“Rust” takes this synergy to its logical conclusion: nearly every statement is an expression, so there is very little difference between statements and expressions. So you can write things like

let a = if some_test {
            some_value
     } else {
            some_other_value
     };

I personally find this becomes clumsy. Also, as I wrote for lwn.net, it creates some oddities in how semicolons are handled.

“Go” deviates from C in the other direction and allows a much more limited interplay between statements and expressions. Assignments and increments become true statements, but in a few places where you would expect an expression, a simple statement is allowed as well. So:

while line, ok = readline(file); ok {
         do something with line;
}

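allows a simple statement before the condition in a while. (Real Go spells its loop “for”, and accepts a simple statement ahead of the condition in “if”, “switch” and the three-part “for”, so the example above is only Go-flavoured.) This is certainly useful, but seems a bit artificial. It solves many cases, but isn’t really a general solution to the problem.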

I like allowing general statements in the head of a while and similar places, but I don’t want to make them look like expressions – I still want a clear distinction.

So a while statement will have two forms

WhileStatement -> while Expression Block
                | while Block do Block

The first form is the one we are familiar with, though it is more like “Go” than “C” as the expression does not need to be in parentheses. The second form allows for more complex calculation of the loop test.

Naturally the first Block in the second form must return a value somehow to guide the progress of the loop. This is somewhat like the “return” statement in “C” which can return a value from any point in a statement block. Using “return” in this “while” Block would be confusing as the normal expectation would suggest the meaning was to leave the containing function. Instead I’m planning to use “use” as a keyword. That is certainly open to change after experimentation, but it is close to what I want. So:

while:
     line = readline(file)
     use not line.matches("END")
do:
     print line

is a contrived example which shows how the “use” keyword might be used.

A noteworthy aspect of this example is that the “condition” part is larger than the “body” part. One could go even further and imagine a body part which was simply “pass”. This would result in a loop much like C’s “do {} while()” loop. It may not be quite as good, as the terminating condition doesn’t syntactically stand out, but careful placing of the “use” statement can make it stand out well enough.

This generalises to having largish “condition” and “body” parts, thus allowing loops which exit in the middle and answering Dijkstra’s “n and a half loop” problem, as discussed particularly by Knuth in his “Structured Programming with go to Statements”.
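For comparison, this is roughly how the same loop-and-a-half shape has to be spelled in a language without a block-valued condition. It is a Python sketch under my own assumptions: the function name `lines_before_end` is invented, a substring test stands in for `matches`, and running out of input is treated as another reason to stop.

```python
def lines_before_end(lines):
    """Collect lines up to, but not including, the first one containing "END"."""
    source = iter(lines)
    collected = []
    while True:
        # "condition" part: do some work, then decide whether to go on
        line = next(source, None)
        if line is None or "END" in line:
            break  # plays the role of "use False"
        # "do" part: only reached when the condition part succeeded
        collected.append(line)
    return collected
```

The exit sits in the middle of the loop body, which is exactly the case that the two-block “while ... do” form is meant to express without a “break”.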

In that article, Knuth also talks about “event indicators” which he credits to C. T. Zahn. These are an attractive idea and fit quite well if we allow these expression statements in a switch construct.

switch:
      some code which can "use" multiple different values
case value1:
      code
case value2:
      code

Naturally the type of each of the values listed with “case” clauses must be the same as the types of values given in “use” statements. For “event indicators” Knuth suggests that the list of allowed events might be given at the top of the statement. This might make life easier for the compiler, but seems unnecessary. We could allow that if all of the “use” statements give currently undefined words, and if all of those words appear after a “case” tag, then the words are taken to be values in a new anonymous enumerated type.

These enumerated values would behave a lot like “goto” labels, in that they must appear once as a target and must be used, but otherwise do not need to be declared. Indeed, there is a lot of similarity between these “event indicators” and goto targets, and I would not consider adding a general “goto” until experimentation with the event indicators showed they weren’t enough.
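A small Python sketch may help show the shape of an event-indicator switch. Everything in it is an assumption made for illustration: the function `classify`, the event names `found` and `missing`, and the use of a returned tuple to stand in for variables that the head block would leave in scope.

```python
def classify(values, wanted):
    """Emulate a switch whose head block "uses" one of two event names."""

    def head():
        # The head may "use" a value from any point, much like an early exit.
        for index, value in enumerate(values):
            if value == wanted:
                return ("found", index)
        return ("missing", None)

    event, index = head()
    # Each "case" names one event; every event used above appears exactly once.
    cases = {
        "found": lambda: f"{wanted} is at position {index}",
        "missing": lambda: f"{wanted} is absent",
    }
    return cases[event]()
```

The strings here play the part of the anonymous enumerated type: they are never declared anywhere except by being used in the head and named in a case.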

In Knuth’s paper, the event indicators can be used in loops as well as non-looping statements. In “C”, a switch doesn’t loop, so we clearly haven’t attained complete functionality yet. Without detailing all the steps, all of these thoughts lead to the idea of a general structured statement with several components:

for BLOCK or simple statement
[if | while | switch] BLOCK or condition
then BLOCK
do BLOCK
case VALUE BLOCK
else BLOCK

“for” provides preliminary code which is only run once. It may declare variables which remain for the entire statement, but it contains no “use” statements.
“if”, “switch” and “while” are much as you would expect. If they have BLOCKs, those BLOCKs contain “use” statements, the values of which guide the execution of the other components.
“do” is only present with “while” and just runs code provided the condition doesn’t fail.
“then” and “else” are run if the condition or BLOCK returns “True” or “False” respectively. As such, they are somewhat equivalent to “case True” and “case False”.
The “case” stanzas allow the condition to return a variety of values. If none of the “case” values match, then the “else” clause is used if present.

If the values returned by the condition in a “while” are not Boolean, then it isn’t immediately clear which values cause the “do” block to run and the statement to loop. My current thinking is that if the condition does not return a value at all, then that is equivalent to “True”, in that the “do” block is run and the statement loops. If the condition returns any value other than “True”, then the loop aborts.
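Here is a Python sketch of that looping rule. It assumes that the value which ends the loop is then matched against the “case” stanzas, falling back to “else”, which the description above implies but does not state outright. The names `NO_VALUE` and `run_loop` are hypothetical, and the case values must be hashable for the dictionary lookup to work.

```python
NO_VALUE = object()  # hypothetical marker: the condition block finished without a "use"


def run_loop(condition, do, cases=None, otherwise=None):
    """Run `do` while `condition` yields True or no value at all.

    Any other value ends the loop and selects a matching "case",
    or the "else" handler (`otherwise`) if no case matches.
    """
    while True:
        value = condition()
        if value is True or value is NO_VALUE:
            do()
            continue
        handler = (cases or {}).get(value, otherwise)
        return handler() if handler else None
```

A condition block that normally falls off its end keeps the loop going, and only an explicit non-True value stops it and picks the exit path.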

If a (“while”) loop starts with a “for” clause, then there can also be a “then” clause after the “for” and before the “while”. Despite its positioning it is run after the “do” part if the condition didn’t fail. This allows an effect similar to the C for loop:

for a = 0
then a += 1
while a < 10
do sum += a
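Since the “then” clause is written before the “while” but runs after the “do”, the order of execution is worth spelling out. The following Python sketch unrolls the four clauses in the order I have described; the function name `sum_below` and the `limit` parameter are mine, standing in for the constant 10.

```python
def sum_below(limit):
    """Unroll for/then/while/do into the order in which the parts run."""
    total = 0
    a = 0                    # "for": runs once, before anything else
    while True:
        if not (a < limit):  # "while": tested before each pass
            break
        total += a           # "do": the loop body
        a += 1               # "then": runs after "do", before the next test
    return total
```

With a limit of 10 this adds up 0 through 9.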

There are aspects of this that feel clumsy, though I suspect they can be resolved.

Requiring the “then” keyword isn’t good. If the condition is a block then it makes sense, but if it is just a simple condition, then it isn’t wanted. So:
```
if condition:
    statement
--or--
if:
    use condition
then:
    statement
```

A large switch statement has no obvious end. If a new line at the base indent level starts with “case” or “else” then the switch statement continues, otherwise it is terminated. This has some precedent in that languages with an “elseif” keyword can have a list of clauses each beginning with “elseif” which is only really terminated by a line that starts with something else. That doesn’t seem to cause confusion, so maybe “case” with no termination won’t either.

The word “then” isn’t quite perfect for the “increment” part of a “for” loop. It isn’t too bad as one could say “while this do that and then the other” so “then” suggests something to be done a bit later. It might just take some getting used to.

I’m not even sure about “use”. I like the concept. I don’t much like the word. I really don’t want a bare expression where you might expect a statement, as that could be syntactically confusing. I wonder if “select” would work, as it selects the case to exit through.

At this stage I am not planning on including “break” or “continue”. “break” can sometimes be achieved with “use False”. In a few contexts, “use True” will do the same as “continue”. There might be times where these directives are still wanted in the “body” of a loop, but I hope not, so I will wait until I find some convincing evidence.

This certainly isn’t the final word on loops. At the very least I expect to add a “foreach” loop to work with some sort of “iterator” type. However, I cannot really explore that until I have a type system.

For now, I have put together a simple interpreter which allows some experimentation with the loops and conditionals discussed here. Now I need to think of some interesting loops and selection code that I can try it out with.

The interpreter is in my ocean git tree (git://ocean-lang.org/ocean), or you can read the literate source code.
