The naming of a language

“The naming of cats is a difficult matter, It isn’t just one of your holiday games.”  The naming of programming languages is also important.  As with any project, a name is needed so that there is something to refer to it by, and that name will inevitably set expectations and flavour to some degree.

I’ve had a few different thoughts about names.  My first idea was “plato”.  Plato was a philosopher and is particularly known for drawing a distinction between the real and the ideal.  All things in the real world, circles and squares and so forth, are just poor shadows of the perfect circles and the perfect squares that exist in the ideal, or “Platonic” plane.

I actually think Plato had this backwards (though I haven’t read his work, so quite possibly I misunderstand him and misrepresent his ideas here).  To my mind the ideals that we think about are poor approximations to the real world which, after all, is the reality.  The process of thinking (which is what philosophy tries to understand) involves creating abstractions that map on to real world objects and events, and trying to find abstractions that are both general enough to be useful and precise enough to be truthful.

I see the role of a programming language being to fill exactly this gap.  It needs to address real world problems and tasks, but does so by generalising and abstracting and treating them mathematically.  In a sense the program exists in the Platonic plane while the implementation exists in the real world, and the language has to ensure effective communication between the two.

So “plato” is not a bad choice, but it isn’t the one I’m going to use.  I actually think “plato” would be a great name for a “platform” – like “Android” or “Gnome” or whatever.  They both start “plat”…

My next thought was to name it “Knuth” after Donald Knuth, who has had some influence on my thinking, as you will see in future articles.  Naming languages after dead mathematicians has some history (with Pascal and Ada at least), but as Mr Knuth is still alive, using the name “Knuth” doesn’t fit that pattern.  And it would probably be a bit pretentious to try to use such a name for a little project such as this.  So that name is out.

While dwelling on my real motivation for this language, I realised that it really is quite strongly influenced by my personal experience of the last few decades of programming.  This should be no surprise, but it is worth acknowledging.  It is easy to pretend that one is being broad minded and considering all possibilities and creating a near-universal design, but that is only a pretence.  The reality is that our values are shaped largely by our past hurts, and these can only come from our past experience.  I must admit that I am escaping from something, and that something is primarily “C”.

I’ve used C quite happily since the mid ’80s and enjoyed it, but I have always been aware of its deficiencies, and it is really these that I want to correct.  I’ve watched other languages appear and evolve, and there have been good ideas, but I’ve not found any of them really convincing.  Python has a lot going for it and I tend to use it for GUI programming, but when I do it just reminds me how much I like static typing.

So this language is to be my escape from C (at least in my dreams) and should be named as such.

C is seen to be a successor of B, which in turn grew out of BCPL.  So the joke at one time was to ask whether the next language could be “P” or “D”.  Of course it turned out to be “C++”, a joke of a different kind.  And then “D” came along anyway.

What do I want to call my successor of “C”?  The answer is easily “Ocean”.  Oceans are critical to life in many ways, but dangerous too – they need to be understood and tamed.  Oceans are big and wide with many unknown and unexpected inhabitants.  If I want an arbitrary name for something else related to “Ocean”, I can use “Pacific” or “Indian” or “Atlantic”.  And of course an “Ocean” is like a “C”, but more so.

Having admitted that Ocean will follow on from C in some ways, I should explore a little what that means.

Primarily it means that Ocean will be a compilable language.  I’m not at all against interpreting and JIT compiling, but I don’t like to require them.  The runtime support code should not need to include a language parser unless explicitly requested.  This means, for example, that a function like “eval”, which can be given some program text to run, is completely out.  Similarly, interpolating variable names into strings with “…${var}…” is not an option.
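
To make that cost concrete, here is a minimal sketch in Python (chosen only because it happens to have both features; nothing here is Ocean).  Both “eval” and “${var}” interpolation take text that is only known at run time, so the running program has to carry a parser and a table mapping names to values.  The variable names are illustrative assumptions.

```python
from string import Template

# Names must survive into the running program so that text can refer to them.
variables = {"width": 6, "height": 7}

# eval() parses and runs program text that only exists at run time,
# so the runtime must include an expression parser.
source = "width * height"
area = eval(source, {"__builtins__": {}}, variables)

# "${var}" interpolation also needs a by-name lookup at run time.
message = Template("area is ${area}").substitute(area=area)

print(message)
```

A compiled language without these features can discard both the parser and the names once compilation is finished.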

Some degree of introspection is probably a good idea – I haven’t really decided yet – so it may be possible for a program to manipulate language objects.  But this must not be part of the core language and it should only exist as a library for programmers with particular needs who are willing to pay the cost.

It also means that the programmer should have a lot of control.  I’m not sure exactly what this means yet, but in general the programmer should feel fairly close to the hardware, and have a clear idea of when runtime support will help out and when it will stay out of the way.  Certainly the programmer should have a fairly clear idea about how their constructs use memory and CPU.

Static typing is a “must have” for me.  This is essential for the compiler to be able to find bugs, and I trust compiler coverage a lot more than test coverage (though that is important too).  There is certainly room for runtime type flexibility such as variant records, or values which can be real or NULL.  These need to be available, but they should not be the default.
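
As a rough illustration of “available, but not the default”, here is a sketch using Python type annotations.  Python itself does not enforce these annotations; the assumption is that an external static checker (mypy is one example) is run over the code.  The function names are hypothetical.

```python
from typing import Optional


def parse_port(text: str) -> Optional[int]:
    """Return the port number, or None when text is not a valid port."""
    if text.isascii() and text.isdigit() and 0 < int(text) < 65536:
        return int(text)
    return None


def describe(port: int) -> str:
    # 'port' is declared as a plain int, so None is not an accepted value here.
    return f"port {port}"


maybe_port = parse_port("8080")
# The caller must rule out None before using the value as an int.
label = describe(maybe_port) if maybe_port is not None else "no port"
print(label)
```

The point is that “may be absent” has to be asked for explicitly with `Optional`; a plain `int` is never NULL.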

So that is what C means to me: static typing, compilable, and fine control.  And that is what “Ocean” must contain – at least.

Now to be fair I must address the question of whether these early design decisions fit with the philosophy I stated earlier – particularly aiding clarity and minimising errors.

Static typing is almost entirely about minimising errors.  By having types declared that the compiler can check, fewer mistakes will make it to running code.  They equally enhance clarity by making clear to the reader what type is intended for each value.

“Fine control” is sufficiently vague that it could mean anything.  I justify it by saying that it allows clear expression of precise low-level intention.

“Compilability” really hinges on the lack of “eval”, though static typing is often related.  “eval” effectively permits self-modifying code, and it is extremely hard for the compiler to assert anything concrete about such code.  So I feel fairly comfortable asserting that “eval” is a great way to introduce hard-to-detect errors, and it should be avoided where possible.  If some limited form of “eval” turns out to be particularly valuable, that can certainly be revisited when the time comes.

So while my language has no content, it now has a name: Ocean, and even a website: http://ocean-lang.org/.  Anything could happen next… but it will probably be something lexical.

Posted in Language Design | Comments Off

An exercise in Language Design

When I was doing my honours year in Computer Science (UNSW, 1986) I wanted to design a new programming language.  That would be a rather large project for an honours year and naturally it didn’t happen.  I have remained interested in languages, though for most of the time that interest has been idle.

I recently wrote some articles about languages for LWN and that has re-awoken my interest in language design.  While I had scribbled down (or typed out) various notes about different ideas in the past, this time I seem to have progressed much further than ever before.  It probably won’t ever amount to much but I’ve decided to try to continue with the project this time and create as concrete a design and implementation as I can … in my spare time.

As part of this effort I plan to write up some of my thoughts as blog entries, and publish some source code in a git tree somewhere.  This note is the first such entry and it presents the high level design philosophy that I bring.  Undoubtedly this philosophy will change somewhat as I progress, both in clarifying the ideas I present here and in distilling new ideas from all the reflection that will go into the design process.  I’ll probably come back and edit this article as that happens, but I’ll try to make such changes obvious.

Philosophy

I see two particular goals for a language.  The first is to allow the programmer to express their design and implementation ideas clearly and concisely.  So the language must be expressive.  The second is to prevent the programmer from expressing things that they didn’t mean to express, or which they have not thought through properly.  So the language must be safe.

There are a number of aspects to being expressive.  Firstly, useful abstractions must be supported so that the thinking of the programmer can be captured.  “useful” here is clearly a subjective metric and different abstractions might be useful to different people, depending on what they are familiar with.  Some might like “go to”, some might like “while/do”, others might like functions applied to infinite sequences which are evaluated lazily.  The language I produce will undoubtedly match my own personal view of “useful”, however I will try to be open minded.  So we need clear, useful abstractions.

The “what they are familiar with” is an important point.  We all feel more comfortable with familiar things, so building on past history is important.  Doing something in a different way just to be different is not a good idea.  Doing it differently because you see an advantage needs to be strongly defended.   Only innovate where innovation is needed, and always defend innovation clearly.  When innovation is needed, try to embed it in familiar context and provide mnemonic help wherever possible.

Being expressive also means focussing on how the programmer thinks and what meets their needs.  The needs of the programmer are primary, the needs of the compiler are secondary.  Often it is easier to understand a program when the constructs it uses are easy to compile – as there is less guesswork for the programmer to understand what is really going on.  So the needs of the compiler often do not conflict with the needs of the programmer.  When they do it is probably a sign of poor language design which should be addressed.  If no means can be found to improve the design so it suits both programmer and compiler, then the needs of the programmer must come first.

A key element of simple design is uniformity.  If various features are provided uniformly then the programmer will not be forced to squeeze their design into a mismatched mould in order to use some feature – the feature will be available wherever it is needed.  The most obvious consequence of this is that built-in types should not have access to any functionality that user-defined types do not have access to.  It should be possible to implement any built-in type in the language rather than having to have it known directly to the compiler.
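
As a small illustration of what this uniformity looks like in an existing language, here is a Python sketch in which a user-defined type gets the same operators and library support as the built-in numbers.  The `Money` type is a hypothetical example, not anything planned for Ocean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """A user-defined type that plugs into the same syntax as int or float."""
    cents: int

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)

    def __lt__(self, other: "Money") -> bool:
        return self.cents < other.cents

    def __str__(self) -> str:
        return f"${self.cents // 100}.{self.cents % 100:02d}"


prices = [Money(250), Money(199), Money(1000)]

# '+', ordering, sum(), sorted() and str() all work, just as for built-ins.
total = sum(prices, Money(0))
cheapest = sorted(prices)[0]
print(total, cheapest)
```

Nothing in `sum` or `sorted` knows about `Money`; they only rely on features that any type may provide.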

There are probably limits to this.  “Boolean” is such a fundamental type that some aspects of it might need to be baked in to the language.  However wherever that sort of dependency can be reasonably avoided, it should be.

The second goal is preventing mistakes, and there are many aspects to this too.  Mistakes can be simple typos, forgotten steps, or deep misunderstanding of the design and implementation.  Preventing all of these is impossible.  Preventing some of them is easy.  Maximising the number of preventable errors without unduly restricting expressiveness is the challenge.

An important part of reducing errors is making the code easy to read.  In any writing, the practice of writing a first draft and then reviewing and improving it is common.  This is (or should be) equally true for writing a computer program.  So when reading the program, the nature and purpose of the algorithm and data should stand out.  The compiler should be able to detect and reject anything that might look confusing or misleading.  When reading code that the compiler accepts, it should be easy to follow and understand.

This leads to rules like “Different things should look different” and “similar things should look the same”.  The latter is hopefully obvious and common.  The former could benefit from some explanation.

There seems to be a tendency among programmers and mathematicians to find simple models that effectively cover a wide range of cases.  In mathematics, group theory is a perfect example.  Many many different mathematical structures can be described as “groups”.  This is very useful for drawing parallels and for understanding relationships and deep structure.  However when it is carried across from mathematics to language design it does not work out so well.

For me, the main takeaway from my article – linked above – “Go and Rust – objects without class”, is that “everything is an object” and the implied “inheritance is all you need” is a bad idea.  It blends together different concepts in a way that is ultimately unhelpful.  When a programmer reads code and sees inheritance being used it may not be clear which of the several possible uses of inheritance is paramount.  Worse: when a programmer creates a design they might use inheritance and not have a clear idea of exactly “why” they are using it.  This can lead to muddy thinking and muddy code.

So:  if things are different, they should look different.  Occam’s razor suggests that “entities must not be multiplied beyond necessity”.   This is valuable guidance, but leaves open the interpretation of “necessity”.  I believe that in a programming language it is necessary to have sufficient entities that different terminology may be used to express different concepts. This ensures that the reader need not be left in doubt as to what is intended.

Finally, good error prevention requires even greater richness of abstractions than clarity of expression requires.  For the language/compiler to be able to catch errors, it must have some degree of understanding as to what is going on.  This requires that the programmer be able to describe at a rich level what is intended.  And this requires rich concepts.  It also requires complete coverage.  If a programmer uses clear abstractions most of the time and drops into less clear expression occasionally, then it doesn’t greatly harm the ability of another programmer to read the code – they just need to concentrate a bit more on the vague bits.  However that does make it a lot harder for the compiler to check.  Those lapses from clarity, brief though they may be, are the most important parts to check.

Unfortunately complete coverage isn’t really a possibility.  That was one of the points in my “A Taste of Rust” article.  It is unrealistic to expect any formal language to be very expressive and still completely safe.  That isn’t an excuse not to try though.  While the language cannot be expected to “understand” everything, careful choices of rich abstractions should be able to cover many common cases.  There will still need to be times when the programmer escapes from strict language control and does “unsafe” things.  These escapes need to be carefully documented, and the programmer needs to be able to “tell” the language what has been done, so the language can still check the way that these “unsafe” features are used.  This refers back to the previous point about built-in types not being special and all features being available to user-defined types.  In the same way, safety features need to be available in such a way that the programmer can make safety assertions about unsafe code.

As the language design progresses, each decision will need to be measured against these two key principles:

  • Does it aid clarity of expressions?
  • Does it help minimise errors?

These encompass many things so extra guidance will help.  So far we have collected:

  • Are the abstractions clear and useful?
  • Are we using familiar constructs as much as possible?
  • Have we thoroughly and convincingly defended any novelty?
  • Does this benefit the programmer rather than the compiler?
  • Is this design uniform?  Can the idea apply everywhere?  Can we make it apply anywhere else?
  • Can this feature be used equally well by user-defined types and functions?
  • Does this enhance readability? Can the language enforce anything to make this more readable when correct?
  • Are we ensuring that similar things look similar?
  • Are there different aspects to this that should look different?
  • Can we help the compiler ‘understand’ what is going on in this construct?
  • Is this “safety check” feature directly available for the programmer to assert in “unsafe” code?

Not all of these guides will apply to each decision, but some will.  And the two over-riding principles really must be considered at every step.

So there is my philosophy.  I have some idea where it leads, but I fully expect that as I try to justify my design against the philosophy I’ll be surprised occasionally.  As for you, my dear reader, I’m afraid you’ll have to wait a little while until the next instalment.  Maybe a week or so.

Posted in Language Design | Comments Off

RAID – not just smoke and mirrors

My final talk at Linux.conf.au 2013 was about “md” software RAID.

Slides are here and video is here (mp4).

One takeaway, mainly from conversations afterwards, is that there is a perception that it is not that uncommon for drives to fail in a way that causes them to return the wrong data without error.  Thus using a checksum per block, or 3-drive RAID1 with voting, or RAID6 with P/Q checks on every read, might actually be a good idea.  It is sad that such drives are not extremely uncommon, but it seems that it might be a reality.
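
To show what “3-drive RAID1 with voting” could mean in practice, here is a minimal Python sketch of a majority vote over mirrored copies of one block.  This is an illustration of the idea only and not a description of how md behaves; the function name and return shape are assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple


def vote(copies: List[bytes]) -> Tuple[Optional[bytes], List[int]]:
    """Majority-vote over mirrored copies of one block.

    Returns (agreed_data, suspects); agreed_data is None without a majority.
    """
    winner, count = Counter(copies).most_common(1)[0]
    if count * 2 <= len(copies):
        return None, list(range(len(copies)))
    suspects = [i for i, data in enumerate(copies) if data != winner]
    return winner, suspects


# Drive 1 silently returns wrong data: no I/O error, just different bytes.
data, suspects = vote([b"hello", b"hellp", b"hello"])
print(data, suspects)
```

The vote identifies which drive disagreed, but it does not answer the harder policy question of what to do with that drive.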

What does one do when one finds such a drive?  Fixing the “error” and continuing quietly seems like a mistake.  Kicking the drive from the array is probably right, but might be too harsh. Stopping all IO and waiting for operator assistance is tempting…. but crazy.

I wonder…

Posted in Uncategorized | Comments Off

Wiggles and Diffs at LCA

My second talk at LCA2013 – the first one accepted – was on “wiggle”, my tool for applying patches that don’t apply.  In the presentation I wanted to explain how “diff” works, as I then wanted to explain why one of the things that wiggle does is more complex than a simple “diff”.  For this I came up with a simple animation that I presented as a series of “impress” slides.  Some suggested I make them into an animated “gif”, so I did.  And here it is (click for a higher-res version):

Animation of Diff algorithm

See slides for explanation
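
For readers without the slides, here is a minimal Python sketch of one textbook way to compute a diff: find a longest common subsequence with dynamic programming, then walk the table to emit the edit script.  This is an assumption made for illustration, not necessarily the algorithm shown in the animation, and real diff tools generally use faster methods.

```python
from typing import List, Tuple


def lcs_diff(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    """Return an edit script of (' ', x), ('-', x), ('+', x) turning a into b."""
    # table[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, emitting common, deleted and added items.
    script, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            script.append((" ", a[i]))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            script.append(("-", a[i]))
            i += 1
        else:
            script.append(("+", b[j]))
            j += 1
    script += [("-", x) for x in a[i:]]
    script += [("+", x) for x in b[j:]]
    return script


old = "the quick brown fox".split()
new = "the slow brown fox jumps".split()
for mark, word in lcs_diff(old, new):
    print(mark, word)
```

The table costs time and memory proportional to the product of the two lengths, which is fine for a sketch but is one reason practical tools do something smarter.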

Among the useful feedback I got about wiggle:

  • UTF-8 support would be good.  This only applies to the way it breaks strings into words.  Currently it only understands ASCII.
  • Detecting patterns of “replace A with B” and looking for unreplaced copies of “A” in the original might be useful.
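
The first point can be illustrated with a short Python sketch.  This is not wiggle’s code; the tokeniser pattern and function name are hypothetical, and the input is assumed to have already been decoded from UTF-8 into a string.  It shows how an ASCII-only notion of a “word” breaks accented words into fragments.

```python
import re
from typing import List

# Hypothetical tokeniser: runs of word characters, whitespace, or other symbols.
TOKEN = r"\w+|\s+|[^\w\s]+"


def split_words(text: str, ascii_only: bool = False) -> List[str]:
    """Break text into tokens; ascii_only mimics an ASCII-only notion of 'word'."""
    flags = re.ASCII if ascii_only else 0
    return re.findall(TOKEN, text, flags)


line = "na\u00efve caf\u00e9"  # "naïve café"
print(split_words(line, ascii_only=True))
print(split_words(line))
```

With the ASCII-only rule each accented letter becomes its own token, so a word-level comparison would see several small changes where a reader sees one word.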

The slides in LibreOffice format are here and the recording of the talk is here.

Posted in wiggle | Comments Off

Linux.conf.au – one down, two to go.

I’m at linux.conf.au this week and, as always, it is proving to be a great conference. Bdale’s keynote on Monday was a really good opening keynote: very wide-ranging, very high level, very interesting and relevant, very pragmatic and sensible.

One of his key points was that we should all just keep building the tools we want to use and making it easy for others to contribute.  The long tail of developers who submit just one patch to the Linux kernel make a significant contribution but wouldn’t be there if it was hard to contribute, hard to get the source, or hard to build the source.  With Linux all of these are relatively easy and other projects could learn from that … particularly the “easy to build” bit.

So let’s not worry about beating MS or Apple, or about claiming the year of the Linux anything.  Let’s just do stuff we enjoy and make stuff we use and share our enthusiasm with others.  If that doesn’t lead to world domination, nothing will.

For myself, I managed to get 3 speaking slots this year … makes up for not speaking for some years I guess.  My first was yesterday, about the OpenPhoenux project – the follow-on from OpenMoko.  It was very well attended, and I got really good responses and positive feedback.  I even managed to finish very very nearly on time.  So overall, quite a success.  I hope the next two (both tomorrow, Wednesday) go as well.

You can view the slides if you like, but they aren’t as good without all the talking.  Hopefully the LCA organisers will upload the video at some stage.

Posted in Uncategorized | Comments Off

Writing for LWN

I like to write articles for LWN.net from time to time. Recently I wrote about the recently announced “f2fs” file system (https://lwn.net/Articles/518988/) and might follow it up with a couple more reviews of other filesystems.
One challenge is thinking of – or finding – interesting things to write about. I’m not expecting my loyal readership to do my work for me and provide topics, but any suggestions about general areas of interest that might spark some idea for me would not be unwelcome….

Posted in Uncategorized | Comments Off

A new blog…

I’m moving my neil.brown.name site to a new home (off in the cloud with better network connectivity – thanks Orion) and so thought it was probably time to try out different blog software.

Previously I have been using a python script I hacked up myself based on something someone else had done.  That was fun and a good start to learning python, but the functionality was always minimal and while I often put up with minimal functionality in exchange for having built it myself there comes a point where I want to move on.

This time that point was comment spam.  You would think that having a one-of-a-kind blog with a posting mechanism which is unique in details (though obviously not in principle) would mean that the cost of building a spam bot to post comments would just not be worth the gain.  But it seems not.  Someone did start posting comment spam – not even interesting spam for the most part, just pointless junk.  Maybe it wasn’t even someone; maybe it was an AI bot which worked out how to post noise all by itself.  Though when I deleted all the posts it didn’t come back for a while, so that seems to suggest a human agent.

Anyway the spam has been annoying and I thought about writing some sort of protection (probably simplistic registration where I have to approve the first post by each new registrant – I don’t get so many comments that that would be a problem).  But time is short and task lists are long so it never happened.

So as I’m setting up a new server I decided to try something new and took the safe option – wordpress.  It certainly seems to be widely used and actively developed, so it must be worth a try.  I don’t even know what it does to prevent comment spam. I suspect I’ll find out once I get spam and start looking into it – but I’m sure there must be something there.

Meanwhile I and my commenters will benefit from not having to use an obscure markup language and can just focus on generating content.

Creating the blog with wordpress meant that I needed to give my blog a cute name, so after about 2 seconds’ thought I chose “A Taciturn Disposition”.  This comes from Jane Austen’s “Pride and Prejudice”.

“Are you consulting your own feelings in the present case, or do you imagine that you are gratifying mine?”

“Both,” replied Elizabeth archly; “for I have always seen a great similarity in the turn of our minds.  We are each of an unsocial, taciturn disposition, unwilling to speak, unless we expect to say something that will amaze the whole room, and be handed down to posterity with all the eclat of a proverb.”

I’m not really sure whether Lizzie Bennet intends this epithet as a compliment or an insult; she probably means both.  But it is one that I would feel comfortable with.  I’m not much good at small talk but prefer talking about issues of substance.  Certainly I wouldn’t include topics of only passing interest in a blog – that stuff, if written at all, belongs on facebook or G+.  Blogs are for more amazing and proverbial expositions, such as this one.

But anyway, here begins the new blog.  The old can still be found at http://neil.brown.name/blog/, while this one is http://blog.neil.brown.name/.

Posted in Uncategorized | Comments Off