I’ve decided to write an editor. Silly idea, I know – there are already two out there: both vim and emacs are quite good. I’ve heard rumors that there might be others, but investigation always shows they are just toys, not real editors. They don’t support reading email, unless that is all that they do.
And I know that I’m supposed to be designing a new language: ocean. And I will… maybe. But how can one write code for a new language without a new editor? (Though one could equally wonder what language a new editor might be coded in, if not a new language…) Ocean will have to wait.
So why a new editor? Well I really love emacs. Really. But I also hate it. I’ve tried programming in emacs and I just can’t manage it. It isn’t just the LISP (though that doesn’t thrill me), it is the low-level hacking at the attributes in the buffer to create the image that I want to display. It feels like assembly programming. It is probably a personal weakness on my part – other people seem to manage. But isn’t it always out of personal weakness that great genius emerges? I’m sure it is.
I’ve tried writing an editor before, as recently as last year and as long ago as my honours year at University (1986). Each time I failed to produce anything even vaguely useful. A big part of my problem is that I tend to let the perfect become the enemy of the good. Whether that is the real reason I’m not sure, but I know the tendency is real, and I consciously tried to fight it this time. And I’ve made a lot more progress in the last few weeks than I ever did before. So maybe I’ve overcome my personality flaw, or maybe I just stumbled onto a good idea. I’m not sure which part of the design that good idea might be, so I’ll just have to write about all of it.
Let’s start with the name: edlib. This very deliberately pays homage to emacs. emacs is short for “editing macros” or something similar. It describes the technology used to build the editor. “edlib” is short for “editing library” – again it is the technology. edlib will (hopefully) become a library of sensible interfaces and useful implementations which can be combined to build an editor. Any editor. But “edlib” is also the name for a specific editor that I will build with this library. The interfaces are important. Creating clean abstractions that are combined through good interfaces is what will make (or break) edlib.
The general structure of the library follows the Model-View-Controller (MVC) pattern. This highlights what I really don’t like about emacs: the model is not well separated from the view. In edlib it will be. Today I will write mostly about the model.
Documents: the model of editable data.
Anything that is edited or viewed in edlib is stored as a ‘document’. A document combines a number of ideas:
- A list of characters. These are Unicode characters and are probably encoded with UTF-8, but each document can make independent implementation decisions there.
- Attributes on the characters. These are name=value pairs. Attributes apply to individual characters, not ranges of characters. The intention is that they record parse summaries or state information. Code that interprets attributes may have to look around to find them and may have to interpret them in non-trivial ways. For example apparent spelling mistakes might have an attribute attached to the first character in the word. It might list some possible corrections. A matching “view” module would notice that and highlight the whole word. A related controller would provide a way to select from the possible corrections.
- Marks identify locations in the document. Each mark is between two characters, or at the start or end of the document. Marks, like characters, can have attributes. Marks are more heavy-weight than just attaching attributes, but marks are easy to find: they are linked together and can be stored in separate data structures. Marks can identify temporary locations for communicating ranges between different modules. They can identify the parts of a document being displayed in a particular view. Or they can identify anything else.
- Undo/redo support would be integrated into the document. Each document keeps a list of recent changes, and can revert or re-impose them. All changes are described as a “replace”: two marks (actually a point and a mark) identify a range to be deleted, and a string provides the replacement text. Either the range or the string can be empty of course.
- A “view” is a particular collection of marks together with a special mark called a “point”. The marks in a view are in their own list (and also in a global list) so they can be found from each other easily. When a change happens in the document, each view is notified and the preceding mark is mentioned in that notification. This allows each view to easily determine if anything important has happened.
To enable this, every change must happen at a “point”. A point is a special mark which belongs to one view but is linked into the mark lists of all views, so that when a change happens at a point, the preceding mark in every view is immediately at hand. A view which does not modify the document might not have a point. Any view which can modify it must have exactly one point.
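To make the document model from the list above a little more concrete, here is a minimal Python sketch. It is not edlib’s code (which isn’t available); it assumes a document is simply a list of characters with one attribute dictionary per character, and that every change goes through a single `replace` between a point and a mark. Recording each replacement is then all that undo and redo need. The names `Document`, `Mark`, `set_attr`, `replace`, `undo` and `redo` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Mark:
    """A location between two characters; offset 0 is before the first one."""
    offset: int
    attrs: dict = field(default_factory=dict)   # marks can carry attributes too


class Document:
    """Hypothetical minimal document: characters, per-character attributes,
    and an undo/redo history in which every change is a 'replace'."""

    def __init__(self, text=""):
        self.chars = list(text)
        self.attrs = [{} for _ in self.chars]   # one name=value dict per character
        self.undo_list = []                     # (start, removed, inserted) triples
        self.redo_list = []

    def text(self):
        return "".join(self.chars)

    def set_attr(self, offset, name, value):
        self.attrs[offset][name] = value

    def _apply(self, start, end, new):
        """Swap chars[start:end] for `new`; attributes on new text start empty."""
        removed = "".join(self.chars[start:end])
        self.chars[start:end] = list(new)
        self.attrs[start:end] = [{} for _ in new]
        return removed

    def replace(self, point, mark, new):
        """Delete the text between point and mark, then insert `new` there.
        In this sketch the point is just another Mark."""
        start, end = sorted((point.offset, mark.offset))
        removed = self._apply(start, end, new)
        self.undo_list.append((start, removed, new))
        self.redo_list.clear()                  # a fresh change discards redo history

    def undo(self):
        start, removed, inserted = self.undo_list.pop()
        self._apply(start, start + len(inserted), removed)
        self.redo_list.append((start, removed, inserted))

    def redo(self):
        start, removed, inserted = self.redo_list.pop()
        self._apply(start, start + len(removed), inserted)
        self.undo_list.append((start, removed, inserted))
```

For simplicity this sketch does not move marks when text changes, and text brought back by undo returns without its old attributes; a real implementation would need to handle both.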
An obvious use of a view is to connect a document to a display window. The marks might be the start and end of the displayed region, or they might be the start of each line that is displayed. But there are other uses for views.
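Taking the display case first, the view rules can be sketched like this, again in hypothetical Python: each `View` keeps its own marks in text order, only a view with a point may change the document, and after every change each view’s `notify` is called with the last of its marks at or before the change. Plain offsets stand in for linked lists, so marks are adjusted by a linear scan; the marks could be, for example, the start of each displayed line.

```python
import bisect


class Mark:
    """A position between characters, kept here as a plain offset."""
    def __init__(self, offset):
        self.offset = offset


def _adjust(offset, start, end, delta):
    """Where a position ends up after [start, end) is replaced, growing the text by delta."""
    if offset <= start:
        return offset            # before the change: unmoved
    if offset < end:
        return start             # inside the deleted text: collapses to its start
    return offset + delta        # after the change: shifted


class View:
    """Hypothetical view: its own marks in text order, plus at most one point."""
    def __init__(self, doc, with_point=False):
        self.marks = []
        self.point = Mark(0) if with_point else None
        self.notifications = []  # offsets of the preceding marks this view was told about
        doc.views.append(self)

    def add_mark(self, offset):
        mark = Mark(offset)
        offsets = [m.offset for m in self.marks]
        self.marks.insert(bisect.bisect_right(offsets, offset), mark)
        return mark

    def notify(self, preceding):
        # A display would redraw from `preceding`; a counter would recount there.
        self.notifications.append(None if preceding is None else preceding.offset)


class Document:
    def __init__(self, text):
        self.text = text
        self.views = []

    def replace(self, view, length, new):
        """Replace `length` chars after `view`'s point; changes only happen at a point."""
        if view.point is None:
            raise ValueError("a view without a point cannot modify the document")
        start = view.point.offset
        end = start + length
        delta = len(new) - length
        self.text = self.text[:start] + new + self.text[end:]
        for v in self.views:
            # Linear scan for clarity; linking points into every view's list
            # would make the preceding mark available directly.
            preceding = None
            for m in v.marks:
                if m.offset <= start:
                    preceding = m
                m.offset = _adjust(m.offset, start, end, delta)
            if v.point is not None and v is not view:
                v.point.offset = _adjust(v.point.offset, start, end, delta)
            v.notify(preceding)
        view.point.offset = start + len(new)   # the acting point ends after the new text
```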
A “word count” view might keep a mark every 100 lines in the file, and record at each mark the number of lines, words and characters from there to the next mark. Whenever there is a change, only the counters at the one mark preceding it need to be updated, which is quick. Whenever the counts are needed, summing over the relevant marks is much faster than counting the whole document afresh.
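The word-count idea might look like this, assuming (purely for illustration) a document held as a Python list of lines, with marks kept as line numbers rather than linked positions. `WordCountView`, `CountMark` and `replace_lines` are hypothetical names. An edit recounts only the lines belonging to one mark; later marks merely shift, and the totals are a sum over the marks.

```python
from dataclasses import dataclass


@dataclass
class CountMark:
    line: int        # first line covered; the range runs up to the next mark
    lines: int = 0
    words: int = 0
    chars: int = 0   # includes one newline per line


class WordCountView:
    """Hypothetical word-count view over a document held as a list of lines."""

    def __init__(self, lines, per_mark=100):
        self.lines = lines
        self.marks = [CountMark(i) for i in range(0, max(len(lines), 1), per_mark)]
        for k in range(len(self.marks)):
            self._recount(k)

    def _recount(self, k):
        mark = self.marks[k]
        end = self.marks[k + 1].line if k + 1 < len(self.marks) else len(self.lines)
        chunk = self.lines[mark.line:end]
        mark.lines = len(chunk)
        mark.words = sum(len(line.split()) for line in chunk)
        mark.chars = sum(len(line) + 1 for line in chunk)

    def _owner(self, index):
        """Index of the last mark at or before line `index`."""
        k = 0
        while k + 1 < len(self.marks) and self.marks[k + 1].line <= index:
            k += 1
        return k

    def replace_lines(self, index, n_old, new_lines):
        """Replace n_old lines at `index`; assumes the change stays inside one mark's range."""
        k = self._owner(index)
        self.lines[index:index + n_old] = new_lines
        for mark in self.marks[k + 1:]:
            mark.line += len(new_lines) - n_old   # later marks move with the text
        self._recount(k)                           # but only one mark is recounted

    def totals(self):
        return (sum(m.lines for m in self.marks),
                sum(m.words for m in self.marks),
                sum(m.chars for m in self.marks))
```

The sketch assumes each edit stays within one mark’s range; an edit spanning several marks would need each affected mark recounted.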
Another view might perform a spell-check on any changed text, and might set attributes to record the result. Yet another view could parse some code and keep a symbol table reasonably up-to-date.
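A spell-check view could follow the attribute convention described in the list of document features: a suspect word gets a `spell` attribute on its first character only, holding possible corrections, and a matching display widens that to the whole word. This is a sketch only; the tiny `KNOWN_WORDS` list and the use of Python’s `difflib.get_close_matches` for suggestions are stand-ins, and `spell_check` and `highlighted_spans` are hypothetical names.

```python
import difflib
import re

# Stand-in dictionary; a real checker would consult a proper word list.
KNOWN_WORDS = ["the", "quick", "brown", "fox", "jumps"]


def spell_check(text, attrs, start, end):
    """Re-check the words overlapping text[start:end] after a change.

    `attrs` holds one dict per character. A misspelled word gets a 'spell'
    attribute on its first character only, listing possible corrections.
    """
    while start > 0 and text[start - 1].isalpha():
        start -= 1                      # widen to the start of the word
    while end < len(text) and text[end].isalpha():
        end += 1                        # ...and to its end
    for i in range(start, end):
        attrs[i].pop("spell", None)     # forget stale results in this region
    for match in re.finditer(r"[^\W\d_]+", text[start:end]):   # runs of letters
        word = match.group()
        if word.lower() not in KNOWN_WORDS:
            fixes = difflib.get_close_matches(word.lower(), KNOWN_WORDS)
            attrs[start + match.start()]["spell"] = ",".join(fixes)


def highlighted_spans(text, attrs):
    """What a matching display might do: widen each attribute to the whole word."""
    spans = []
    for i, attr in enumerate(attrs):
        if "spell" in attr:
            j = i
            while j < len(text) and text[j].isalpha():
                j += 1
            spans.append((i, j))
    return spans
```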
Ultimately these different views could run in separate threads, particularly if part of the task is time-consuming.
Together these ideas seem to form a powerful abstraction. It is fairly obvious how this would apply to a text document read from a file, and that would be the most common document type. Other possibilities include:
- A file could be memory-mapped, and might only allow replacement where the deleted region matches the size of the added text. This would allow even a block device to be mapped so that the editor could view raw data on a disk drive, and maybe even edit metadata that it has been taught the format of. A hex-edit view would be particularly appropriate here.
- Reading a directory into a document could involve creating a single character for each directory entry, and then attaching the name, owner, modes, type, date etc. as attributes of that character. This directory could then be displayed in various ways to allow selection of a file name to load, or to allow more general directory editing.
- A “virtual” document might itself provide a view on one or more other documents. This could be used to view a collection of files as just one, or could present a “diff” view of two documents, with common sections only displayed once between differing sections.
One awkwardness with the described directory document is that while it is easy for different displays to select different attributes, it is not clear how a different sort order could be achieved. A “virtual” document on a directory could impose a separate sort order, using a list of marks with one mark for every directory entry.
- The editor would almost certainly contain a list of active documents. This list itself could be stored as a document, as could other internal data structures which could usefully be viewed.
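A directory document along the lines described above might be read like this: one placeholder character per entry, with name, owner, modes, type, size and date as that character’s attributes, plus a “virtual” ordering that sorts the entries by a different attribute without touching the underlying document. The function names and the choice of `"\n"` as the placeholder character are assumptions made for this sketch; `os.scandir` and `stat.filemode` are standard Python.

```python
import os
import stat


def directory_document(path):
    """Read a directory as a document: one character per entry, with the
    entry's details stored as that character's attributes."""
    chars, attrs = [], []
    with os.scandir(path) as it:
        for entry in sorted(it, key=lambda e: e.name):
            st = entry.stat(follow_symlinks=False)
            chars.append("\n")   # placeholder character (hypothetical choice)
            attrs.append({
                "name": entry.name,
                "owner": st.st_uid,
                "modes": stat.filemode(st.st_mode),
                "type": "dir" if entry.is_dir(follow_symlinks=False) else "file",
                "size": st.st_size,
                "mtime": st.st_mtime,
            })
    return chars, attrs


def sorted_view(attrs, key):
    """A 'virtual document' over the directory: one mark per entry (here just
    the entry's index), ordered by another attribute. The base is untouched."""
    return sorted(range(len(attrs)), key=lambda i: attrs[i][key])
```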
There are of course lots of other possibilities, but just those are enough to allow a very versatile editor to be built. A key property of these documents is that they are linear lists of characters. They have a first and a last and probably some in between. Marks are very linear too, being linked together in text-order. This does limit the range of documents to some extent. For example an image doesn’t really fit this model (though a video might). So I don’t really expect an image editor to ever be built with edlib. A mail reader, though, is an absolute must.
My implementation so far has just the one document type: text read from a file, supporting arbitrary edits and indefinite undo. Just recently I completed the clean separation of that from the rest of the code so that a second document type (probably directories) can now be implemented.
And no: the code isn’t available yet. It will be, but not until I want other people to look at it.