As edlib (my Emacs-replacement editor) matures I’m being more adventurous in the functionality I’m adding: spell checker, calculator, difference highlighter. This often involves importing functionality from an external source and making it available within edlib. This has raised the question of how to map an ad hoc interface from some library into the more constrained interfaces supported within edlib. Experience so far suggests that it is always possible, but there are sometimes multiple options and it is worth making the effort to choose carefully.
It isn’t always necessary to map some imported functionality into an ‘edlib’ API. Sometimes it can be entirely consumed within the one pane. “Notmuch” is a suitable example here. Notmuch is a tool for indexing email archives and a library for accessing both the index and the individual emails. I have (under development) a module that uses notmuch to create some virtual documents for edlib. One exposes a list of saved searches, another exposes the results of a given search, and a third exposes an individual email item. There are, I guess, some questions about how display panes for these documents communicate about the substructure of the documents – expanding threads within search results for example – but these are not new questions. Any document has substructure, whether lines and words, or statements and expressions, or threads and messages. Each will need different details in the navigation interfaces, and these may involve design changes, but not the sort of design challenges that I wanted to discuss here.
The edlib interface
The challenge I wanted to focus on was mapping the edlib interface to some other interface. Within edlib, all calls between panes (panes are the primary locus of state and computation in edlib) have a strict form. The call passes a prescribed set of arguments: two numbers, two strings, two marks, one pane, and one command; and returns a single number which is positive for success, negative for an error, or 0 meaning that there is no such command.
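As a rough illustration of this call form, here is a toy Python model. It is not edlib's real API: the names `CallArgs`, `Pane`, `needs_string`, and the field names are all invented for this sketch. It only shows the fixed argument set and the three classes of return value.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Toy model only: every name here is hypothetical, not taken from edlib.


@dataclass
class CallArgs:
    key: str                            # name used to look up the command
    num: int = 0                        # first number
    num2: int = 0                       # second number
    str1: Optional[str] = None          # first string
    str2: Optional[str] = None          # second string
    mark: Optional[object] = None       # first mark
    mark2: Optional[object] = None      # second mark
    focus: Optional[object] = None      # the one pane argument
    comm2: Optional[Callable] = None    # the one command argument


class Pane:
    def __init__(self):
        self.commands = {}

    def call(self, key, **kwargs):
        handler = self.commands.get(key)
        if handler is None:
            return 0                    # 0: there is no such command
        return handler(CallArgs(key=key, **kwargs))


def needs_string(args):
    if args.str1 is None:
        return -1                       # negative: an error
    return 1                            # positive: success


pane = Pane()
pane.commands["needs-string"] = needs_string
ok = pane.call("needs-string", str1="hello")
failed = pane.call("needs-string")
missing = pane.call("no-such-command")
```

Every call, whatever it does, has to squeeze its inputs into those slots and its answer into one integer, which is what makes the command argument so important in what follows.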
This form, particularly the single return value, might seem overly restrictive, but the “command” is effectively a wild-card that can be used to achieve almost anything. The design challenge is to use this command judiciously so that the code is easy to follow. The best way to see what this means is to examine some concrete examples.
Spell checking with aspell
The aspell library provides a simple interface to check if a word has valid spelling, provide a list of valid spellings similar to a given word, or add words to a private dictionary. I could simply link this library in with any pane that has any interest in spelling, but that approach has a concrete problem, apart from any aesthetic problem I might have with it. aspell maintains per-document state, particularly a list of words that are acceptable for a given document, but are not part of a shared dictionary. If various panes are to have access to this per-document state, there needs to be an aspell pane associated with each document, and other panes must communicate with it, using the edlib interface.
Checking if a given word is a valid spelling presents no challenges. The target document can be provided using the “pane” argument, the word can be in one of the strings, and the returned value can indicate if the word is a valid spelling. Similarly, adding a word to a dictionary is easy as there are few arguments and at most a simple success status to return.
Returning a list of alternate spellings is a little more complex. Returning values other than integers is common enough that both the Python and C APIs for edlib have integrated support for a variety of types. A string, a mark, a pane, or a command can be returned by passing a standard command to be used as a call-back. The called function calls back with the string, mark, pane, or command, and the API makes this easily available to the caller. Returning a list of alternate spellings builds on this.
The command passed as an argument, like all commands in edlib, implicitly contains a pointer to private data. A caller that wants a list of alternate spellings allocates an appropriate data structure (possibly on the stack), associates it with a command that adds a given string to this structure, and then passes this command+data association to the “aspell-suggest” interface. This will call the command multiple times and all the suggestions can be collected or evaluated in whatever way the caller likes.
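The command+data pattern can be sketched in a few lines of Python. This is a toy model under loud assumptions: a five-word list stands in for aspell, "similar" just means one differing letter, and `aspell_suggest` and `Collector` are invented names rather than the real edlib or aspell interfaces.

```python
# Toy model: a tiny word list stands in for the aspell dictionary.
WORDS = ["spell", "spelt", "shell", "smell", "small"]


def aspell_suggest(word, callback):
    """Call `callback` once per suggestion, then return 1 for success."""
    for candidate in WORDS:
        same_length = len(candidate) == len(word)
        differences = sum(a != b for a, b in zip(candidate, word))
        if same_length and differences == 1:
            callback(candidate)
    return 1


class Collector:
    """A command bundled with private data: the list being filled in."""

    def __init__(self):
        self.suggestions = []

    def __call__(self, string):
        self.suggestions.append(string)
        return 1


collector = Collector()
status = aspell_suggest("spall", collector)
```

The callee never learns what the caller does with each suggestion. A different caller could pass a command that stops caring after the first string, or that only counts them.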
A numeric calculator
A calculator, which evaluates arbitrary numeric expressions, doesn’t have the excuse of per-document state to justify needing to be available via the edlib interface. In this case I think my argument is encapsulation. The calculator I have isn’t part of a standard library – it is something I wrote myself. So I need to encapsulate it in some way so that it is easily accessed. Given the context, providing an edlib interface seems the obvious step.
Passing values around is most easily done using strings. This means converting to and from string representations of numbers a lot, but that isn’t a problem (yet) as this calculator is intended as a convenience, not a target for high-performance computing. So the basic “calculate” interface receives the expression in one string, and returns a string result as described above. The simple case is, indeed, simple.
Returning the result “as a string” hides some complexity. Should that string be in decimal or hexadecimal or some other base? If not an integer, should a precise fraction be returned, or an approximate decimal expansion? Are exponents allowed? To some extent these questions can be answered by using the second string to pass a number of flags to request a particular format. In my main use case I want multiple formats – both decimal and hex are displayed by default. Rather than performing the calculation twice, I want to perform it once and get both results. In much the same way that the spell checker could return multiple suggestions, the calculator can return the answer in multiple formats: all that are requested.
It is useful for a calculation to be able to refer to earlier results, so allowing pronumerals is valuable. This means that the calculator needs to be able to look up a given pronumeral to get the target numeral. This too can be done with the command. I glossed over a detail when I listed the arguments to an edlib command earlier. There are actually three strings, not just two. The third string is the command name or “key”, and is usually used to find the command in a map. When that happens, a given command will always get passed a key which is its own name. When a command has been explicitly provided, rather than looked up, the key can have any value and so can act as a third string argument. So the calculator uses a different key when calling back with each different format of the result, and uses another key (“get”) to ask for the numeral for a given pronumeral.
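A minimal sketch of this use of the key, again as a toy Python model. Only the “get” key comes from the text; the `result-dec` and `result-hex` keys, the `dec`/`hex` flags, and the `calculate` and `Session` names are assumptions made for illustration. The toy parser handles only whitespace-separated integers and names joined by `+` and `-`, and the callback is allowed to return a string for “get”, which simplifies however the real interface hands the value back.

```python
def calculate(expr, formats, callback):
    """Evaluate `expr`, calling back once per requested format."""
    total, sign = 0, 1
    for token in expr.split():
        if token == "+":
            sign = 1
        elif token == "-":
            sign = -1
        else:
            if token.isdigit():
                text = token
            else:
                # Ask the caller for the numeral behind this pronumeral.
                text = callback("get", token)
                if text is None:
                    return -1           # unknown name: report an error
            total += sign * int(text)
    for fmt in formats.split():
        if fmt == "dec":
            callback("result-dec", str(total))
        elif fmt == "hex":
            callback("result-hex", hex(total))
    return 1


class Session:
    """One command object serving several keys."""

    def __init__(self, variables):
        self.variables = variables
        self.results = {}

    def __call__(self, key, string):
        if key == "get":
            return self.variables.get(string)
        self.results[key] = string
        return None


session = Session({"a": "10"})
status = calculate("a + 22 - 2", "dec hex", session)
```

The calculation runs once, and the single command both supplies an input (the value of `a`) and receives two differently formatted outputs, distinguished only by key.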
As you can see, overloading the command is quite powerful. It can be used to collect extra arguments, and to provide arbitrarily complex return values. Too much complexity might get confusing though. I don’t really have a case of something getting too confusing yet, so I cannot give an example. I just feel the need to be cautious.
The command isn’t the only argument that can be extended like this, and it might be best to balance the extensions across the arguments. The word-diff library gives some ideas in this direction.
Word-diffs and wiggles
My “wiggle” tool can split two texts into words and find which words are common, and where other words have been inserted or deleted. It can also take three texts and attempt to align them so that differences between “before” and “after” can be merged into an “original” text. This functionality is useful when working with patches, and generally with any textual revision-control system.
My justification for exporting this functionality via edlib interfaces is that it is most useful when tightly integrated with other edlib objects. As wiggle works with texts, and as edlib documents are often texts, it would be best if wiggle worked directly with edlib documents. The result of wiggle’s work is to flag some text as inserted or deleted or unchanged or similar. edlib allows attributes to be attached to text in documents. So the natural interface for wiggle is to give it some text using documents and marks, and have it record its results by setting attributes on that text.
For “diff” which requires two texts, we could get away with using the simple interface if both texts are in the same document. This document would be provided using the pane argument, the start of the texts can be given as the two marks, and the lengths of the texts as two numbers. This is sufficient for highlighting word-differences when viewing a patch, but not much beyond that.
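To make the “simple interface” concrete, here is a hedged sketch in Python. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for wiggle's own matching, which it does not attempt to reproduce, and it returns a list of tagged words rather than setting attributes on document text. The function name `word_diff` and the tag names are invented for this example.

```python
import difflib


def word_diff(text, start1, len1, start2, len2):
    """Compare two regions of one text, word by word."""
    before = text[start1:start1 + len1].split()
    after = text[start2:start2 + len2].split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    tags = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            tags += [("unchanged", word) for word in before[i1:i2]]
        else:
            # "replace", "delete" and "insert" all reduce to these two lists.
            tags += [("deleted", word) for word in before[i1:i2]]
            tags += [("inserted", word) for word in after[j1:j2]]
    return tags


# One document holding both texts, as in a patch.
doc = "the quick fox\nthe slow fox\n"
tags = word_diff(doc, 0, 13, 14, 12)
```

The two starts and two lengths map directly onto the two marks and two numbers of the call form, with the document standing in for the pane.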
Marks in edlib come in three flavours: simple marks, points, and grouped marks. Groups of marks can be created easily and each mark in a group is linked to all the others. So passing a group of marks to a command is easily done: just create the group and pass one of the marks. Marks (like panes and document contents) can have attributes associated which are name=value pairs. So passing a single mark can not only pass multiple marks, but also multiple strings. My current wiggle code expects a mark to be part of a group of six, with each mark having attributes saying whether it is the start or end of the before, after, or original text. This achieves passing in lots of arguments without overloading the command argument.
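A small Python sketch of passing six marks by passing one. This is a toy model: the `Mark` class, the `text` and `edge` attribute names, and `collect_ranges` are all hypothetical, and a plain shared list stands in for edlib's mark groups.

```python
class Mark:
    """Toy mark: a position, some attributes, and a link to its group."""

    def __init__(self, group, pos, attrs):
        self.group = group
        self.pos = pos
        self.attrs = dict(attrs)
        group.append(self)


def collect_ranges(mark):
    """Given any one mark, recover every text range from its group."""
    edges = {}
    for member in mark.group:
        text = member.attrs["text"]     # "before", "after" or "original"
        edge = member.attrs["edge"]     # "start" or "end"
        edges.setdefault(text, {})[edge] = member.pos
    return {text: (e["start"], e["end"]) for text, e in edges.items()}


group = []
Mark(group, 0, {"text": "before", "edge": "start"})
Mark(group, 10, {"text": "before", "edge": "end"})
Mark(group, 10, {"text": "after", "edge": "start"})
Mark(group, 25, {"text": "after", "edge": "end"})
Mark(group, 25, {"text": "original", "edge": "start"})
Mark(group, 40, {"text": "original", "edge": "end"})

# Only one mark is passed; the rest are reached through the group.
ranges = collect_ranges(group[3])
```

It makes no difference which of the six marks is handed over, since each one leads to all the others.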
Marks can also be used to return multiple strings. The “Word-count” module does this, counting lines, words, and characters from the start of a document to the given mark, and then setting attributes on the mark with all these values.
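The idea of returning several values through a mark can be sketched as follows. This is a toy Python model, not the actual “Word-count” module: the `Mark` class, the `count_words` function, and the attribute names `lines`, `words`, and `chars` are assumptions.

```python
class Mark:
    """Toy mark: a position in the text plus name=value attributes."""

    def __init__(self, pos):
        self.pos = pos
        self.attrs = {}


def count_words(text, mark):
    """Count from the start of `text` to `mark`; report via attributes."""
    head = text[:mark.pos]
    mark.attrs["lines"] = str(head.count("\n"))
    mark.attrs["words"] = str(len(head.split()))
    mark.attrs["chars"] = str(len(head))
    return 1                            # the integer only says "success"


text = "one two\nthree four\nfive"
mark = Mark(19)                         # just after the second newline
status = count_words(text, mark)
```

The caller reads the three results off the mark afterwards, so no call-back command is needed at all.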
This provides nearly all that I need for wiggle/diff, but not quite. It is sufficient if all the texts are in the one document, but if I have two documents and want to flag matching regions within them, I haven’t yet provided a nice solution. I need to pass two panes, not just one.
Clearly I could use the command here and have the wiggle code call back to ask for the panes that refer to the documents, but I’m not sure I want that. There are some other options that I can explore.
Linking panes
Similar to how marks can be linked into a group, panes can be linked together. This is done using notifiers. If I have two panes I can link them so that notifications sent to one are delivered to the other. I can almost picture using this to effectively pass a group of panes by only passing a single pane argument – but only almost. I would need to allocate a temporary unique notification name, and I would need to ensure the notification was handled properly. It could certainly be done. I’m just not sure that I want to do it.
Commands as objects
Another option is to build on the power of commands in a different way. As a command is always (potentially) associated with data, it can be treated like a generic object in the “object-oriented programming” sense. The search code takes this approach.
The “make-search” command takes a string representing a pattern, and returns a command that can be used to search for precisely that pattern. That returned command can then be passed to the “doc:content” command which causes a document to deliver its content, one character at a time, to the given command. When the search command finds a match it signals “doc:content” to stop. The search command object can then be queried to find out where the search completed, how long the match was, and where any captured substrings can be found.
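The shape of this interaction can be sketched in Python. This is a toy model with loud simplifications: the pattern is a literal string rather than a real search pattern, there are no captured substrings, and `Search`, `make_search`, and `doc_content` are invented stand-ins for the edlib commands of similar names.

```python
class Search:
    """Object-command: callable, and holds state that can be queried later."""

    def __init__(self, pattern):
        if not pattern:
            raise ValueError("pattern must not be empty")
        self.pattern = pattern
        self.window = ""                # the most recent characters seen
        self.end = None                 # position just after the match

    def __call__(self, pos, char):
        self.window = (self.window + char)[-len(self.pattern):]
        if self.window == self.pattern:
            self.end = pos + 1
            return False                # signal the document to stop
        return True                     # keep the characters coming

    @property
    def start(self):
        return None if self.end is None else self.end - len(self.pattern)


def make_search(pattern):
    return Search(pattern)


def doc_content(text, command):
    """Deliver `text` one character at a time until told to stop."""
    delivered = 0
    for pos, char in enumerate(text):
        delivered += 1
        if not command(pos, char):
            break
    return delivered


search = make_search("brown")
delivered = doc_content("the quick brown fox", search)
```

The document knows nothing about searching and the search knows nothing about documents. All the interesting state lives in the command object, which the caller queries once the content stops flowing.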
In a similar way, the wiggle module could be asked to create a command that connects two panes. A “make-wiggle” command might return an object-command which remembers the “original” document. A call to that object-command might provide it with a “patch” document. Other calls could pass in different marks and the command could then do the appropriate “wiggle” work. The same object-command could be reused for aligning various different hunks from the “patch” with different locations in the “original”.
This object-command approach is clearly the most powerful, and anything can be achieved through it. As with anything powerful, it needs to be used carefully. I don’t yet know exactly what that means. Hopefully as edlib continues to grow I’ll get more examples that can show effective patterns for using object-commands, as well as parameter-passing commands and other techniques for building interfaces within edlib.