Complex documents in edlib

One of my core goals in developing edlib is to allow the display to be programmatically controlled. The content of a document is formatted dynamically as it is displayed, so the formatting can respond to context (e.g. the location of “point” can modify appearance in more ways than just displaying a cursor), and the entire document doesn’t need to be rendered into a buffer, just the parts being displayed.

A natural (for me) extension to this idea was the possibility that the source of the display wasn’t just one single document – multiple documents could be blended. Various examples have occurred to me, though few have been implemented.

I have a “hex-view” mode which displays a document by showing each byte as hex. To edit such a document conveniently I would like to transparently replace a particular hex field, in the view, with a tiny text document which contains the individual hex characters.  I could then edit that and the hex value I entered would be decoded into a byte (or bytes) that would be stored in the main document. Thus the display is partly the main document, and partly the little entry document.
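As a rough sketch of the decode step only (not of the view handling), the text from the little entry document could be turned back into bytes and spliced into the main document. The function name and the use of a `bytearray` as the main document are assumptions made for illustration.

```python
def commit_hex_field(doc: bytearray, offset: int, field_text: str) -> None:
    """Replace the single byte at `offset` with the bytes spelled out in hex.

    `field_text` is the content of the hypothetical entry document,
    e.g. "41" or "4142". bytes.fromhex() raises ValueError if the text
    is not valid hex, so a half-typed field is rejected rather than stored.
    """
    new_bytes = bytes.fromhex(field_text)
    # Slice assignment lets one byte be replaced by one or several bytes.
    doc[offset:offset + 1] = new_bytes
```

Entering two hex digits replaces the byte in place; entering four inserts an extra byte, which matches the "a byte (or bytes)" behavior described above.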

I imagine a “diff” mode which shows the difference between two documents. The part of the view which contains lines from either document could be a reference to that document. This way, editing the lines in the diff would directly edit the original. The display, in this case, would contain parts of one document interleaved with parts of another.

I have an email reader that can display multi-part email messages. This effectively divides the email document into multiple sub-documents, decodes them as appropriate (e.g. BASE64, quoted-printable) and then displays a combination of these individual parts, with buttons to act on the parts, and some headers at the top.

Finally, this same email reader can display a list of summaries of messages. This is a complex document in a different way (and I’m wondering if I’m confusing different sorts of documents together, which is why I am writing this note). The list of messages is actually a list of threads where each thread is a list of messages. Sometimes I want to display one line for each thread, sometimes I want one of the threads to display instead as a list of (non-archived) messages, and sometimes I want to see only the messages from a single thread, but all of them (including archived). I currently implement this as complexity in the document, but maybe it should all be in the display.

Reality hits.

Most of these thoughts came about in early design. I thought that I could easily have a display which showed parts of multiple documents, and could easily move about on the virtual document simply by being in control of the view. It turned out that I was wrong, or at least oversimplifying.

Key to understanding the link between documents and views is understanding how the “marks” work, particularly the cursor or “point” that each view holds in a document.  Each document has a set of marks which are kept in an ordered list.  Each mark has a (non-contiguous) sequence number so ordering between marks is easy to check.  The current cursor in each view is one of these marks.  Marks are also used for lots of other purposes to keep track of locations in documents.  This storage of marks has two particular implications for managing complex documents.
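The mark list described above can be sketched as follows. This is not edlib's implementation; the class names, the `GAP` spacing and the renumbering strategy are all assumptions. It only illustrates why non-contiguous sequence numbers make both insertion and ordering checks cheap.

```python
class Mark:
    def __init__(self, offset):
        self.offset = offset  # position in the document
        self.seq = 0          # assigned by the document, used only for ordering


class Doc:
    GAP = 1024  # hypothetical spacing left between sequence numbers

    def __init__(self):
        self.marks = []  # always ordered by position

    def _renumber(self):
        # Spread sequence numbers out again once a gap has been used up.
        for i, m in enumerate(self.marks):
            m.seq = (i + 1) * self.GAP

    def add_mark(self, offset):
        mark = Mark(offset)
        idx = 0
        while idx < len(self.marks) and self.marks[idx].offset <= offset:
            idx += 1
        self.marks.insert(idx, mark)
        lo = self.marks[idx - 1].seq if idx > 0 else 0
        if idx + 1 < len(self.marks):
            hi = self.marks[idx + 1].seq
        else:
            hi = lo + 2 * self.GAP
        if hi - lo < 2:
            self._renumber()             # no room between the neighbours
        else:
            mark.seq = (lo + hi) // 2    # take a number from the gap
        return mark


def mark_before(a, b):
    """Ordering between two marks of one document is a single comparison."""
    return a.seq < b.seq
```

Because the sequence numbers belong to one document's list, comparing marks from two different documents is meaningless, which is the root of the two implications that follow.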

Firstly, a view can only display a single document – the document which owns the cursor mark. It would add substantial complexity to allow the cursor in a view to move from one document to another. Secondly, the view of a document cannot change the order of elements in the document. Doing that would confuse the ordering of marks – marks could have a different order in the view than in the document.

In some cases these limitations can be hidden by using a temporary auxiliary view. In the hex-editor case, a viewing pane that is just a few characters wide could be displayed overlaying the view on the main document. This secondary view would display a different document – one which contains the individual hex characters. Some deliberate action would be needed to place the view, and then to discard it again. This could be conceptually like a pop-up window, but it would not have a border and so would appear to be part of the main document. A similar approach could be used for the ‘diff’ example. A particular prerequisite for this approach is that the main document and the secondary document have the same appearance for some range of characters. So rather than mixing two documents together in one view, we display two different views but place one over the other so that it looks like just one view.

In other cases the limitation can be removed by creating a virtual document. This document has its own set of marks, and maps them to marks in the underlying documents as appropriate. The view just displays the virtual document, which returns content from one or more subordinate documents. I have implemented a “multi-part” document which does exactly this. It maintains a list of documents and concatenates them. This is how email messages are displayed. Once I realized the necessity of this it was fairly easy to implement, and it made the display of multi-part documents manageable at last.
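A minimal sketch of the concatenating idea, assuming plain strings stand in for the subordinate documents and a `(part, offset)` tuple stands in for a mark. None of these names come from edlib; the point is only that the virtual document owns the marks and translates each one into a location in a single part.

```python
class MultiPart:
    def __init__(self, parts):
        self.parts = list(parts)  # each string stands in for a sub-document

    def normalize(self, mark):
        """Move a mark that sits at the end of a part to the start of the next."""
        part, off = mark
        while part < len(self.parts) and off >= len(self.parts[part]):
            part, off = part + 1, 0
        return (part, off)

    def step(self, mark):
        """Return (char, new_mark), or (None, mark) at the end of the document."""
        part, off = self.normalize(mark)
        if part >= len(self.parts):
            return None, (part, off)
        ch = self.parts[part][off]
        return ch, self.normalize((part, off + 1))


def read_all(doc):
    """Walk the whole virtual document using only step()."""
    mark, out = (0, 0), []
    while True:
        ch, mark = doc.step(mark)
        if ch is None:
            return "".join(out)
        out.append(ch)
```

Tuples compare in document order, so marks in the virtual document keep a consistent ordering even though the content comes from several places.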

One case for which there isn’t an easy solution is when there is a need to re-order content, such as when displaying the headers of an email message.  It is good to present headers in stable order, such as From, To, Subject, Cc.  The headers in the email document could be in any order.  A virtual document could split the headers into individual documents and recombine them, but I’m not sure it is worth the effort.  My current solution is to just copy the headers that I want into a fresh text document, and display that.  As I have no desire to edit the original, I don’t lose any functionality by doing this.  If I wanted to edit a document that was stored in the “wrong” order, I might need a more complex intermediate virtual document.
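The copy-the-headers approach can be illustrated with Python's standard `email` package. This is only a sketch of the idea, not the code the email reader uses; `WANTED` and `header_summary` are made-up names.

```python
from email import message_from_string

# Presentation order, independent of the order in the message itself.
WANTED = ["From", "To", "Subject", "Cc"]


def header_summary(raw_message: str) -> str:
    """Copy the interesting headers, in a stable order, into fresh text."""
    msg = message_from_string(raw_message)
    lines = []
    for name in WANTED:
        # get_all() returns every occurrence of the header, or the
        # supplied default when the header is absent.
        for value in msg.get_all(name, []):
            lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

The result is a separate read-only text, so nothing in it maps back to marks in the original message, which is exactly the trade-off described above.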

Problems

Now, at last, I get to the issue that I wanted to think through.  Several documents have selective visibility.  The multi-part document used for email will normally hide parts that probably aren’t interesting (e.g. HTML versions of messages), but it must be possible to display them if they are wanted.  I’ve already discussed the slightly more complex selective-visibility needed for the email summary list.  I also need selective visibility of the document which contains a list of all documents.  When performing “save-all”, I display this document in a view where documents that don’t need saving are not visible.  I currently have three different ways to implement selective visibility, and I think that it is more than I really want.

For the “modified only” view of the document list I have a view pane which captures the low-level commands for moving a mark (particularly “doc:step”) and causes the mark to step over any document that is not “modified”. So if it finds that the document pointed to does not need saving, it transparently moves to the next document. This makes cursor movement, display, and everything else “just work”. This is probably the best approach.
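The filtering pane can be sketched like this. The classes are hypothetical stand-ins (a real pane would intercept the “doc:step” command rather than override a method), but the loop is the whole trick: keep stepping the underlying document until an acceptable entry is found.

```python
class DocList:
    """Stand-in for the document that lists all documents."""

    def __init__(self, docs):
        self.docs = docs  # list of (name, modified) pairs

    def step(self, pos, forward=True):
        """Move one entry; return (entry, new_pos), or (None, pos) at either end."""
        if forward:
            if pos >= len(self.docs):
                return None, pos
            return self.docs[pos], pos + 1
        if pos <= 0:
            return None, pos
        return self.docs[pos - 1], pos - 1


class ModifiedOnlyView:
    """Stand-in for the view pane: same interface, unmodified entries skipped."""

    def __init__(self, doc):
        self.doc = doc

    def step(self, pos, forward=True):
        while True:
            entry, pos = self.doc.step(pos, forward)
            # Stop at the end of the list, or at a document that needs saving.
            if entry is None or entry[1]:
                return entry, pos
```

Anything layered above the view sees only modified documents, in either direction, without knowing that filtering is happening.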

For the multi-part document I currently store the “is it visible” state in the document.  This makes it a little awkward to get information about the invisible bits, but I found a reasonable way around that.  The main problem is that two views on the one email message will have to show the same parts, and that isn’t a good idea.

For the list of mail messages I have an in-between sort of approach. The document manages all the movement of marks, but the view tells it how to choose. When a movement command (such as “doc:step”) arrives at the view, the view adds a couple of values that are not normally used for that command. “str2” is set to the thread-id of the thread of interest, and “xy.x” is set to 0 or 1 depending on whether that thread should be expanded in context, or should be the only thing displayed. This works well and has a good division of labor, but feels like a hack. Overloading fields isn’t really a problem – the fact that I have a limited set of fields in commands effectively requires that. But this puts a limit on the sorts of views that are available – a limit imposed by the document. That puts the decision in the wrong place.
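To make the division of labor concrete, here is a simplified sketch. The field names “str2” and “xy” follow the description above; everything else, including the `list-visible` key and the way the document returns a whole list instead of handling one step at a time, is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Command:
    key: str
    str2: Optional[str] = None    # overloaded: thread-id of interest
    xy: Tuple[int, int] = (0, 0)  # overloaded: xy[0] == 1 means "only this thread"


class SummaryDoc:
    def __init__(self, threads):
        # threads: list of (thread_id, [(subject, archived), ...])
        self.threads = threads

    def visible(self, cmd):
        """The document decides what is visible, guided by the extra fields."""
        out = []
        for tid, msgs in self.threads:
            if tid == cmd.str2:
                only = cmd.xy[0] == 1
                # Expanded in context hides archived messages; "only" shows all.
                out += [subj for subj, archived in msgs if only or not archived]
            elif cmd.str2 is None or cmd.xy[0] == 0:
                out.append(f"thread {tid}")
        return out


class SummaryView:
    def __init__(self, doc, thread=None, only=False):
        self.doc, self.thread, self.only = doc, thread, only

    def lines(self):
        # The view's only job is to annotate the command before passing it on.
        cmd = Command("list-visible", str2=self.thread,
                      xy=(1 if self.only else 0, 0))
        return self.doc.visible(cmd)
```

The weakness is visible in `SummaryDoc.visible`: the set of possible views is fixed by what the document knows how to interpret.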

Solution?

Ideally the document would know nothing about visibility.  That means the view needs to be able to efficiently skip over invisible content itself.  For “modified” documents it simply gets some attributes and checks them.  This isn’t very efficient, but there aren’t enough documents that you would notice.  For multi-part, the view could keep a list of marks for each part-start, and could attach visibility info to those – finding nearby marks given a point is designed to be fast.  For the email summary list, it would be necessary to be able to skip to a particular thread (rather than individually step over each message in each thread).  That could probably be done once, and a mark left behind.  Leaving marks at significant places certainly seems like part of the solution.
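For the multi-part case, the "marks at each part-start" idea might look roughly like this. Integer offsets stand in for marks and a sorted list stands in for the fast nearby-mark lookup; both are assumptions, as is the requirement that the first part starts at offset 0.

```python
from bisect import bisect_right


class PartVisibility:
    """View-side record of which parts are visible; the document knows nothing."""

    def __init__(self, part_starts):
        self.starts = sorted(part_starts)           # assumed to begin with 0
        self.visible = {s: True for s in self.starts}

    def part_start(self, pos):
        """Find the part-start at or before pos (the 'nearby mark' lookup)."""
        return self.starts[bisect_right(self.starts, pos) - 1]

    def next_visible(self, pos):
        """Smallest position >= pos inside a visible part, or None."""
        i = bisect_right(self.starts, pos) - 1
        if self.visible[self.starts[i]]:
            return pos
        # Skip whole parts at a time rather than stepping character by character.
        for start in self.starts[i + 1:]:
            if self.visible[start]:
                return start
        return None
```

Two views on the same message would each hold their own `PartVisibility`, so they could show different parts.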

Another part might be to pass a predicate function to the “move” command so it can more quickly step over unwanted locations.  Having a pane call “step-forward” over and over again is not very efficient as the step command has to search down the pane stack to find the document.  If we pass a function to the step-forward command, it could call that at each location and keep moving until it succeeds.  This would have an added benefit of making “search” a lot faster.  Currently it is slow because the “step-forward” command to get the next character is slow as it hunts for the right command.  If that can be short-circuited with a callback, it might run much more smoothly.
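The saving from a predicate can be sketched by counting how often the pane stack is searched. The `Pane` and `Doc` classes, `step_while`, and the counter are all hypothetical; the comparison between the two functions is the point.

```python
class Doc:
    def __init__(self, text):
        self.text = text

    def step(self, pos):
        """Return (char, new_pos), or (None, pos) at the end."""
        if pos >= len(self.text):
            return None, pos
        return self.text[pos], pos + 1

    def step_while(self, pos, skip):
        """Step past every character for which skip(ch) is true."""
        while pos < len(self.text) and skip(self.text[pos]):
            pos += 1
        return pos


class Pane:
    def __init__(self, parent=None, doc=None):
        self.parent, self.doc = parent, doc

    def find_doc(self, hops):
        """Search down the pane stack for the document, counting each hop."""
        pane = self
        while pane.doc is None:
            hops[0] += 1
            pane = pane.parent
        return pane.doc


def skip_slow(pane, pos, skip, hops):
    # One search of the pane stack for every character examined.
    while True:
        ch, nxt = pane.find_doc(hops).step(pos)
        if ch is None or not skip(ch):
            return pos
        pos = nxt


def skip_fast(pane, pos, skip, hops):
    # One search of the pane stack; the predicate runs inside the document.
    return pane.find_doc(hops).step_while(pos, skip)
```

Both functions end at the same position, but the slow one pays for a stack search per character, which is the cost that makes repeated stepping, and therefore search, slow.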

Summary!

  1. complex documents where there is a base document and others which provide the same content from a different perspective should be easy enough, though they need some sort of deliberate action to activate the alternate document.
  2. complex documents which each provide different content must use an intermediate virtual document.  I could even re-order things, but only by dividing the original up into separate pieces, then re-assembling them.
  3. views that control visibility must put all the decision making in the viewer, but might use special movement commands provided by the document (such as ‘move to next part of a multipart’ or ‘move to next thread’).  They can expedite things by leaving suitable marks around, and by passing a predicate function to the “move” command.

I think I now need to review all my movement commands, and how I get characters and attributes from a document.  There is probably room for unifying things there.
