My “wiggle” program which applies a patch to a different version of the file needs to compute the shortest edit path between two texts, which is the smallest set of additions and deletions that must be made to one file so as to produce the other. From this edit path it is easy to produce a list of sections that are the same or are different, and wiggle uses this to work out what changes need to be made, and where to make them.
When the two files being compared are large – on the order of tens of thousands of words – this process can start taking a long time. I recently started experiencing such a case fairly often, so it was clearly time to optimize the code. I managed to find two optimizations that don’t change the result at all, and two others that degraded the result on large files in exchange for substantial speed improvements.
Continue reading