Monday, December 11, 2006

Requirements for a custom text editor

Well, after that promising first post, I didn't want to disappear entirely, but after staying up until 5 AM trying to meet a deadline (using, ahem ahem, my custom-built text editor) I was pretty flat. I was never good at late nights.

Anyway, Dana seems to think this notion of the custom text editor is a kind of, I don't know, a specific challenge, I suppose. So OK, let's formulate it as one.

The task is this: I have a text file of tab-delimited text in three columns. The first is a German phrase, the second is a number which we can safely ignore here, and the third is an English phrase generated by a Perl script. That third column starts with a question mark to denote that it is questionable -- and let me tell you, any machine-translated phrase generated by a script is quite questionable. Let me give you a couple of examples. Just two; I didn't actually save the initial translated guesses and I'm too rushed right now to reconstruct them. They're too wide to be comfortable in their original one-line form, so I've broken the lines, with German, then English, and so on.
Verzögerung Bodennaht Kontaktkühlung schließen am Sackabzug
?Delay bottom seam contact cooling close at bag takeoff
Verzögerung Bodennaht Kontaktkühlung schließen an der Öffnungsstation
?Delay bottom seam contact cooling close to of the opening station
OK. What I want to appear is still pretty ugly English, but it's technical machine output and space is limited:
Delay: close bottom seam contact cooling at bag takeoff
Delay: close bottom seam contact cooling at the opening station
(I've omitted the first two columns for aesthetic reasons here.) In these two samples, I've done the following: removed the leading question mark, inserted a colon after "Delay", moved the word "close" to the beginning of its phrase, and quickly replaced "to of" with "at". (I could have made a better Perl script, but the point is that with the right editor, it is quicker for me to manually change some instances than to derive a rule that changes them all correctly, with no false positives.)

In Word, if I highlight "close" and drag it to the new position, I end up with this:
Delay closebottom seam contact cooling  at bag takeoff
Delay closebottom seam contact cooling to of the opening station
This sucks mightily. It means I have to insert a space and take another out, by hand, and get the mouse to hit the right places very accurately to do so or move the cursor with the arrow keys.

Even worse, if I drop "close" in the middle of a word, I might end up with e.g. "Delay bot closetom seam" -- and then I have to undo, or retype.

Over many, many drag operations this all adds up, and it breaks my concentration. A very bad thing indeed. So my primary need for this new editor is this: mouse selection should select only whole words, never the spaces around them, and a drag within the sentence must preserve word spacing (that is, a single space should remain on each side of the dropped words, and no double spaces may be introduced). A drop must also always preserve wordness: if I drop a word in the middle of another word, it should be inserted in front of the word I dropped it into.
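To pin that rule down, here is a minimal sketch in plain Python. It is independent of any GUI toolkit and is not the code from my editor; the function name move_words and its character-offset parameters are made up for illustration, and it assumes the phrase is a single line whose words are separated by spaces.

```python
import re

def move_words(text, sel_start, sel_end, drop_pos):
    """Move the words touched by [sel_start, sel_end) so they land in front of
    the word at drop_pos, keeping exactly one space between words.

    All positions are character offsets into `text` (an assumption of this
    sketch; a real editor would get them from its text control).
    """
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    words = [text[s:e] for s, e in spans]

    # Snap the selection to whole words: every word it overlaps is picked,
    # and any spaces at the edges of the selection are ignored.
    picked = [i for i, (s, e) in enumerate(spans) if s < sel_end and e > sel_start]
    if not picked:
        return text
    first, last = picked[0], picked[-1]

    # The drop target is the first word that ends after drop_pos, so a drop
    # in the middle of a word inserts in front of that word.
    target = next((i for i, (s, e) in enumerate(spans) if e > drop_pos), len(words))

    # Dropping onto the selection itself changes nothing.
    if first <= target <= last + 1:
        return text

    moved = words[first:last + 1]
    rest = words[:first] + words[last + 1:]
    if target > last:
        target -= len(moved)  # indices shift once the moved words are removed
    return " ".join(rest[:target] + moved + rest[target:])
```

With the first sample, dragging "close" onto any part of "bottom" in "Delay: bottom seam contact cooling close at bag takeoff" gives "Delay: close bottom seam contact cooling at bag takeoff", with no stray or missing spaces. Rebuilding the phrase from its words is the design choice that makes double spaces impossible, at the cost of collapsing any deliberate extra spacing.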

And while I'm at it, let's throw in a couple more pretty simple requirements -- let's list them all, in fact:
  • Drag and drop of words/phrases must preserve interword spacing
  • Drag and drop must preserve wordness of the dropped text
  • The capitalization of the sentence must be preserved (initial cap)
  • Sentence-internal capitalization must be switchable with a single key (e.g. F2 on a word toggles its capitalization)
  • A single key jumps to the next uninspected phrase (using the initial question mark)
  • A single key on the left hand deletes the current selection
That last could use a little explanation. If I use, say, F3 with my left hand to go to the next question mark, and my right hand is on the mouse to drag and drop words around the sentence, then I don't want to have to move either hand to hit the Del key. So let's define F4, next to the F3 key, as a supplemental delete. F2 toggles capitalization on the word the cursor is in.
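The logic behind the F2 and F3 keys is small enough to sketch without any GUI code. This is only an illustration under assumptions: toggle_capital and next_uninspected are hypothetical names, the cursor is a character offset into the phrase, and each row is a list of the three columns. Wiring these to actual key events is left out.

```python
import re

def toggle_capital(text, cursor):
    """Flip the case of the first letter of the word the cursor is in (F2)."""
    for m in re.finditer(r"\S+", text):
        if m.start() <= cursor <= m.end():
            word = m.group()
            head = word[0]
            flipped = head.lower() if head.isupper() else head.upper()
            return text[:m.start()] + flipped + word[1:] + text[m.end():]
    return text  # cursor is not in or next to any word

def next_uninspected(rows, current):
    """Return the index of the next row after `current` whose English column
    still starts with a question mark (F3), or None if there is none."""
    for i in range(current + 1, len(rows)):
        if rows[i][2].startswith("?"):
            return i
    return None
```

For example, toggle_capital("Delay: close bottom", 8) returns "Delay: Close bottom", and calling it again on the result restores the original.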

Got it? That's easy, eh? My own solution with wxPython was maybe 50 lines of code and took me a morning's worth of concentration, along with some ongoing tweaking in the afternoon and evening while using it. It's still a bit squirrelly; the in-sentence capitalization preservation doesn't quite work right and capitalizes things overzealously, for instance, but the remaining bugs were all easy to work around while using it, and so I was no longer motivated to make it perfect.

While documenting it tomorrow, I may try to clean that up a little. And since I built it into the skeleton of another project I had sitting around, I want to factor out the code specific to this task, too.

The example as written is invoked from the command line and needs no menu at all -- it saves its file when it closes, with no option to quit without saving, and it doesn't need a File|Open either, since it already has the file to open when it starts up. So it's quite bare-bones, but it does the job.
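The file handling for a tool like this can be very small. The following is a sketch rather than the editor's own code: load_rows and save_rows are hypothetical helpers, and UTF-8 is an assumed encoding (the real file may well use something else).

```python
def load_rows(path):
    """Read the tab-delimited file into a list of column lists, one per line."""
    with open(path, encoding="utf-8") as f:  # encoding is an assumption
        return [line.rstrip("\n").split("\t") for line in f]

def save_rows(path, rows):
    """Write the rows back in the same tab-delimited layout."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")
```

Calling load_rows once at startup with the path from the command line, and save_rows from the window's close handler, would give the "no menu, always saves" behavior described above.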

After doing this, I gained a new appreciation for something carpenters and metalworkers call a jig. A jig is a sort of throw-away tool used to hold a workpiece in place while you're cutting or drilling. This kind of scripting is a software jig, effectively: it's a special-purpose tool which may never be reused for much beyond its precise original purpose. I'm not even sure I'll be able to reuse it for most translation projects, as this text was pretty special in nature. But it was quicker to write it, then use it, than it was to do without. So it's a win, and it may come in handy later.

Tomorrow I'll show you how I did it. Tonight I still need sleep.
