Thursday, December 21, 2006

Requirements for the file tagger

So I've finally grabbed enough time to work out in more detail what I think the file tagger application should look like, and I thought I'd at least blog that much before going on. As you can see, this isn't going to be an app a week yet (more like a 0.5 app-to-week ratio). I have no doubt that Dana will take that as evidence of the superiority of the .NET platform, but you and I know he's full of it.

Here's how I envision it:

The central data structure of the application is a list of managed files. These files can actually reside anywhere in the file system; adding a file to the list or removing it has no effect on the actual file. This is essentially a database about the files. We can store the following information about each file:
  • Where it is (the filename and full path)
  • An optional nickname
  • An optional descriptive text
  • A list of tags for the file.
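Here is a minimal sketch of that per-file record in Python. It assumes an in-memory dictionary keyed by full path and leaves persistence out. The class and method names (`ManagedFile`, `FileDatabase`, `add`, `remove`) are hypothetical, not the author's code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ManagedFile:
    """One entry in the file database; the file on disk is never touched."""
    path: str                       # filename with full path
    nickname: Optional[str] = None  # optional short name
    description: str = ""           # optional descriptive text
    tags: Set[str] = field(default_factory=set)


class FileDatabase:
    """In-memory list of managed files, keyed by full path."""

    def __init__(self) -> None:
        self.files: Dict[str, ManagedFile] = {}

    def add(self, path: str, **info) -> ManagedFile:
        # Adding only records metadata. If the path is already managed,
        # the existing entry is returned unchanged.
        entry = self.files.get(path)
        if entry is None:
            entry = ManagedFile(path, **info)
            self.files[path] = entry
        return entry

    def remove(self, path: str) -> None:
        # Forgetting a file deletes only our record, never the file itself.
        self.files.pop(path, None)
```

Keying on the full path makes "is this file already managed?" a single lookup, and it keeps removal from the list clearly separate from anything that happens on disk.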
The application can run in command-line mode or in GUI mode. In command-line mode, you can add files to and delete files from the list of managed files, change their descriptive text, and import/export various things (the info for a file, the info for a tag, or the entire file database).
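One way to lay out that command-line mode is with subcommands in Python's standard `argparse` module. This is only a sketch. The program name `filetagger`, the subcommand names, and the `--tag` option are all hypothetical, and the handlers that would act on the parsed arguments are omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical command-line interface for the file tagger."""
    parser = argparse.ArgumentParser(prog="filetagger")
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add", help="add files to the database")
    add.add_argument("paths", nargs="+")
    add.add_argument("--tag", action="append", default=[],
                     help="tag to apply (may be repeated)")

    delete = sub.add_parser("delete", help="forget files (disk untouched)")
    delete.add_argument("paths", nargs="+")

    describe = sub.add_parser("describe", help="set a file's descriptive text")
    describe.add_argument("path")
    describe.add_argument("text")

    export = sub.add_parser("export", help="export a file, a tag, or everything")
    export.add_argument("what", choices=["file", "tag", "all"])
    export.add_argument("name", nargs="?",
                        help="file path or tag name (omit for 'all')")
    return parser
```

A call such as `filetagger add report.doc --tag work` would parse to `command="add"`, `paths=["report.doc"]`, `tag=["work"]`.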

In GUI mode, the frame consists of a menu bar, a status bar, and a tab set; the tabs are "Cloud" and "Files". (This bit of the specification may well change after I've implemented the first draft of the program.)

The Cloud tab shows the current tag cloud; each tag is a link. The Files tab is a split screen: on the left is a bar specifying which tags are displayed in the list on the right. The right side is a columnar list with a subset of the information for each file (the name or nickname, at least some of its tags, and, I guess, the description).

When I add a file to the database, a dialog needs to appear asking me for more information about it: the nickname, tags, and description. The same dialog is used to edit information about the file. I can add a file in one of two modes: either the dialog appears immediately, or the file is added with a provisional tag (e.g. "new-files") so that I can go back and organize all the new files at once. From the command line, the second, "silent" mode is the default; from the GUI, I can specify my preference. Note that the monitor application that miechu wrote (see the Jedi forum) could invoke the file tagger in either mode, adding new files to the database automatically after, say, you download them.
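The two adding modes can be sketched as a single function. This version assumes the database is a plain `path -> set of tags` dictionary. `ask` is a hypothetical stand-in for the edit dialog, and the function falls back to silent mode when no dialog is supplied. The names `add_file` and `untriaged` are illustrative only.

```python
from typing import Callable, Dict, List, Optional, Set

PROVISIONAL_TAG = "new-files"  # the example tag name from the text


def add_file(db: Dict[str, Set[str]], path: str, silent: bool = True,
             ask: Optional[Callable[[str], Set[str]]] = None) -> Set[str]:
    """Add path to db, either silently or via the (stand-in) edit dialog.

    Silent mode just applies the provisional tag so the file can be
    organized later; otherwise ask(path) returns the tags the user chose.
    """
    if silent or ask is None:
        tags = {PROVISIONAL_TAG}
    else:
        tags = set(ask(path))
    db.setdefault(path, set()).update(tags)
    return db[path]


def untriaged(db: Dict[str, Set[str]]) -> List[str]:
    """Files still carrying the provisional tag, for the 'all at once' pass."""
    return sorted(p for p, tags in db.items() if PROVISIONAL_TAG in tags)
```

Making silent the default argument matches the command-line behavior described above. A GUI caller would pass its own preference explicitly.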

Anyway, in the Cloud tab, when I click on a tag, the app opens the Files tab with a list of the files in that tag. I can then edit them, etc.
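Clicking a tag, or changing the tag bar on the Files tab, amounts to filtering the database by a set of tags. This sketch assumes a `path -> set of tags` dictionary and AND semantics, meaning a file must carry every selected tag. The post doesn't actually say whether the filter is AND or OR, so treat that choice as an assumption. `files_with_tags` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set


def files_with_tags(db: Dict[str, Set[str]], selected: Iterable[str]) -> List[str]:
    """Return paths whose tags include every selected tag.

    An empty selection means "no filter" and lists every file.
    """
    wanted = set(selected)
    return sorted(path for path, tags in db.items() if wanted <= tags)
```

Clicking a single tag in the cloud would then be `files_with_tags(db, [tag])`.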

When I drag a file from the Explorer (or desktop, or whatever) into the application, what happens depends on where I drop it. If I drop it while the Cloud tab is open, the editor dialog opens for the file so I can specify tags, unless I have chosen silent adding. If I drop a file or files onto the Files tab, though, they're automatically added with all the tags currently shown on the Files tab (unless no filter is active). That way, when I've got a tag open, it acts just like a folder.
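The drop rules reduce to a small dispatch function once the GUI event has produced a list of paths. This sketch assumes a `path -> set of tags` dictionary, a tab name of "Cloud" or "Files", and a hypothetical `open_editor` callback standing in for the dialog. The post doesn't say what a drop on an unfiltered Files tab should do, so here it falls back to the Cloud-tab behavior. That fallback is an assumption.

```python
from typing import Callable, Dict, Iterable, Set


def handle_drop(db: Dict[str, Set[str]], paths: Iterable[str], tab: str,
                active_filter: Set[str], silent: bool,
                open_editor: Callable[[str], None]) -> None:
    """Dispatch files dropped from Explorer according to the tab they hit.

    All names here are hypothetical; a real wxPython version would call
    this from its drop-target handler.
    """
    for path in paths:
        tags = db.setdefault(path, set())
        if tab == "Files" and active_filter:
            # Behave like a folder: inherit every tag currently shown.
            tags.update(active_filter)
        elif silent:
            tags.add("new-files")
        else:
            open_editor(path)
```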

I should be able to drag files out of the app, too. That shouldn't be hard.

OK, what am I missing? Oh, yes -- import/export. Especially export. When I export a tag, I want to get, say, a tab-delimited list of the files in it, or optionally their XML descriptions. And I also want to be able to export the entire tag cloud as a set of cross-linked HTML pages, so that each tag links to a page listing the contents of the tag. I'm not sure how that page should be formatted; I guess to make sense, each file should be able to link to some activity, but I'm open to suggestions as to what.
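Here is a rough sketch of two of those exports, assuming a `path -> set of tags` dictionary. The XML export is left out. The tab-delimited layout (path, then its tags), the `tag-<name>.html` file naming, and the page markup are all made-up choices for illustration. "Cross-linked" is taken to mean that each file entry links to the pages of its other tags.

```python
import html
from typing import Dict, List, Set


def export_tag_tsv(db: Dict[str, Set[str]], tag: str) -> str:
    """One line per file carrying tag: the path, then its tags, tab-delimited."""
    lines: List[str] = []
    for path in sorted(p for p, tags in db.items() if tag in tags):
        lines.append("\t".join([path] + sorted(db[path])))
    return ("\n".join(lines) + "\n") if lines else ""


def page_name(tag: str) -> str:
    # Hypothetical naming scheme; distinct tags can collide (e.g. "a b" and "a_b").
    return "tag-" + "".join(c if c.isalnum() else "_" for c in tag) + ".html"


def export_html(db: Dict[str, Set[str]]) -> Dict[str, str]:
    """Return {filename: html}: an index page plus one page per tag.

    Each file on a tag page links to the pages of its other tags.
    """
    all_tags = sorted({t for tags in db.values() for t in tags})
    pages = {"index.html": "<ul>\n" + "".join(
        f'<li><a href="{page_name(t)}">{html.escape(t)}</a></li>\n'
        for t in all_tags) + "</ul>\n"}
    for tag in all_tags:
        items = []
        for path in sorted(p for p, tags in db.items() if tag in tags):
            links = " ".join(f'<a href="{page_name(t)}">{html.escape(t)}</a>'
                             for t in sorted(db[path]) if t != tag)
            items.append(f"<li>{html.escape(path)} {links}</li>\n")
        pages[page_name(tag)] = (f"<h1>{html.escape(tag)}</h1>\n<ul>\n"
                                 + "".join(items) + "</ul>\n")
    return pages
```

Returning the pages as a dictionary keeps the formatting separate from writing to disk, which makes it easy to change the page layout once it's clearer what each file entry should link to.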

So. That's your challenge. As usual, my main block is that I have specific ideas for extensions to my usual wxPython toolset to make this sort of UI development quick and easy, but I've never had the time to finish those tools (and as we know, I never use a tool if I can make it first. If I knew how to whittle a laptop, I wouldn't even be posting this at all -- it's all I can do to resist rewriting Blogger from scratch before starting, after all.) So the process is delayed.

Incidentally, regarding my last post about deadlines -- when Sunday started, I had two very solid days of work ahead of me, but the deadline was actually Monday morning. I could live with a day's tardiness (translation customers really don't like lateness, but sometimes they just learn to live with it). But when I got into the work, I was totally in the Zone. I've been in the Zone when programming before -- when it all just flows, you instantly comprehend each problem, each snag, each bug, and every line you write works the first time. It's like a religious experience, except that afterwards you have something to show for it.

In the four years I've been translating for a living, Sunday was the first time I hit the Zone. My fingers could do no wrong; I made next to no typos and my typing rate was some unheard-of number. I didn't need to look up any words (well, it helped that the source text was very well written and all about XML as used in SAP ABAP). It just flowed. I usually figure on 4,000 words of translation as a full day. Until Sunday, my personal record for a day was about 8,000 words, as far as I know. But on Sunday I translated no fewer than 16,300 words. Literally four days of work: by 2 AM I just wanted to know how far I could take it, and at 6:30 AM, against all expectations, I simply ran out of text to translate. I actually made the deadline.

All of which is a long-winded way (as you know, I'm incapable of any other kind) of explaining why it is now Thursday and you haven't heard a peep out of me. When one does four days' work in one day, it takes time to recover. There were also Christmas presents to wrap and still a few to obtain, packages to take to the post office in time for Christmas (largely the presents for our friends in Puerto Rico, along with one package for Hungary that won't make it in time), holiday break camps to schedule with Parks & Rec for the kids, guitar strings to buy, that sort of thing. But now I'm back.

Anyway, Dana, you have your challenge -- can you write the app above in one day with .NET? (Ha, like I can write it in one day with wxPython -- I probably could, but I won't, not this week. I'm still ramping up.) Maybe you could smack me around a bit about that, since I seem to have been relatively unslappable this week.

More tomorrow or on the weekend -- I really want to spend more time with my toolset before posting an app. Specifically, I've been working for a while on a simple resource-like overlay for wxPython which will allow me to express the above UI as XML, then instantiate it as an app. Yes, I know there are such things already for .NET, as painting the screen is kind of the point of rapid app development -- and actually, there's one for wxPython, too. But I don't like it. So I'm writing my own, because by God it's open source and I can. So. More later.
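The author's own overlay isn't shown in the post. As a hedged illustration of the general idea only, this sketch describes the frame from the requirements above (menu bar, Cloud and Files tabs, status bar) in a made-up XML vocabulary and walks it with a table of factory functions. The element names, attributes, `build`, and `dry_run_factory` are all hypothetical. In a wxPython version each factory would create the matching wx control under its parent. Here a dry-run factory records the tree as nested dictionaries so the walk can be checked without a GUI.

```python
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict

# Hypothetical layout for the file tagger's main frame.
LAYOUT = """
<frame title="File Tagger">
  <menubar/>
  <notebook>
    <page label="Cloud"><html name="cloud"/></page>
    <page label="Files">
      <splitter>
        <list name="tagbar"/>
        <list name="files" columns="Name,Tags,Description"/>
      </splitter>
    </page>
  </notebook>
  <statusbar/>
</frame>
"""

Factory = Callable[[Any, Dict[str, str]], Any]


def build(elem: ET.Element, parent: Any, factories: Dict[str, Factory]) -> Any:
    """Create a widget for elem via factories[elem.tag], then its children."""
    widget = factories[elem.tag](parent, dict(elem.attrib))
    for child in elem:
        build(child, widget, factories)
    return widget


def dry_run_factory(tag: str) -> Factory:
    """Stand-in factory that records nodes as dicts instead of creating widgets."""
    def make(parent: Any, attrs: Dict[str, str]) -> Dict[str, Any]:
        node = {"tag": tag, "attrs": attrs, "children": []}
        if parent is not None:
            parent["children"].append(node)
        return node
    return make
```

Keeping the factories in a table is what lets one layout file drive either the real wxPython controls or a test double. Adding a new control type only means registering one more factory.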

Tuesday, December 19, 2006

Feeds have been fixed

Sorry for my disappearance, but have no fear - the feeds are fixed. I'll log on later tonight and smack Michael around a little bit.

Saturday, December 16, 2006

Whew! Busy, busy!

I've been trying to hit yesterday's job deadline today (sigh) and working with my Perl translation code to do it. My thinking is that this code is probably too specialized for general interest -- not to mention far too rough for public display -- but would anybody here be at all interested in one tech-geek professional translator's approach to language handling? If so, leave a comment and I'll see what I can do that doesn't make me look like a total linguistic amateur.

Now I'm only two days behind on a Monday deadline for another chapter of German-to-English translation (on ABAP programming in SAP, actually), so I'm unlikely to post much more on the file tagger app until Monday or Tuesday. Just so you're warned -- I wouldn't want anybody to suffer! (Unduly.)

(Except Dana. He's left us totally in the lurch. So Dana should suffer.)

Thursday, December 14, 2006

Note to self

More posts, but shorter ones. Nobody comments on or links out of the windy ones...

Dana still missing, film at 11.

Wednesday, December 13, 2006

Week #2, a code snippet, and Dana mysteriously silent

So -- week #2, eh?

The target for this week was suggested on the Software Jedi forum (thanks to everybody who's vented their fertile imaginations over there and in comments on this blog -- as long as there's enough grist for one week's mill, I'll be happy). It is, and I quote, "some kind of file tagger app".

The idea is pretty simple: del.icio.us (and Flickr, and lots of places now) lets you organize your links by tag. The tags are arbitrary and you can get a groovy tag cloud to let you see what's most important. So the suggestion on the forum was: wouldn't it be neat if you could do that with the files on your own drive? Or maybe on a corporate shared drive or something? Well -- it doesn't really matter what you do with it, the idea is that it would be neat.

So that's the app for this week, 'cause it's easy, and I'm all about easy.

This naturally breaks down into three pieces:
  1. The drop part (or some other way of identifying files for inclusion)
  2. The file manager -- a simple document management function.
  3. The tag cloud generator and some means of selecting lists, etc., from the cloud.
When I first started on this, I thought it'd be fun to create a generic drop handler as an Explorer extension -- then I saw how un-fricking-believably difficult Microsoft has made that seemingly simple task. So forget that -- instead, I'll simply set up a link under SendTo, which is equivalent, and much easier. And anyway, while the file manager component is actually open, it will still be possible to drop files into it, because that's simple to code.

And I've written a little Perl prototype code to do the tag cloud formatting in HTML:
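use strict;
use warnings;

# Tag => number of posts (or files, or whatever) carrying that tag.
our %kw_count;

# Returns one HTML link per tag, sized between $sm% and $lg%.
# The argument is the largest value in %kw_count (found in an earlier pass).
sub keyword_tagger {
    my $max_count = shift;

    my $sm  = 70;     # smallest possible font size, in percent
    my $lg  = 200;    # font size of the most-used tag, in percent
    my $del = $lg - $sm;
    my $ret = '';
    foreach my $k (sort keys %kw_count) {
        # Relative frequency (0..1) scales the font size linearly.
        my $weight = $kw_count{$k} / $max_count;
        my $font   = sprintf("%d", $sm + $del * $weight);
        $ret .= "<a href=\"wherever\" style=\"font-size: $font%;\">$k</a>\n";
    }

    return $ret;
}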
This code assumes that the tags are the keys of a hash %kw_count, whose values are the number of posts or files or whatever carrying that tag. The function's single parameter is the maximum count value (already determined in an earlier pass, in this prototype).

Pretty straightforward stuff -- this is a quote from the code I'm now using for the keyword cloud for my own blog. So for app #2, I'll be translating it into Python (natch) for use on the desktop.
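A first cut of that Python translation might look like the following sketch. Unlike the Perl prototype, it takes the counts as a dictionary argument instead of a global hash and computes the maximum itself. It also HTML-escapes tag names. The `href` is still the placeholder "wherever", and the parameter names are illustrative.

```python
import html
from typing import Dict


def keyword_tagger(kw_count: Dict[str, int], href: str = "wherever",
                   sm: int = 70, lg: int = 200) -> str:
    """Return one HTML link per tag, sized by weight = count / max_count.

    The font size runs linearly from sm% (count 0) to lg% (the most-used
    tag), as in the Perl prototype.
    """
    if not kw_count:
        return ""
    max_count = max(kw_count.values()) or 1  # avoid dividing by zero
    out = []
    for k in sorted(kw_count):
        weight = kw_count[k] / max_count
        font = int(sm + (lg - sm) * weight)  # truncates, like sprintf("%d")
        out.append(f'<a href="{href}" style="font-size: {font}%;">'
                   f'{html.escape(k)}</a>\n')
    return "".join(out)
```

For `{"python": 2, "perl": 1}`, "perl" gets 70 + 130 × 0.5 = 135% and "python" gets the full 200%.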

So that's a little progress. And in other news, Dana's been awfully quiet since I started posting actual challenges and code and stuff. Could it possibly be that he's a little intimidated by seeing a non-Microsofty being too Jedi-like for his comfort? Time will tell.... But the gauntlet is definitely thrown, Dana. We're waiting!

Tuesday, December 12, 2006

Special-purpose editor #1 complete and documented

Whew. It took some doing, because there were a few things I wanted to fix in the various freaky tools I use for this well-documented programming, but the first in the series is complete.

Feast your eyes!

(Note: full documentation for my projects is on my own site; some of this stuff is probably going to be pretty long, way too long for this blog.)

Anyway, it's a pretty boilerplate wxPython program, with barebones user interaction. I have some stock code I use for menus, but it's rather crude and underdeveloped. Still and all, this thing does what it's supposed to do, and it took me longer to document it than to write it (thanks to my reworking my documentation toolset today. Sigh.)

Next up: the filetagger app! This suggestion from the forum allows you to toss files from anywhere into a simple document manager, tag them, and generate a tag cloud to see what's what. Should be cool. And the clock starts .... now! (I got a week, right? Right.)

Monday, December 11, 2006

Requirements for a custom text editor

Well, after that promising first post, I didn't want to disappear entirely, but after staying up until 5 AM trying to meet a deadline (using, ahem ahem, my custom-built text editor) I was pretty flat. I was never good at late nights.

Anyway, Dana seems to think this notion of the custom text editor is a kind of, I don't know, a specific challenge, I suppose. So OK, let's formulate it as one.

The task is this: I have a text file consisting of tab-delimited text in three columns. The first is a German phrase, the second a number which we can safely ignore for this, and the third an English phrase generated by a Perl script. The third column starts with a question mark to denote that it is questionable -- and let me tell you, any machine-translated phrase generated by a script is quite questionable. Let me give you a couple of examples. Just two; I didn't actually save the initial translated guesses and I'm too rushed right now to reconstruct them. They're too wide to be comfortable in their original one-line form, so I've broken the lines, with German, then English, etc.
Verzögerung Bodennaht Kontaktkühlung schließen am Sackabzug
?Delay bottom seam contact cooling close at bag takeoff
Verzögerung Bodennaht Kontaktkühlung schließen an der Öffnungsstation
?Delay bottom seam contact cooling close to of the opening station
OK. What I want to appear is still pretty ugly English, but it's technical machine output and space is limited:
Delay: close bottom seam contact cooling at bag takeoff
Delay: close bottom seam contact cooling at the opening station
(I've omitted the first two columns for aesthetic reasons here.) In these two samples, I've done the following: removed the question mark, inserted a colon after "Delay", moved the word "close" to the beginning of its phrase, and quickly replaced "to of" with "at". (I could have made a better Perl script, but the point is that with the right editor, it is quicker for me to manually change some instances than to derive a rule that changes them all correctly, with no false positives.)
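Reading and writing that three-column file is the editor's only I/O, so here is a minimal Python sketch of it. It assumes exactly three tab-separated fields per non-blank line and keeps the ignored number column so the file round-trips unchanged. `Row`, `read_rows`, and `write_rows` are hypothetical names.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    german: str
    number: str   # ignored by the editor, kept so the file round-trips
    english: str

    @property
    def questionable(self) -> bool:
        # The Perl script marks unreviewed machine output with a leading '?'.
        return self.english.startswith("?")


def read_rows(text: str) -> List[Row]:
    """Split tab-delimited, three-column text into rows (blank lines skipped)."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            german, number, english = line.split("\t", 2)
            rows.append(Row(german, number, english))
    return rows


def write_rows(rows: List[Row]) -> str:
    """Inverse of read_rows: one tab-joined line per row."""
    return "".join("\t".join(r) + "\n" for r in rows)
```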

In Word, if I highlight "close" and drag it to the new position, I end up with this:
Delay closebottom seam contact cooling  at bag takeoff
Delay closebottom seam contact cooling to of the opening station
This sucks mightily. It means I have to insert one space and take another out by hand, either getting the mouse to hit exactly the right places or moving the cursor there with the arrow keys.

Even worse, if I drop "close" in the middle of a word, I might end up with e.g. "Delay bot closetom seam" -- and then I have to undo, or retype.

Over many, many drag operations this all adds up, and it breaks my concentration. A very bad thing indeed. So my primary requirement for this new editor is this: mouse selection should select only whole words, no spaces, and a drag within the sentence must preserve word spacing (that is, spaces should remain around the dropped words, and no double spaces may be introduced). A drop must also always preserve wordness: if I drop a word in the middle of another word, it should be inserted in front of the word I dropped it into.
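The two drag-and-drop rules can be expressed as a pure function on the sentence text, separate from any GUI. This sketch treats a sentence as space-separated words and works from character offsets. It snaps the selection outward to whole words, moves a drop that lands inside a word to the front of that word, and rejoins with single spaces. Because of that rejoin, spaces are never lost or doubled, and any existing double spaces are normalized. Capitalization is not handled here. `move_words` and `word_spans` are hypothetical names, not the author's wxPython code.

```python
import re
from typing import List, Tuple


def word_spans(text: str) -> List[Tuple[int, int]]:
    """(start, end) character offsets of each space-separated word."""
    return [m.span() for m in re.finditer(r"\S+", text)]


def move_words(text: str, sel_start: int, sel_end: int, drop: int) -> str:
    """Move the words touched by [sel_start, sel_end) to the drop offset."""
    spans = word_spans(text)
    words = [text[s:e] for s, e in spans]
    # Words overlapping the selection: the selection snaps to word boundaries.
    picked = [i for i, (s, e) in enumerate(spans) if s < sel_end and e > sel_start]
    if not picked:
        return text
    # First word ending after the drop point: the word the drop lands in,
    # or the next one. len(words) means "at the end of the sentence".
    target = next((i for i, (s, e) in enumerate(spans) if e > drop), len(words))
    if picked[0] <= target <= picked[-1] + 1:
        return " ".join(words)  # dropped onto itself: nothing moves
    moved = words[picked[0]:picked[-1] + 1]
    rest = words[:picked[0]] + words[picked[-1] + 1:]
    if target > picked[-1]:
        target -= len(moved)  # account for the words taken out
    return " ".join(rest[:target] + moved + rest[target:])
```

Dragging "close" from the first sample onto the middle of "bottom" gives "Delay close bottom seam contact cooling at bag takeoff", which is the spacing Word failed to produce.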

And while I'm at it, let's throw in a couple more pretty simple requirements -- let's list them all, in fact:
  • Drag and drop of words/phrases must preserve interword spacing
  • Drag and drop must preserve wordness of the dropped text
  • The capitalization of the sentence must be preserved (initial cap)
  • Sentence-internal capitalization toggled by a key (e.g. F2 on a word toggles its capitalization)
  • A single key jumps to the next uninspected phrase (using the initial question mark)
  • A single key on the left hand deletes the current selection
That last could use a little explanation. If I use, say, F3 with my left hand to go to the next question mark, and my right hand is on the mouse to drag and drop words around the sentence, then I don't want to have to move either hand to hit the Del key. So let's define F4, next to the F3 key, as a supplemental delete. F2 toggles capitalization on the word the cursor is in.
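The F2 and F3 behaviors are easy to keep out of the GUI code entirely. In this sketch, a wxPython key handler would presumably check for `wx.WXK_F2` and `wx.WXK_F3` and call these functions, and F4 would just delete the current selection. The function names are hypothetical. "Toggle capitalization" is read here as flipping the case of the word's first letter, which is an interpretation, not something the post spells out.

```python
from typing import List, Optional


def toggle_initial_cap(word: str) -> str:
    """F2: flip the case of the word's first letter ('seam' <-> 'Seam')."""
    if not word:
        return word
    first = word[0].lower() if word[0].isupper() else word[0].upper()
    return first + word[1:]


def next_uninspected(english: List[str], current: int) -> Optional[int]:
    """F3: index of the next phrase after `current` still marked with '?'.

    Searches forward and wraps around (possibly back to `current` itself);
    returns None once nothing is left to inspect.
    """
    n = len(english)
    for step in range(1, n + 1):
        i = (current + step) % n
        if english[i].startswith("?"):
            return i
    return None
```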

Got it? That's easy, eh? My own solution with wxPython was maybe 50 lines of code and took me a morning's worth of concentration, along with some ongoing tweaking in the afternoon and evening while using it. It's still a bit squirrelly; the in-sentence capitalization preservation doesn't quite work right and capitalizes things overzealously, for instance, but the remaining bugs were all easy to work around while using it, and so I was no longer motivated to make it perfect.

While documenting it tomorrow, I may try to clean that up a little. And since I built it into the skeleton of another project I had sitting around, I want to factor out the code specific to this task, too.

The example as written is invoked from the command line and needs no menu at all -- it saves its file when it closes; no option to quit without saving, and it doesn't need a File|Open either, since it already has the file to open when it starts up. So it's quite bare-bones, but it does the job.

After doing this, I gained a new appreciation for something carpenters and metalworkers call a jig. A jig is a sort of throw-away tool used to hold a workpiece in place while you're cutting or drilling. This kind of scripting is a software jig, effectively: it's a special-purpose tool which may never be reused for much beyond its precise original purpose. I'm not even sure I'll be able to reuse it for most translation projects, as this text was pretty special in nature. But it was quicker to write it, then use it, than it was to do without. So it's a win, and it may come in handy later.

Tomorrow I'll show you how I did it. Tonight I still need sleep.

Thursday, December 07, 2006

*pfft*, *pfft*, is this thing on?

So it's come to my attention that the whole app-a-week thing needs a little help. And far be it from me to say that Microsoft junkies are, you know, all talk and no go, because Dana sure beat hell out of that theory, but it looks like I'm all they can find. And I ain't your father's Microsoft man.

Dana here; I'll be writing in red from here on out. Sure, blue is more "Jedi"-like, but I'm not one to conform.

Welcome Michael!


Oh, sure, I used to do the Microsoft thing (back in the 80's when we had to carve our apps from stone and raw machine code, then hand-encode them on the toggle switches on Windows 3.1's front panel). In the 90's, though, I found paid employment in this new-fangled Internet thing, saw how comparatively easy it is to slap a form onto a browser from clean code on the server, and never looked back.

Well -- until I did, of course. I still run Windows on my desktop (some habits die hard) and I still do the occasional desktop coding work. Nowadays I usually use wxPython for GUI work, and for general hacking around on textually oriented or XML-based data, I love Perl, the original write-only language, whether I'm using it on Unix or Windows. I can sling that stuff out pretty quick. Maybe not an app a day (because let's face it, I'm not suicidal, plus I'm 40 years old with a family and my own business and can't spend hours a day coding these days) but I think maybe I can manage an app a week.

Oh -- another bit of disclosure. While I used to code daily for a living, in about 2002 the recession forced me to the realization that my management skillz suck. I was losing money fast because I don't know how to run a software consultancy sole proprietorship (in the 90's it was easy, people just threw money at me and I did stuff). Fortunately, my language skillz, unlike my management skillz, are pretty good. Over the course of the next couple of years, I built a technical translation business from scratch, and that's where my money comes from now. Which is a roundabout way of saying that if I disappear for a few days, it's because I'm working hard on a translation job with a tight deadline -- and they all have tight deadlines, let me tell you.

No problem, here at AnAppAWeek.com we're used to people disappearing!

But be that as it may, here I am. We'll see how long I last. I might actually turn my hand to this .NET stuff you young people keep talking about, but probably not. I'm not convinced it's good for more than flashy desktop toys. Not to mention that .NET may go away tomorrow and Microsoft would just grin at the look on your face. Python and Perl, though, are here to stay, and they're supported by more than just one monopolistic group of people.

Oh yeah, .NET may just go away tomorrow while Microsoft grins... That's as likely as the world abandoning Python. Don't be ridiculous.

I've started noodling on a first weekly app, but unfortunately (true to form) the last week has been grueling from a translation work perspective. Interestingly, just today I wrote my own specialized text editor in wxPython. Tell me you can do that in .NET, and I may believe you, but probably not. I'll post details about the text editor as soon as I have time to write it up -- in the meantime, though, I have to use it to finish this translation job.

Aha! A challenge. Please, give details!

And on that note, I close. Dana, I think I'm going to give you a run for your money if you still want to say that Linux guys can't hack this Jedi thing.

I'll believe it when I see it. So let me end this post by repeating myself. Linux guys can't handle AnAppADay. AnAppAWeek? Maybe... we'll see.