Tuesday, 12 May 2009

Qt's DOM whitespace specifics

I've just finished writing xliffmerge, a script that is a direct analog of msgmerge (gettext) and pomerge (translate-toolkit), but made specifically for XLIFF. That is, it uses the advantages XLIFF brings, such as recording information about each template update, specifying the phase in which each unit was last modified, and so on.

I was surprised to find out that translate-toolkit parses XML files into its own internal representation, which means it loses everything it doesn't support.

So, to do it properly, I chose to work directly with the DOM representation of the XLIFF file (using QDomDocument and friends). In Lokalize, the XliffStorage class is just a wrapper around QDomDocument.

I enabled whitespace preservation, but in some cases Qt still added extra whitespace, which means it modified user-editable text. The cases were clear: when there is no character data between two tags, it 'formats' them by inserting a newline character plus indentation spaces. To override this behaviour I insert empty text nodes between such tags. You can see the code at the end of xliffmerge.py (fixWhiteSpace*()).
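The idea can be sketched roughly as follows. This is not the code from xliffmerge.py: it uses Python's standard xml.dom.minidom as a stand-in for QDomDocument, the function name pad_adjacent_tags is made up for illustration, and leaving empty elements such as <x/> untouched is an assumption of this sketch.

```python
from xml.dom import Node, minidom

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def pad_adjacent_tags(element, doc):
    """Insert an empty text node wherever two tags touch; return how many were added."""
    children = list(element.childNodes)  # snapshot, because childNodes is modified below
    if not children:
        return 0  # assumption: empty elements such as <x/> are left alone
    added = 0
    prev_is_text = False  # the element's own start tag counts as a tag
    for child in children:
        is_text = child.nodeType in TEXT_TYPES
        if not is_text and not prev_is_text:
            # Two tags in a row: separate them with an empty text node.
            element.insertBefore(doc.createTextNode(""), child)
            added += 1
        if child.nodeType == Node.ELEMENT_NODE:
            added += pad_adjacent_tags(child, doc)
        prev_is_text = is_text
    if not prev_is_text:
        # The last child is a tag and is directly followed by the end tag.
        element.appendChild(doc.createTextNode(""))
        added += 1
    return added


doc = minidom.parseString("<source><g><x/></g><g>text</g></source>")
print(pad_adjacent_tags(doc.documentElement, doc))
```

With QDomDocument the same walk would be built on QDomDocument::createTextNode() and QDomNode::insertBefore(). Running the function a second time adds nothing, because every pair of tags is already separated by a text node.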

Saturday, 9 May 2009

Scripting Lokalize

I started working on scripting support in Lokalize several months ago, but never really announced it, because I wanted to write my own scripts first and refine/extend the API along the way.

KDE 4.3 will bring us an 'officially' scriptable Lokalize. Recent additions include a new project wizard with anonsvn checkout support (screencast; it also checks out the templates/ and scripts/ folders), and actions to update a .po file from its template and to compile a .po file into .mo and place it in the local .kde folder, so that applications pick it up automatically on their next start (see here and here for Python sources).
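For readers unfamiliar with the underlying gettext tools, here is a minimal sketch of what such update and compile actions boil down to. It is not the code of the linked scripts. The function names are hypothetical, and the target path ~/.kde/share/locale/LANG/LC_MESSAGES is an assumption about where a KDE 4 installation looks for local catalogs (your $KDEHOME may differ).

```python
import os


def update_command(po_file, pot_file):
    """Command line that merges new template strings into an existing .po file."""
    return ["msgmerge", "--update", po_file, pot_file]


def compile_command(po_file, lang, kde_home="~/.kde"):
    """Command line that compiles a .po file into a .mo file under the local KDE folder."""
    catalog = os.path.splitext(os.path.basename(po_file))[0]
    mo_file = os.path.join(
        os.path.expanduser(kde_home),  # assumption: default KDE 4 home folder
        "share", "locale", lang, "LC_MESSAGES", catalog + ".mo",
    )
    return ["msgfmt", "-o", mo_file, po_file]


print(update_command("ru/lokalize.po", "templates/lokalize.pot"))
```

Each returned list can be handed to subprocess.check_call(); the target folder has to exist before msgfmt is run.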

Recently I also extended pology with the ability to communicate with Lokalize, so now you can run XML checks against your translations and have the erroneous files opened in Lokalize and positioned on the appropriate entry.

Here is how it works: each time a project is opened, Lokalize scans the PROJECTDIR/lokalize-scripts folder for .rc files and adds them to a 'cache' file called PROJECTDIR/lokalize-scripts/scripts.rc (so you generally shouldn't add that file to your version control system). RC files contain descriptions of scripts, in particular their paths. A path is relative to the folder of the .rc file or to a system scripts folder (both are tried). So, for example, you can specify "../../scripts/lokalize/opensrc.py" to load a script from the global kde-l10n4 scripts folder (i.e. one that is not specific to your language). Each script is represented by an action in the application menu (and you may assign a shortcut to it).
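The lookup described above can be modelled roughly like this. It is only an illustration in Python, not Lokalize's actual implementation: the function names are hypothetical, the system scripts folder is passed in as a parameter because its real location is not given here, trying the .rc folder first is an assumed order, and skipping scripts.rc during the scan is also an assumption.

```python
import os


def find_rc_files(project_dir):
    """Return the .rc files found in PROJECTDIR/lokalize-scripts."""
    scripts_dir = os.path.join(project_dir, "lokalize-scripts")
    if not os.path.isdir(scripts_dir):
        return []
    return sorted(
        os.path.join(scripts_dir, name)
        for name in os.listdir(scripts_dir)
        # assumption: the generated cache file is not itself treated as a description
        if name.endswith(".rc") and name != "scripts.rc"
    )


def resolve_script(rc_file, script_path, system_scripts_dir):
    """Resolve a script path from an .rc file, trying both base folders."""
    # assumption: the folder of the .rc file is tried before the system folder
    for base in (os.path.dirname(rc_file), system_scripts_dir):
        candidate = os.path.normpath(os.path.join(base, script_path))
        if os.path.isfile(candidate):
            return candidate
    return None
```

For instance, resolve_script(rc, "../../scripts/lokalize/opensrc.py", system_dir) climbs two levels up from the folder of the .rc file, which is how a language-specific project can reach a shared scripts folder.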

I wrote a script (in QtScript/JavaScript) that performs a check specific to Russian: %1 should appear in _all_ plural forms because, in Russian, the first form is also used for 21, 31 and so on. You can find it here. The script contains no top-level code, only a function. The name of the function (fileSaved) is essential, because it matches the signal emitted by the Editor object, which lets us communicate with the current editor tab in Lokalize. The function is called automatically on each file save, and if the file contains errors, a warning is issued and the wrong entries are shown. Scripts don't actually get loaded until the user explicitly triggers their action in the menu, so to override this behaviour the RC file specifies <property name="autorun" >true</property>.
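To show why the check matters, here is a small Python sketch of the logic (the real script is written in QtScript and works through the Editor object, which is not reproduced here). The plural rule is the standard gettext one for Russian; the function names and the sample strings are made up for illustration.

```python
import re

# "%1" not followed by another digit, so that "%10" is not mistaken for "%1"
PLACEHOLDER = re.compile(r"%1(?!\d)")


def russian_plural_index(n):
    """Index of the plural form used for n (standard gettext rule for Russian)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # 2-4, 22-24, ... but not 12-14
    return 2


def forms_missing_placeholder(plural_forms):
    """Return the indices of the plural forms that do not mention %1."""
    return [i for i, form in enumerate(plural_forms) if not PLACEHOLDER.search(form)]


# Hypothetical translation whose first form hardcodes "one" instead of using %1.
forms = ["Удалён один файл", "Удалено %1 файла", "Удалено %1 файлов"]
print(russian_plural_index(21), forms_missing_placeholder(forms))
```

Because 21 maps to the first form, a translation that spells out "one" there would be displayed for 21 files as well, which is exactly the mistake the check reports.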

Below are links to the API references. Everything marked as Q_SCRIPTABLE may be used from scripts:
Editor object API reference
Lokalize object API reference
Project object API reference

You are welcome to write your own scripts and polish existing ones.

Lokalize screencast time

I did quite a few things during the last week.

In particular, I finished the implementation of storing XLIFF inline markup in translation memory and made the autosubstitution mechanism more robust. I also added support for binary units and for displaying alternate translations (stored in the XLIFF file or in an external file, which is especially useful for gettext .po files, since they don't support storing alternate translations in the same file).

A screencast