Tuesday, 12 May 2009

Qt's DOM whitespace specifics

So I've just finished writing the xliffmerge script, a direct analog of msgmerge@gettext and pomerge@translate-toolkit, but made specifically for XLIFF, i.e. it leverages all the advantages XLIFF brings, like saving information about each template update, recording the phase in which each unit was last modified, and so on.

I was surprised to find out that translate-toolkit parses XML files into its own internal representation, which means anything in the file that it doesn't support is lost.

So to do it in a nice way, I chose to work directly with the DOM representation of the XLIFF file (using QDomDocument and friends). In Lokalize, the XliffStorage class is just a wrapper around QDomDocument.

I enabled whitespace preservation, but in some cases Qt still added extra whitespace, which means it modified user-editable text. The cases were clear: when there is no character data between two tags, it 'formats' them by inserting a newline character plus indentation spaces. To override this behaviour I simply insert empty text nodes between such tags. You can see the code at the end of xliffmerge.py (fixWhiteSpace*()).
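To illustrate the idea (this is a minimal sketch, not the actual fixWhiteSpace*() code from xliffmerge.py), here is the same tree walk written against Python's standard xml.dom.minidom as a stand-in for QDomDocument. The function name pad_with_empty_text and the sample markup are hypothetical. Note that minidom's serializer does not reformat the way Qt's does, so the example only shows the DOM manipulation: every element child ends up with a text node on each side, and the serialized XML stays the same.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def pad_with_empty_text(element):
    """Give every element child of `element` a text node on both sides.

    Empty text nodes are inserted wherever two tags would otherwise touch.
    Recurses into child elements and returns the number of nodes inserted.
    """
    doc = element.ownerDocument
    inserted = 0
    # Iterate over a snapshot, because the child list is modified in the loop.
    for child in list(element.childNodes):
        if child.nodeType != Node.ELEMENT_NODE:
            continue
        prev = child.previousSibling
        if prev is None or prev.nodeType != Node.TEXT_NODE:
            element.insertBefore(doc.createTextNode(""), child)
            inserted += 1
        nxt = child.nextSibling
        if nxt is None or nxt.nodeType != Node.TEXT_NODE:
            # insertBefore() with a reference node of None appends at the end.
            element.insertBefore(doc.createTextNode(""), nxt)
            inserted += 1
        inserted += pad_with_empty_text(child)
    return inserted


if __name__ == "__main__":
    # Hypothetical <source> content made only of inline tags, no character data.
    doc = minidom.parseString('<source><g id="1"><x id="2"/></g><x id="3"/></source>')
    count = pad_with_empty_text(doc.documentElement)
    print(count, "empty text nodes inserted")
    print(doc.documentElement.toxml())
```

With QDomDocument the same walk would presumably be built from createTextNode() plus QDomNode's insertBefore() and insertAfter().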

3 comments:

Fabien said...
This comment has been removed by the author.
Fabien said...

(Note: XML tags are not allowed in comments, so I used [ and ] instead.)

This is valid according to the W3C XML specs.

[blah] something [/blah]
is a synonym of [blah]something[/blah]

Another thing that is not well known is that
[blah][/blah] is a synonym of [blah/], and it contains the "" (empty string) value, NOT null as many would expect.

Asgeir Frimannsson said...

Related to this, XLIFF 1.x does have some strange handling of whitespace, as described here:


Hopefully we'll get this improved for 2.0.