Camen Design Forum

NoNonsense Forum translation support—How?

Kroc

I’ve heard word from an NNF contributor deploying NNF internally at his company that he has a requirement for translation support.

Now, supporting translation is a *lot* harder than it seems when you give the implementation some thought, and I’m interested in people's input given the following considerations:

* There is no session (and can't be). How do we both provide the choice of language to the user [UI], and how do we persist this choice? Off the top of my head, I would consider passing a query string around to set the language, but this might get lost with POSTs and redirects. The language choice could be part of the URL scheme and parsed out by the .htaccess (possibly even setting an environment variable); but there may be another way of doing it

* Because NNF does not store any data on users except their password, there isn't currently any way to persist a per-user choice of language; they would have to choose it each time they visited the site. Cookies seem like the only answer to this at this time. It’s not really okay to require a user to sign in to get their desired language because they may not be able to read the link to sign in

* Providing language translations bundled with NNF could restrict my ability to develop the product further due to reliance on contacting X-number of volunteers to translate new strings whenever I prepare a change. I like the speed at which I can push out new versions, and I wouldn't want to go down a route that leaves my hands tied, especially when I am neither being paid for any of this, nor able to pay anybody either. For that reason it’s unlikely NNF will contain many translations out-of-the-box, but it is vital that the implementation allows people to provide their own languages *without modifying the original source files* (i.e. via config.php / theme.config.php and additional external files not tracked in the Git repository)

* What is the right way to translate within NNF? Search and replace? By doing that, you have to duplicate the template's text somewhere else, many times over (for each translation to find the original English text), meaning that changing the English creates a search/replace nightmare. What about duplicating the template files for each language? Everything needed is certainly contained within, in clear context (easy to do the translation), but it’s a very large amount of duplicate data and too much of a pain if you want to make HTML changes. Choosing the right data-format and method of text substitution is actually very hard to do elegantly and without creating a burden of maintenance and complexity on anybody who doesn’t want or need the translation capability (the single reason thus far for me not implementing it)

All in all, this is a very difficult feature to implement elegantly and cleanly. I would prefer to really fret the details with this and be able to implement something that is above and beyond what any other software project is doing because that's what NNF does—questions what everybody else is doing and rejects it for its sloppiness and complexity. NNF would not be "NoNonsense" if it used something like GetText :P

Your consideration is desired and appreciated.

Kroc added on

Just realised a few more things:

* Markup.txt would need to be moved to a templated page and translated, that's quite a lot of text!

* If NNF bundled additional translations, would that mean that the ReadMe would have to be translated? That's way beyond what to expect volunteers to translate and impossible to maintain if the ReadMe is changing every couple of releases

* What about 'about.html'? Would the site admin have to provide multiple versions if they supported multiple translations?

This is a lot of difficult stuff.

I think the first step is clearly defining where the actual translation support boundaries lie -- what is included, what is not. What is provided by NNF, and what is provided by the site admin.


Replies

#1. David Hund

It's a difficult issue for sure. But I guess translation is a requirement if you want to see NNF adopted more. I think most NNF installs simply have one 'global' language, so language-switching is less of an issue.

I'm afraid I don't have much (technical) advice on how to best approach this.

I guess a Language preference *could* be set with a cookie but I like the idea of implementing this in the URL scheme.

The few strings in the PHP files could be translated using gettext-style `_()` functionality but, obviously, the issue with the themes is that the beauty of separation of logic/templates makes us want to avoid using gettext-like stuff in the templates. But simply translating a theme as a whole will become a maintenance nightmare.

Hopefully some folks smarter than me have some ideas.

#2. Kroc

Thanks for your reply! Yes, at the base of it, the site admin will set the default language for the whole site, but there is a need for individual users to choose their own language over that.

I've discovered something interesting that might help solve the language choice persistence problem elegantly. You can set and read cookies using the htaccess: http://www.askapache.com/htaccess/htaccess-fresh.html#Cookie_Manipulation_Tests_mod_rewrite

I could therefore read the language cookie, and set an environment variable which would be picked up by the PHP (much as HTTP_AUTH is handled at the moment). This would remove the need to use a querystring parameter. :)

GetText cannot be used because the template is static and it would break the separation. Translation could be done with a simple search / replace whilst the DOMTemplate is loading the template file. The art is in how we store translation data as to whether this will be bad for future maintenance and development.

#3. Johann

the beauty of separation of logic/templates

maybe this'll help: right now you have hardcoded strings in the templates -- there is no separation, other than in the realm of wishful thinking, but that's the www disease anyway -- , and replacing those with placeholders ought to make it more logical as well as more pretty.

and yes, I for one, while I would offer to translate into german, there's no way I would do that unless it's all in one file. one file for each language -- that's common practice for a lot of reasons I think.

something that works well is always more elegant than something that doesn't work quite so well. let form follow function, just a thought..

#4. Kroc

replacing those with placeholders ought to make it more logical as well as more pretty.

Logical, yes, but we're avoiding logic in the templates :) A template that contains no actual real text is not very pretty, and is not very helpful for somebody trying to read, understand and modify the templates. As NNF has it now is almost perfect since the HTML can be developed independent of the PHP; if at all possible I'd like to keep that attribute.

But it is impossible to avoid repetition like that. Something has to key the English phrases to the foreign ones. In systems like gettext, the English phrase is used as the key so that the template reads more naturally. That is the goal of NNF too, to be readable.

#5. Johann

yeah I kinda confused "logical" and "logic" there, sorry ^^

and I understand how having the real text would help with styling.. but if you used the gettext method, what if you change the english text later on, even just one character? because of that I don't quite see how you could avoid using an intermediate identifier, without possibly (likely?) running into headaches later on.

and it's not like you can't make those identifiers somewhat descriptive, e.g. "[type_password_here]" for the "A password to keep your name" string. sure, it would look less neat... but I doubt it would be a showstopper for anyone wanting to make a template... ?

I'll keep this in mind, and if I come up with something I'll naturally share :) but don't hold your breath for anything you haven't thought of already haha :D

#6. Kroc

what if you change the english text later on, even just one character?

Yeah, that is a problem. When it comes to translation you have to choose a problem to accept and solve the other problems. I’m still debating what problems to solve and what ones to keep.

Given I am developing NNF in English and that any other languages are provided by volunteers only, and that anybody adding their own translation would likely have to know English too (because NNF's code and documentation will remain in English), I theorise that it is best to base translation behaviour around English-by-default, such as in the case of gettext.

Unless there is some kind of identifier mark, then the English text has to be repeated 1. inside each template as filler, and 2. in each translation file as the key to the translated content.

NNF will need an admin option to enable / specify which language translations are available to choose from, including adding their own. The reason for this is that an admin may modify the English heavily (custom skin or what-not) and not be able to update the other languages accordingly and so have to disable them.

Therefore in the instance where a site admin modifies the template English (the default filler), and this breaks translation because the key cannot be found, this won’t be a problem unless those translations are enabled; and the template will fall back to the Admin’s chosen English for those phrases without any code involved. Therefore, there are some benefits to this. That is, it’s nice for NNF to not have to translate English to English, if you get what I mean :)

#7. Akerhielm

Wouldn't the simplest solution be keeping language-related things in themes and then implementing cookie-based theme switching as Kroc mentioned earlier?

#8. Kroc

That's basically the approach I plan to take, Akerhielm; the difficult part is choosing how to do the search / replace.

Here's a concept that strikes a happy medium:

%
[%IDENTIFIER:The quick brown fox jumps over the lazy dog%]
%

I'd prefer if we didn't have to go back to essentially having template tags, but that may be the only way to do translation without either repeating the English phrases fifty times throughout, or duplicating the whole theme fifty times over.
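A minimal sketch of how tags in the `[%IDENTIFIER:English text%]` form above could be resolved. The function name `nnf_translate_tags` and the identifier-keyed array are assumptions made for illustration; nothing like this exists in NNF. When an identifier has no translation, the English filler inside the tag is kept, so nothing has to translate English to English.

```php
<?php
// Hypothetical helper, not part of NNF: replaces [%ID:English filler%] tags
// with the translation stored under ID, or with the English filler text
// when the language table has no entry for that ID.
function nnf_translate_tags (string $template, array $strings): string {
	$result = preg_replace_callback (
		// group 1 = identifier, group 2 = English filler (no '%' inside)
		'/\[%([A-Z0-9_]+):([^%]*)%\]/',
		function (array $m) use ($strings): string {
			return $strings[$m[1]] ?? $m[2];
		},
		$template
	);
	// preg_replace_callback returns null on a regex error; keep the input then
	return $result ?? $template;
}

// Assumed example data: a German table that only covers one identifier
$de = array ('ADD_THREAD' => 'Neues Thema');

echo nnf_translate_tags ('<a>[%ADD_THREAD:Add Thread%]</a> <a>[%RSS:RSS%]</a>', $de);
```

The trade-off discussed above is visible here: the English text lives only in the template, but the template now carries tag syntax that a theme author has to read around.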

---

I've decided roughly where to draw the line with translation support:

* Everything intended for the site admin for installing and configuring NNF will not be translated or support switchable translations (ReadMe, source code)

* Everything intended for the site users should be translatable

That includes a wide variety of things, some of which are not in the right places in the code atm to support translation so there may be a lot of config file updating to do

* The templates (index / thread / append / delete)
* markup.txt will be moved to an HTML template like above
* Everything currently in 'theme.config.default.php' e.g. the <title> strings

Things I'm not sure about:

* Should the forum's name be translatable?
* The admin's per-forum 'about.html'?
* The Google search URL (should we also allow people to choose other search engines easily? Google is hardcoded atm)
* 'User X added on date Y' & 'This post was deleted by...'; note that these items are inserted permanently into the RSS, so it might not make sense to intermix languages like this if you have multiple users viewing the site in different languages

Kroc added on

Just had an idea about the 'This post was deleted by...' string.
When a post is deleted, the text inserted into the RSS will be in the forum's default language. When a user is viewing the forum in a different language, this message will be displayed in their language, because NNF knows that the post is deleted (from the `<category>`) and can show the translated text rather than the RSS's content.

The same can't be said of the 'User X added on date Y' string as it's intermixed with the other text. It will just have to be kept in the forum's default language.

#9. Martijn

Should the forum’s name be translatable?

This might be something people with completely different alphabets would like. ‘Camen Design Forum’ might become ‘Camen Design φόρουμ’ for a Greek user. In most general cases this wouldn’t be necessary.

The admin's per-forum ‘about.html’?

Tricky. Personally I find forums that allow multiple languages within the same board very unorganised. By not allowing translations of about.html you might be able to get people to create separate subfora per language.

On the other hand, the about.html of the root forum should be translatable. Because this is a page any language user will visit at some point.

The Google search URL

Won't Google.com URLs automatically redirect to localised Google versions? Not sure…

#10. David Hund

Just a quick thought. Couldn't the Search/Replace strings be based on a simple convention? E.g. use HTML comments instead of 'templatetags'.

So, why not use: `<!-- L10n -->The quick brown fox jumps over the lazy dog<!-- /L10n -->` instead of `[%IDENTIFIER:The quick brown fox jumps over the lazy dog%]`

The principle is the same but the HTML remains readable.

#11. Kroc

Re: forum name translation, good point but I think that for an initial implementation we'll have to leave this out. That's because all the strings that have to be translated are part of the theme and thus very self-contained. The forum

On the other hand, the about.html of the root forum should be translatable. Because this is a page any language user will visit at some point.

Very good point, that alone is reason enough; and since all about.htmls are templated by the same line of code, translation support will naturally be site-wide, even if it's not used for every sub-forum.

This does raise a new question though! How should this actually be handled? I'm guessing that we can have 'about.html' for default and then 'about-de.html' &c. for the translations
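A rough sketch of the 'about.html' / 'about-de.html' idea, assuming a hypothetical helper called `nnf_about_file` and a language code that has already been chosen somehow (neither is part of NNF). It prefers the translated file and falls back to the default one.

```php
<?php
// Hypothetical helper, not part of NNF: picks 'about-<lang>.html' when it
// exists in the forum folder, otherwise falls back to plain 'about.html'.
function nnf_about_file (string $dir, string $lang): string {
	// only accept simple language codes such as 'de' or 'pt-br', so the
	// language value can never be used to reach outside the forum folder
	if (preg_match ('/^[a-z]{2,3}(-[a-z0-9]{2,8})?$/', $lang)) {
		$translated = "$dir/about-$lang.html";
		if (is_file ($translated)) return $translated;
	}
	return "$dir/about.html";
}

// usage, assuming the current folder is a forum folder
echo nnf_about_file ('.', 'de');
```

Because the fallback is per folder, a sub-forum that only provides 'about.html' keeps working when translations are enabled elsewhere on the site.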

Kroc added on

Sorry, I hadn’t noticed your reply David because it had appeared in between me writing mine. Yes, that's a more natural way of doing it (as far as HTML is concerned) even if it might be a bit messier.

#12. jamesmiller5

It seems to me that translation for web pages in general is better solved client-side than server-side: let the browser, rather than the website, figure out how best to display the information.

The Chrome browser auto-detects language and offers a translation for each site. There are options for Firefox as well.

#13. Kroc

I don’t think that's workable. NNF is already being used internally within companies and browsers cannot translate such private URLs; either way, machine translation isn't good enough.

This feature is more about the site administrator providing the default language and optionally additional translations that fit any custom skin (and therefore text) they’ve made.

#14. Kroc

I think I’ve found a good, fairly elegant solution to translations that’s much better than even using HTML comments as delimiters; just use XPath!

I noticed that there are a number of HTML attributes that would need to be part of the translation, and that if I used a text database for the translations, it would actually have the ill effect of forcing all themes to adhere to certain element types and attributes. Instead, we can allow the theme-maker to specify however many translation strings they want, to apply to whatever elements/attributes they want, and these decisions not be hard-coded into NNF.

For example; imagine a translation table like this:

% PHP
$LANG = array (

//search box
 '#query@placeholder'		=> 'Google Search…'
,'#go@alt'			=> 'Search'

//header navigation
,'#nnf_add'			=> 'Add Thread'
,'#nnf_rss'			=> 'RSS'
,'//*[@id="index"]/li[1]/a'	=> 'Index'

⋮

);
%

It’s then up to the theme author to define what strings are translatable, not NNF, meaning that different themes can have wildly different text — and be translatable — without consistent text being mandated across the board by NNF (as is with most translation implementations).
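To make the idea concrete, here is a small sketch of applying such a table with PHP's `DOMDocument` and `DOMXPath`. It is not NNF's actual DOMTemplate code: the function names `shorthand_to_xpath` and `nnf_apply_translation`, the way the '#id' / '#id@attr' shorthand is expanded, and the sample template are all assumptions for illustration. The template is assumed to be well-formed XML with a single root element.

```php
<?php
// Hypothetical sketch, not NNF's DOMTemplate: expands the '#id' and
// '#id@attr' shorthand into XPath; anything else is assumed to already
// be an XPath expression and is passed through untouched.
function shorthand_to_xpath (string $query): string {
	if (preg_match ('/^#([\w-]+)(?:@([\w-]+))?$/', $query, $m)) {
		$base = '//*[@id="' . $m[1] . '"]';
		return isset ($m[2]) ? $base . '/@' . $m[2] : $base;
	}
	return $query;
}

// Applies a table of 'query' => 'text' pairs to a well-formed XML template
function nnf_apply_translation (string $xml, array $lang): string {
	$doc = new DOMDocument ();
	$doc->loadXML ($xml);
	$xpath = new DOMXPath ($doc);
	foreach ($lang as $query => $text) {
		$nodes = $xpath->query (shorthand_to_xpath ($query));
		// an invalid expression gives false; skip it and keep the original
		if ($nodes === false) continue;
		// a query that matches nothing simply leaves the template's text
		foreach ($nodes as $node) $node->nodeValue = $text;
	}
	return $doc->saveXML ($doc->documentElement);
}

// Assumed sample template and table (plain ASCII to keep the output simple)
$template = '<nav><input id="query" placeholder="Google Search" />'
          . '<a id="nnf_add">Add Thread</a>'
          . '<ul id="index"><li><a>Index</a></li></ul></nav>';
$de = array (
	'#query@placeholder'       => 'Google-Suche',
	'#nnf_add'                 => 'Neues Thema',
	'//*[@id="index"]/li[1]/a' => 'Startseite',
	'#not_in_this_theme'       => 'ignored',
);

echo nnf_apply_translation ($template, $de);
```

Entries that match nothing are silently ignored, which is what lets one language file serve themes that do not share the same elements.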

#15. Richard

XPath was actually a solution I was already thinking about… It seems like a very feasible approach, and because of the neat 'simple' XPath expressions, relatively easy to implement.

#16. Gerben

HTML5 allows custom tags. Why not just use `<l10n>The quick brown fox jumps over the lazy dog</l10n>`? I think it looks way nicer than those comments. The only thing is, you probably want to remove them when you 'translate' to English.
Also it won't work on attribute values (maybe use `data-l10n-alt="test"` for translatable alt attributes, but it looks a bit ugly)

As for storing the selected language, I'd put it into the URL. So e.g. `http://forum.camendesign.com/nl/nononsense_forum_translation_supporthow_` to select Dutch. The same could be used for the RSS links. So `/nl/thread_title.rss` will use Dutch for the dates, labels etc. (by the way, date and time formats also differ between countries).

Using URLs also lets Google index the different languages. Although you'd probably want to add a canonical link element in the head to prevent duplicated content, as the forum topics themselves remain in the same language.

The same approach can be used for the about and markup files. `/nl/about.html` and `/nl/markup.txt` will load the appropriate translation, if available. You could use some htaccess to check if about-nl.html is available, and rewrite the request to that file, or rewrite to about.html if it's not.

Gerben added on

I should have looked at the date on the comments. Sorry.
Anyway. Maybe someone else finds it useful.

#17. Kroc

I don't want to come across negative, but I do want to clarify where possible to all readers:

Why not just use `<l10n>The quick brown fox jumps over the lazy dog</l10n>`? I think it looks way nicer than those comments.

I am not using the comments any more, so thankfully that's no longer an issue. Instead XPath is used to select an element or attribute and then replace the text content. Since we are already selecting elements, there isn't a need to use an `<l10n>` element or otherwise.

As for storing the selected language, I'd put it into the URL

This just wouldn't work in practice. If I share a link from one language, other users perhaps get a language they don't understand and that isn't helpful. Rather, language is a local, per-user choice and not a public, global choice. Language selection will be persisted using a cookie, and also where possible auto-detected from the browser. Therefore, whatever link they follow, they will see the interface in their chosen language.

I've got basic translation already working with pig-latin, and the chosen implementation is very simple and straight-forward. I still have much to do as there still needs to be a UI for choosing language, and the code to persist selection.

#18. v4

As for language selection, why don't you just use what's already in HTTP? The Accept-Language header is the standard method, and AFAIK all (modern) browsers send it by default. It can also specify multiple languages in order of preference.
Actually, I'm not really pleased if sites try to select a language for me, or present me with a language selection where the only option is English anyway.
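As an illustration of the Accept-Language approach, the sketch below picks the best match between the header's ordered preferences and the languages a forum has available. The function name `pick_language` and the matching rules (full tag first, then the primary language, wildcards ignored) are assumptions for this example, not NNF code and not a complete implementation of the HTTP specification.

```php
<?php
// Hypothetical helper, not part of NNF: returns the first language from an
// Accept-Language header (highest q-value first) that is in $available,
// or $default when nothing matches.
function pick_language (string $header, array $available, string $default): string {
	$prefs = array ();
	foreach (explode (',', $header) as $i => $part) {
		// each part looks like 'nl', 'en-GB' or 'en;q=0.8' ('*' is skipped)
		if (!preg_match ('/^\s*([a-z]{1,8}(?:-[a-z0-9]{1,8})*)\s*(?:;\s*q=([0-9.]+))?\s*$/i', $part, $m)) continue;
		$q = isset ($m[2]) ? (float) $m[2] : 1.0;
		// q=0 means "not acceptable", so such entries are dropped
		if ($q > 0) $prefs[] = array (strtolower ($m[1]), $q, $i);
	}
	// highest q first; ties keep the order they had in the header
	usort ($prefs, function (array $a, array $b): int {
		return ($b[1] <=> $a[1]) ?: ($a[2] <=> $b[2]);
	});
	foreach ($prefs as $pref) {
		// try the full tag ('nl-be'), then only the primary language ('nl')
		foreach (array ($pref[0], strtok ($pref[0], '-')) as $tag) {
			if (in_array ($tag, $available, true)) return $tag;
		}
	}
	return $default;
}

echo pick_language ('nl-BE,nl;q=0.9,en;q=0.8', array ('en', 'de', 'nl'), 'en');
```

In PHP the header is available as `$_SERVER['HTTP_ACCEPT_LANGUAGE']` when the browser sends it; the language codes in `$available` are assumed to be lowercase.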

v4 added on

I'm not really pleased if sites try to select a language for me

That was, of course, referring to methods other than Accept-Language, such as IP-based location lookup. I.e. anything I can't control myself.

#19. `123

sad

#20. Wannes

such as IP-based location lookup

That's the reason why I mostly seem to be living in the French-speaking part of Belgium, despite living in Flanders. But it makes it easy to ignore the ads :)

#21. Kroc

NNF won’t be doing IP lookup, so don’t worry :) It will go by your browser language ('intl.accept_languages' in 'about:config'), if not your chosen language in NNF, if not the default language of the forum.

#22. garl

I wonder: which forum needs _several_ interface languages?

A German forum contains posts in German. The interface should be German. Now, why would someone need, for example, an English interface for that forum? If he doesn't understand German, the English interface wouldn't be of much use, because the "real" content is in German.

If a forum were multilingual (so having sections/subforums for each language), the interface should switch to the current section's language. No need for a German interface in the English section and vice versa.
In most cases, it would be best to use a separate (translatable) NNF for each language section.

---

I really like the XPath approach. But I think it would suffice to be able to translate the whole interface to a single language; no need for a way to _switch_ interface language for visitors.


Your friendly neighbourhood moderators: Kroc, Impressed, theraje, Martijn