<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Internationalisation Tips &#187; internationalisation</title>
	<atom:link href="http://internationalisationtips.com/tag/internationalisation/feed/" rel="self" type="application/rss+xml" />
	<link>http://internationalisationtips.com</link>
	<description>practical tips on building an international presence</description>
	<lastBuildDate>Mon, 29 Mar 2010 13:41:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The dynamic sentence creation anti-pattern</title>
		<link>http://internationalisationtips.com/2010/03/05/the-dynamic-sentence-creation-anti-pattern/</link>
		<comments>http://internationalisationtips.com/2010/03/05/the-dynamic-sentence-creation-anti-pattern/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 21:42:43 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Translations]]></category>
		<category><![CDATA[antipattern]]></category>
		<category><![CDATA[dynamic]]></category>
		<category><![CDATA[gotcha]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://internationalisationtips.com/?p=18</guid>
		<description><![CDATA[The natural internationalisation stumbling block, particularly for technical people, is that localisation isn&#8217;t just about translating static text strings. Surprisingly, many developers and programmers fail to consider that sentences in one natural language cannot be simply translated one word at a time to another language.
Differences in grammar
Every human language has its own grammatical rules and [...]]]></description>
			<content:encoded><![CDATA[<p>The natural internationalisation stumbling block, particularly for technical people, is that localisation isn&#8217;t just about translating static text strings. Surprisingly, many developers and programmers fail to consider that sentences in one natural language cannot be simply translated one word at a time to another language.</p>
<h3>Differences in grammar</h3>
<p>Every human language has its own grammatical rules and style. The chances of a grammatical structure being the same across a range of natural languages are extremely low. So code written around a specific grammatical construct in one human language presents an almost impossible internationalisation barrier when its output needs to be translated into another language.</p>
<p>There are only two approaches that could work here:</p>
<ol>
<li>Write a custom replacement function per language</li>
<li>Throw away the code and try again</li>
</ol>
<h3>Drawbacks of language dependent code</h3>
<p>Unfortunately option 1 &#8211; writing a custom replacement function per language &#8211; would require a developer to be involved every time a new language needs to be supported. And that developer needs to know this new language well enough to make the necessary changes or additions to get his function returning the grammatically correct output each time.</p>
<p>For every natural language that needs to be supported, the developer has to supply a new function. That function most likely already contains business logic. Adding a new language means duplicating or reimplementing existing business logic to meet the grammatical structures of the new language.</p>
<p>So what happens when the business logic needs to be updated? Yep, the developer now has to update every single copy of the function, each time checking that the natural language syntax is still correct. That means that the developer maintaining this piece of code has to be broadly familiar with every language his code supports. And that is totally unrealistic.</p>
<p>Developer zugzwang.</p>
<h3>A real-world case</h3>
<p>I ran into a piece of dynamic sentence generation code about two years ago when I was tasked with localising some &#8220;global-ready&#8221; JavaScript for use in Europe. I was assured that all that needed to be done was to replace the static strings in the code with references to a JavaScript translations lookup.</p>
<p>Here is a simplified JavaScript pseudo-code version of the code I uncovered. (Simplified so we can focus on the internationalisation issues without getting bogged down in convoluted business logic.)</p>
<pre><code>
function getMarketStatus(market) {
	var message = market.name;
	var now     = new Date().getTime();

	if (market.open) {

		message += &quot; open&quot;

		if (market.open.early) {
			message += &quot; early&quot;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}

		} else if (market.open.late) {
			message += &quot; late&quot;;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}
		}

		if (now &lt; market.open.time) {
			message += &quot; in &quot; +
				formatTimeLeft(now, market.open.time);
		}

	} else if (market.close) {

		message += &quot; close&quot;;

		if (market.close.early) {
			message += &quot; early&quot;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}

		} else if (market.close.late) {
			message += &quot; late&quot;;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}
		}

		if (now &lt; market.close.time) {
			message += &quot; in &quot; +
				formatTimeLeft(now, market.close.time);
		}

	} else {
		message += &quot; closed&quot;;
	}

 	return message;
}
</code></pre>
<p>This piece of code generates one sentence of text summarising the market status. Either the market is open or closed, opening soon or closing soon, maybe earlier or later than normal (perhaps for a specified reason).</p>
<h3>Identifying the possibilities</h3>
<p>The function returns one of the following patterns (variable data identified with curly braced place-holders):</p>
<ul>
<li><samp>{market} open</samp></li>
<li><samp>{market} open in {timePeriod}</samp></li>
<li><samp>{market} open early</samp></li>
<li><samp>{market} open early in {timePeriod}</samp></li>
<li><samp>{market} open early for {reason}</samp></li>
<li><samp>{market} open early for {reason} in {timePeriod}</samp></li>
<li><samp>{market} open late</samp></li>
<li><samp>{market} open late in {timePeriod}</samp></li>
<li><samp>{market} open late for {reason}</samp></li>
<li><samp>{market} open late for {reason} in {timePeriod}</samp></li>
<li><samp>{market} close</samp></li>
<li><samp>{market} close in {timePeriod}</samp></li>
<li><samp>{market} close early</samp></li>
<li><samp>{market} close early in {timePeriod}</samp></li>
<li><samp>{market} close early for {reason}</samp></li>
<li><samp>{market} close early for {reason} in {timePeriod}</samp></li>
<li><samp>{market} close late</samp></li>
<li><samp>{market} close late in {timePeriod}</samp></li>
<li><samp>{market} close late for {reason}</samp></li>
<li><samp>{market} close late for {reason} in {timePeriod}</samp></li>
<li><samp>{market} closed</samp></li>
</ul>
<p>Where:</p>
<ul>
<li><var>{market}</var> is the name of the market under scrutiny, e.g. <samp>UK markets</samp></li>
<li><var>{timePeriod}</var> is the time, in hours and minutes, before the market opens or closes</li>
<li><var>{reason}</var> is the stated reason why a market opened or closed early or late.</li>
</ul>
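<p>As a sanity check, the list above can also be generated mechanically. This is a minimal JavaScript sketch rather than part of the original code; <code>listPatterns</code> is a name made up for this illustration. It mirrors the nesting of the original function, where a reason is only ever appended after <q>early</q> or <q>late</q>.</p>

```javascript
// Enumerate every sentence shape the original function can produce.
// A reason is only ever appended after "early" or "late", which is
// why each verb has five timing variants rather than six.
function listPatterns() {
	var verbs   = ["open", "close"];
	var timings = ["", " early", " early for {reason}",
	               " late", " late for {reason}"];
	var periods = ["", " in {timePeriod}"];
	var patterns = [];

	verbs.forEach(function (verb) {
		timings.forEach(function (timing) {
			periods.forEach(function (period) {
				patterns.push("{market} " + verb + timing + period);
			});
		});
	});

	// The fall-through case when the market is neither opening nor closing
	patterns.push("{market} closed");
	return patterns;
}

// 2 verbs x 5 timings x 2 periods, plus the lone "closed" sentence
console.log(listPatterns().length); // 21
```

<p>Every entry in the returned array is a complete sentence shape, which is exactly the unit a translator needs to work with.</p>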
<h3>The simple difficulty</h3>
<p>That&#8217;s 21 different text strings. The likelihood of the word order remaining the same across different languages is close to zero.</p>
<p>The simple case is that the order of atomic elements works in English, but is unlikely to work consistently in other languages. For this piece of logic to be fit for internationalisation, that order cannot be assumed. A translator needs to be able to use the most appropriate and correct order for the targeted language and culture.</p>
<p>Moreover, the difficulties don&#8217;t end there. </p>
<h3>The disguised change of meaning</h3>
<p>Perhaps the most insidious feature of the above code is that adding in an extra word significantly changes the meaning of previous words. Take for example these two generated sentences:</p>
<ol>
<li>UK markets open</li>
<li>UK markets open in 20 minutes</li>
</ol>
<p>The first sentence is a declaration that the market is currently open. The second, however, does not; it states that the market <em>will</em> open after a defined period of time. So the sentence has changed from a present tense declarative, to a future tense expectation.</p>
<p>The English grammar barely holds together in this change of tense, and it&#8217;s unlikely that more regular and refined languages could pull off this form of grammatical gymnastics.</p>
<h3>Factor out the natural language</h3>
<p>So how do we fix this? Rather simply, by not constructing sentences a fragment at a time. Figure out which pieces of information are needed, and then look up the complete translatable sentence that matches that information.</p>
<p>We refactor the code above in a two step process:</p>
<ol>
<li>Replace the sentence appending logic with something that keeps track of which bits of information need to be conveyed, and picks the most appropriate sentence.</li>
<li>Add the dynamic data into the sentence by means of token or place-holder replacement</li>
</ol>
<p>Step 1 requires a rethink of the business logic implementation. We have to keep track of what pieces of information we need to display, and from that pick the most appropriate sentence. An obvious way of doing this is to keep a translations hash with all the possible combinations, keyed in a way the code can calculate.</p>
<p>I&#8217;ve done this the same way as the original code builds up a sentence, except I&#8217;m building up a lookup key. And the lookup key then maps to a full sentence. This level of abstraction divorces the actual sentence grammar from the business logic rather neatly. (This approach is analogous to combining bit flags, something familiar to most C developers.)</p>
<p>After that, it&#8217;s a simple case of retrieving the correct sentence, and replacing any data place-holders with the actual information.</p>
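<p>Token replacement itself needs very little code. Here is a minimal sketch of the idea, assuming tokens are single words wrapped in curly braces. The <code>substitute</code> function is a hypothetical stand-in written for illustration; it is not YUI&#8217;s implementation.</p>

```javascript
// Hypothetical stand-in for a token-replacement helper: swaps each
// {name} token for the matching property of `data`. Tokens with no
// matching property are left untouched so gaps are easy to spot.
function substitute(template, data) {
	return template.replace(/\{(\w+)\}/g, function (token, name) {
		return Object.prototype.hasOwnProperty.call(data, name)
			? String(data[name])
			: token;
	});
}

var message = {
	market:     "UK markets",
	timePeriod: "20 minutes",
	sentence:   "OT"            // lookup key, not a token in the sentence
};

console.log(substitute("{market} open in {timePeriod}", message));
// UK markets open in 20 minutes
```

<p>Because the tokens are named rather than positional, a translated sentence is free to use them in any order, or to leave one out.</p>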
<p>After these refactoring steps the code looks like this:</p>
<pre><code>
function getMarketStatus(market) {
	var status;
	var now     = new Date().getTime();

	// Collect the pertinent pieces of information
	// So we can pick the right translation string
	var message = {
		market:   market.name,
		sentence: &quot;&quot;
	};

	if (market.open) {
		status = market.open;
		message.sentence = &quot;O&quot;;

	} else if (market.close) {
		status = market.close;
		message.sentence = &quot;C&quot;;

	} else {
		message.sentence = &quot;X&quot;;

	}

	if (status) {

		// Make a note of early/late status, and of any reason.
		// A reason only applies to an early or late change, as in
		// the original code, so every key has a matching sentence.
		if (status.early || status.late) {
			message.sentence += status.early ? &quot;E&quot; : &quot;L&quot;;

			if (market.reason) {
				message.reason    = market.reason;
				message.sentence += &quot;R&quot;;
			}
		}

		// Make a note of any time period still to run
		if (now &lt; status.time) {
			message.timePeriod =
				formatTimeLeft(now, status.time);
			message.sentence += &quot;T&quot;;
		}
	}

	// Pick the right sentence to display
	var sentence = TRANSLATIONS[message.sentence];

	// Replace dynamic data
	return YAHOO.lang.substitute(
		sentence, message
	);
}

// Mapping each combination into a sentence
var TRANSLATIONS = {
	O:    &quot;{market} open&quot;,
	OT:   &quot;{market} open in {timePeriod}&quot;,
	OE:   &quot;{market} open early&quot;,
	OET:  &quot;{market} open early in {timePeriod}&quot;,
	OER:  &quot;{market} open early for {reason}&quot;,
	OERT: &quot;{market} open early for {reason} in {timePeriod}&quot;,
	OL:   &quot;{market} open late&quot;,
	OLT:  &quot;{market} open late in {timePeriod}&quot;,
	OLR:  &quot;{market} open late for {reason}&quot;,
	OLRT: &quot;{market} open late for {reason} in {timePeriod}&quot;,
	C:    &quot;{market} close&quot;,
	CT:   &quot;{market} close in {timePeriod}&quot;,
	CE:   &quot;{market} close early&quot;,
	CET:  &quot;{market} close early in {timePeriod}&quot;,
	CER:  &quot;{market} close early for {reason}&quot;,
	CERT: &quot;{market} close early for {reason} in {timePeriod}&quot;,
	CL:   &quot;{market} close late&quot;,
	CLT:  &quot;{market} close late in {timePeriod}&quot;,
	CLR:  &quot;{market} close late for {reason}&quot;,
	CLRT: &quot;{market} close late for {reason} in {timePeriod}&quot;,
	X:    &quot;{market} closed&quot;
};
</code></pre>
<p>Once we move the <var>TRANSLATIONS</var> object to a language-specific file we can then allow translators to translate the sentences to the targeted language.</p>
<p>This technique is flexible enough even to tackle the irregular grammar when the <var>market.close</var> object exists but with no useful data, thus <q>{market} close</q> could easily be corrected to <q>{market} will close soon</q>.</p>
<p>The same flexibility also makes handling the <var>{reason}</var> wildcard a little easier, though it doesn&#8217;t solve it completely.</p>
<p>So we avoid the need to create dynamic sentences on the fly, and instead focus on what pieces of information we need to share. And then offer the translator sufficient flexibility, through the use of replaceable tokens, to set the most appropriate translation for the targeted language.</p>
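<p>To see that flexibility in practice, here is a hedged sketch with two translation tables sharing the same lookup keys. The names <code>TRANSLATIONS_BY_LOCALE</code>, <code>fillTokens</code> and <code>renderStatus</code> are invented for this example, the <code>xx</code> locale is made up rather than a real language, and the English wording assumes a plural market name such as <samp>UK markets</samp>.</p>

```javascript
// Two hypothetical translation tables for the same lookup keys.
// "en-GB" uses corrected English wording; "xx" is a made-up locale
// (not a real language) whose grammar puts the time first.
var TRANSLATIONS_BY_LOCALE = {
	"en-GB": {
		O:  "{market} are open",
		OT: "{market} will open in {timePeriod}",
		C:  "{market} will close soon"
	},
	"xx": {
		O:  "Open now: {market}",
		OT: "In {timePeriod}: {market} open",
		C:  "Closing soon: {market}"
	}
};

// Minimal {token} replacement, standing in for YAHOO.lang.substitute
function fillTokens(sentence, data) {
	return sentence.replace(/\{(\w+)\}/g, function (token, name) {
		return name in data ? String(data[name]) : token;
	});
}

// The business logic only ever produces a key plus data;
// the sentence itself comes entirely from the locale's table.
function renderStatus(locale, key, data) {
	return fillTokens(TRANSLATIONS_BY_LOCALE[locale][key], data);
}

var data = { market: "UK markets", timePeriod: "20 minutes" };
console.log(renderStatus("en-GB", "OT", data)); // UK markets will open in 20 minutes
console.log(renderStatus("xx", "OT", data));    // In 20 minutes: UK markets open
```

<p>Neither the change of tense nor the change of word order needed a single line of business logic to be touched.</p>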
<h3>Back in the real world&hellip;</h3>
<p>My refactored solution didn&#8217;t go live. Instead, the feature it powered was descoped in Europe. That was the cost of using this particular anti-pattern.</p>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2010/03/05/the-dynamic-sentence-creation-anti-pattern/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Translating dynamic text strings</title>
		<link>http://internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/</link>
		<comments>http://internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 13:24:15 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Translations]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[localisation]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[tokens]]></category>
		<category><![CDATA[variables]]></category>

		<guid isPermaLink="false">http://www.internationalisationtips.com/?p=17</guid>
		<description><![CDATA[Translating text dynamically generated at request time is more complicated than static non-changing text. Depending on the reason why the text needs to be flexible at request time means taking a different approach.
In the last post we looked at simple static text replacement where the text doesn&#8217;t change from request to request. But as soon [...]]]></description>
			<content:encoded><![CDATA[<p>Translating text dynamically generated at request time is more complicated than translating static, non-changing text. The right approach depends on why the text needs to be flexible at request time.</p>
<p>In the last post we looked at <a href="http://www.internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/">simple static text replacement</a> where the text doesn&#8217;t change from request to request. But as soon as we use conditionals to build up a text string this simple lookup method is no longer sufficient.</p>
<h3>Simple data inclusion</h3>
<p>The simplest of dynamic text examples we have is a message telling us which page we are on.</p>
<pre><code>
&lt;?php
$pageNo = 6;
echo "You are currently on page ", $pageNo;
?&gt;
</code></pre>
<p>The first pitfall to avoid is creating a translation string for the static-looking text <q>You are currently on page</q>. Yes, this is a static piece of text, but it requires additional context for the message to make sense. The code supplies that context by tacking the page number onto the end of the string. The resulting string is <q>You are currently on page 6</q>, which makes sense without further context.</p>
<p>This works in English, but in other languages it may not be appropriate, or grammatically correct for the page number to be at the end. We need to ensure that the placement of the page number is as flexible as possible.</p>
<p>The most straightforward solution is to use a token to represent the <code>$pageNo</code> variable in the text string, and at request time replace the token with the variable&#8217;s value. This is a three-step process:</p>
<ul>
<li>Refactor the static looking text to use a text replacement</li>
<li>Process the refactored string at request time</li>
<li>Add the refactored string to the translation table</li>
</ul>
<p>The commonly accepted practice for tokens is to use a single word wrapped in curly brackets. I&#8217;d recommend this approach because the translation strings don&#8217;t depend on the syntax of any one programming language, and the replacement is easy to do in PHP. And if you need to use these translations in JavaScript then YUI&#8217;s <a href="http://developer.yahoo.com/yui/docs/YAHOO.lang.html#method_substitute"><code>YAHOO.lang.substitute()</code></a> handles curly bracket replacement right out of the box. (The alternative is <code>sprintf</code>&#8217;s <code>%s</code> syntax, but be careful when there are multiple tokens: a translation may need them in a different order from the one in which the arguments are passed.)</p>
<pre><code>
&lt;?php
// Step 1: Refactor string
$pageNo = 6;
$pageMsg = "You are currently on page {page}";
echo $pageMsg;
?&gt;
</code></pre>
<p>With this refactoring we are basically just echoing out a static string that&#8217;s sitting in a new variable. Between defining the variable and writing it out we need to add in some extra code to replace <code>{page}</code> with the actual page number:</p>
<pre><code>
&lt;?php
// Step 2: Replace tokens
$pageNo = 6;
$pageMsg = "You are currently on page {page}";
$pageMsg = preg_replace('/\{page\}/', $pageNo, $pageMsg);
echo $pageMsg;
?&gt;
</code></pre>
<p>The <code>preg_replace</code> looks for an occurrence of <samp>{page}</samp> and replaces it with the contents of the variable <code>$pageNo</code>.</p>
<p>Now we can treat the starting string as a static text string and add it to our translations array:</p>
<pre><code>
&lt;?php
$translations = array(
	"pageMessage" =&gt; "You are currently on page {page}"
);
?&gt;
</code></pre>
<p>And update our main code to reference this translation:</p>
<pre><code>
&lt;?php
// Step 3: Add to the translations dictionary
$pageMsg = $translations['pageMessage'];
$pageMsg = preg_replace('/{page}/', $pageNo, $pageMsg);
echo $pageMsg;
?&gt;
</code></pre>
<p>The benefit of this approach is that the token can be anywhere in the string and now requires no code modifications if in one language the token needs to be somewhere other than at the end. We&#8217;ve removed the locale dependencies.</p>
<h3>Grammatical differences</h3>
<p>One of the difficult problems of creating text on the fly is grammatical differences between languages. We have to be careful where our code is implementing grammar rules. For example, here&#8217;s a PHP snippet that prints out the number of unread email items:</p>
<pre><code>
&lt;?php
$unread = 2;
echo $unread, " unread email", ($unread==1)?'':'s';
?&gt;
</code></pre>
<p>This just about works in English. But when translating for a different locale we can&#8217;t assume the same grammar rules apply. Pluralising a word isn&#8217;t always as straightforward as adding the letter &#8216;s&#8217; to the end.</p>
<p>This particular piece of code needs to be refactored to remove the grammar logic, and then it is in a state to be localised. The obvious first step, along with our text replacement token, is to remove the conditional logic and create two separate text strings:</p>
<pre><code>
&lt;?php
$unread = 2;

// Select the right message
$unreadMsg = "{total} unread emails";
if ($unread==1) {
	$unreadMsg = "{total} unread email";
}

// Replace the token
$unreadMsg = preg_replace('/{total}/', $unread, $unreadMsg);

echo $unreadMsg;
?&gt;
</code></pre>
<p>And we are back on the familiar ground of replacing static text with a reference to a translation. (Polish pluralisation is more complicated, though, so it would need more logic or perhaps a different approach &#8211; hat tip <a href="http://intranation.com/">Brad</a>.)</p>
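<p>For languages with more than two plural forms, one possible approach (sketched here as an assumption rather than a tested solution) is to key the messages by plural category instead of a singular/plural test. The sketch is JavaScript rather than PHP and uses the standard <code>Intl.PluralRules</code> API. The <code>UNREAD</code> table and <code>unreadMessage</code> function are hypothetical names, and the Polish strings are illustrative only and should be checked by a translator.</p>

```javascript
// One message per plural category. English needs two categories;
// Polish needs three for whole numbers plus "other" for fractions.
// The Polish strings are illustrative, not translator-approved.
var UNREAD = {
	en: {
		one:   "{total} unread email",
		other: "{total} unread emails"
	},
	pl: {
		one:   "{total} nieprzeczytana wiadomość",
		few:   "{total} nieprzeczytane wiadomości",
		many:  "{total} nieprzeczytanych wiadomości",
		other: "{total} nieprzeczytanej wiadomości"
	}
};

function unreadMessage(locale, total) {
	// select() returns a category name such as "one", "few" or "many"
	var category = new Intl.PluralRules(locale).select(total);
	var table    = UNREAD[locale];
	var sentence = table[category] || table.other;
	return sentence.replace("{total}", String(total));
}

console.log(unreadMessage("en", 1));
console.log(unreadMessage("en", 2));
console.log(unreadMessage("pl", 2));
console.log(unreadMessage("pl", 5));
```

<p>The code no longer contains any grammar rule of its own: the plural rules come from the locale data, and the sentences come from the translations table.</p>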
<h3>String translation libraries</h3>
<p>It becomes a little tedious to create a regex rule for every token that needs to be replaced. And there are classes or libraries to do this for just about any web-facing programming language. A very elegant solution is <a href="http://ejeliot.com/">Ed Eliot&#8217;s</a> PHP 5 <a href="http://www.ejeliot.com/blog/114">text translation library</a>.</p>
<p>Ed&#8217;s class gets the basics right:</p>
<ul>
<li>Translations are in a separate file, one per language</li>
<li>Specify the language code and it grabs the right translations file</li>
<li>The <code>Get</code> method takes an array of token replacements, and applies them to the translations for you</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Replacing static text strings with references</title>
		<link>http://internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/</link>
		<comments>http://internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 11:55:56 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Translations]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[localisation]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://www.internationalisationtips.com/?p=9</guid>
		<description><![CDATA[The first major step to localising a website or web application is to translate all the static text strings into the preferred language. Static text being text that doesn&#8217;t change from request to request.
The typical approach to translating this text is to replace every piece of static text with a variable reference that points to [...]]]></description>
			<content:encoded><![CDATA[<p>The first major step to localising a website or web application is to translate all the static text strings into the preferred language. Static text being text that doesn&#8217;t change from request to request.</p>
<p>The typical approach to translating this text is to replace every piece of static text with a variable reference that points to a translation dictionary. For example the simple code:</p>
<pre><code>
echo "Hello World";
</code></pre>
<p>The static text <samp>&#8220;Hello World&#8221;</samp> should be replaced with a variable reference. In this example, let&#8217;s create a simple lookup array in PHP, kept in its own external file and pulled in with <code>include</code>:</p>
<pre><code>
&lt;?php
$translations = array(
    'hello_world' =&gt; 'Hello World'
    // Other translation strings
);
?&gt;
</code></pre>
<p>We can now replace our original text with the reference <code>$translations['hello_world']</code>, like so:</p>
<pre><code>
echo $translations['hello_world'];
</code></pre>
<p>With the <code>$translations</code> array being in an external PHP file we can have one of these files per locale or language and include the right one at request time.</p>
<p>This is the first step towards separating the locale-specific information from our code. Adding a new locale means creating a new translations file for that locale and, more importantly, making no code changes.</p>
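<p>Choosing the right table at request time can be sketched as follows. This is JavaScript rather than PHP, with both tables inlined so the example is self-contained. <code>TRANSLATION_FILES</code>, <code>DEFAULT_LOCALE</code> and <code>loadTranslations</code> are hypothetical names, and falling back to a default locale is an assumption of this sketch, not something the PHP above does.</p>

```javascript
// Hypothetical per-locale tables; in practice each would live in its
// own file and only the one that is needed would be loaded.
var TRANSLATION_FILES = {
	en: { hello_world: "Hello World" },
	fr: { hello_world: "Bonjour le monde" }
};

var DEFAULT_LOCALE = "en";

// Pick the table for the requested locale, falling back to the
// default when no translations exist for it yet (an assumption).
function loadTranslations(locale) {
	return Object.prototype.hasOwnProperty.call(TRANSLATION_FILES, locale)
		? TRANSLATION_FILES[locale]
		: TRANSLATION_FILES[DEFAULT_LOCALE];
}

var translations = loadTranslations("fr");
console.log(translations.hello_world); // Bonjour le monde
```

<p>The calling code only ever refers to the <code>hello_world</code> key, so supporting another locale is a matter of adding one more table.</p>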
<p>A large chunk of internationalising code for localisation involves just replacing existing text with translated versions. It is an important step, and the first one to get right. The next step will involve dealing with strings that are <a href="http://www.internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/">dynamically altered at request time</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
