<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Internationalisation Tips</title>
	<atom:link href="http://internationalisationtips.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://internationalisationtips.com</link>
	<description>practical tips on building an international presence</description>
	<lastBuildDate>Wed, 24 Aug 2011 14:20:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Localising templates</title>
		<link>http://internationalisationtips.com/2011/08/24/localising-templates/</link>
		<comments>http://internationalisationtips.com/2011/08/24/localising-templates/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 14:18:42 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Templates]]></category>
		<category><![CDATA[dimensions]]></category>
		<category><![CDATA[facets]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[localisation]]></category>
		<category><![CDATA[markup]]></category>
		<category><![CDATA[specialising]]></category>
		<category><![CDATA[templates]]></category>

		<guid isPermaLink="false">http://internationalisationtips.com/?p=77</guid>
		<description><![CDATA[Most of the time the HTML for a site doesn&#8217;t need to change as you localise an already internationalised site for a specific locale. Though your website needs the flexibility to specify locale-specific markup, so it is well worth having a system in place that allows that. And implemented properly, a mechanism for specialising templates [...]]]></description>
			<content:encoded><![CDATA[<p>Most of the time the HTML for a site doesn&#8217;t need to change as you localise an already internationalised site for a specific locale. Though your website needs the flexibility to specify locale-specific markup, so it is well worth having a system in place that allows that. And implemented properly, a mechanism for specialising templates based on multiple dimensions (locale being just one dimension) can be a very powerful tool.</p>
<h3>Componentising a site</h3>
<p>The typical structure for localising markup is to split a page into multiple templates and components. Typically these reusable templates are derived from the structure of your website, but essentially, any repeating pattern of markup should have its own template (be it a component or a partial).</p>
<p>The idea is that we have one place that defines how a particular design element is rendered.</p>
<h3>Inheritance structure</h3>
<p>An important feature of an effectively internationalised site is the ability to override specific templates or components for localisation-specific requirements. An override either customises the markup of a particular component, stops that component from being displayed, or displays something not available to any other locale.</p>
<p>In its most basic form, what you want is a templating system that, when a template is required, does not look for it in one specific place but in a series of locations (based on the requested locale) until the most appropriate one is found, and then uses that. The most appropriate template can be anything from one specialised just for that locale right up to the generic default that applies to all locales.</p>
<p>Essentially, in a well internationalised site there&#8217;s a generic set of components that are applicable to most locales most of the time. Then there are locale-specific overrides that either customise a component, stop it from rendering, or insert something not offered in the generic set. In this way each locale can override any template at any point.</p>
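<p>The lookup described above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the directory layout, the function names and the fallback from a full locale to its language are all hypothetical, not taken from any particular framework.</p>

```javascript
// Hypothetical layout: templates/LEVEL/NAME.html, where LEVEL is a locale,
// a language, or the generic default.
function buildSearchPath(locale) {
	// "fr-CA" gives ["fr-CA", "fr", "generic"]: most specific first.
	const language = locale.split("-")[0];
	const levels = language === locale ? [locale] : [locale, language];
	return levels.concat(["generic"]);
}

function resolveTemplate(name, locale, available) {
	// Walk the search path and return the first template that exists.
	for (const level of buildSearchPath(locale)) {
		const candidate = "templates/" + level + "/" + name + ".html";
		if (available.has(candidate)) {
			return candidate;
		}
	}
	return null; // not even the generic default defines this template
}

// Stand-in for the file system: the set of templates that exist.
const available = new Set([
	"templates/generic/header.html",
	"templates/generic/quote-table.html",
	"templates/fr/quote-table.html",
]);

console.log(resolveTemplate("quote-table", "fr-CA", available));
console.log(resolveTemplate("header", "fr-CA", available));
```

<p>Here the French override wins for the quote table, while the header falls through to the generic default because no locale has specialised it.</p>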
<h3>Template inheritance gotchas</h3>
<p>One very common mistake in inheritance is to use a site&#8217;s primary locale as the default level of localisation. Yahoo US made this mistake time and time again, mainly because they built the US version of each media property first (the US being Yahoo&#8217;s primary market). When they tried to create a Canadian version of Yahoo Finance (in English only: ca.finance.yahoo.com), for example, they did not refactor the code to have a generic localisation level from which both the US version and the Canadian version would specialise. Instead they decided to keep the US version at the top of the tree, and the Canadian version became a descendant of that.</p>
<p>This created a serious maintenance headache. Since the size of the Canadian audience was minuscule in comparison to the US audience, there was no dedicated development team for it. Of course, with a proper internationalisation framework in place this isn&#8217;t a problem. But because Canada inherited the US codebase, every change to the US codebase became immediately available to Canada, including features that could contractually only be offered in the US. That means for every change to the US codebase, some poor sap needed to undo that change for Canada by finding the previous version of the changed template, and localising it down to Canada, thus undoing the US change. This do/undo process wasted a large number of developer hours.</p>
<p>They realised the minefield of this approach, but instead of refactoring the Finance property to have a generic default localisation rather than a US one, they copied the templates for Canada into a separate codebase, and had a dedicated team in Canada to support their own offshoot.</p>
<p>Luckily, Yahoo in Europe had a tiny fraction of the resources the US had access to, and in that scarcity they adopted a much better approach by building their own bespoke templating system, and having a generic localisation level at the top. That allowed Yahoo Europe to support 5 countries, and a dozen independent media sites, with one small team of developers maintaining the templates. (Which worked well for close on a decade; until Yahoo US decided that global properties / one codebase with everything inheriting from the US locale was the preferred solution.)</p>
<h3>Specialisation on steroids: Yahoo Europe</h3>
<p>At Yahoo! Europe we had a very powerful in-house template editing system (developed in the 20th century) that allowed us to build a generic Yahoo media site. Then we had the ability to specialise<sup>[<a href="#footnote-1">1</a>]</sup> these templates to various dimensions:</p>
<ul>
<li>A site dimension: so a Sports site could customise a particular component just for Sport, without affecting the other media sites</li>
<li>A country dimension<sup>[<a href="#footnote-2">2</a>]</sup>: We could specialise to each of the five European countries we supported (independent or dependent on the site dimension)</li>
<li>A site section dimension</li>
<li>A content-type dimension</li>
<li>A content data-provider dimension</li>
<li>A page-specific dimension</li>
<li>A template specific dimension</li>
</ul>
<p>From the perspective of the page, every template had a localisation path: a list of locations to search for a specific template, starting from the most specific (the template-specific level) and going up the list until an existing component was found.</p>
<p>The actual inheritance structure was a lot more complicated, and frankly ingenious. Proper localisation is a set of dimensions. And each dimension inter-relates independently of the others, which makes using a localisation tree impractical and limiting. Dimensions are the factors that can affect how a component displays. As I&#8217;ve mentioned, at Yahoo, the actual site itself is one major dimension (A Sports site, as opposed to a News site), then we have dimensions for the country, the data provider, the site section, the type of page, the type of data, and even the individual template itself. New dimensions could be added fairly simply: when we needed to co-brand a section of the site for a particular advertiser/partner we would introduce that as a new dimension and specialise any templates needing customisation to that level. Then when the advertising campaign was over we&#8217;d just remove the dimension from the specialism path.</p>
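<p>As a rough sketch of how a specialism path might be assembled from independent dimensions, consider the following. The dimension names, their order, the level naming and the partner name are all assumptions made for illustration; this is not how the Yahoo system was actually implemented.</p>

```javascript
// Each dimension contributes one level to the path. The input list is ordered
// from least to most specific; the result is searched most specific first.
function buildSpecialismPath(dimensions) {
	const levels = dimensions.map(([name, value]) => name + "/" + value);
	return levels.reverse().concat(["generic"]);
}

// Hypothetical dimensions for one page request.
const base = [
	["site", "sport"],
	["country", "fr"],
	["section", "football"],
];

// A co-branding campaign adds one more dimension while it runs...
const campaign = base.concat([["partner", "acme"]]);
console.log(buildSpecialismPath(campaign));

// ...and removing it afterwards is just a matter of dropping that entry.
console.log(buildSpecialismPath(base));
```

<p>Templates specialised to the partner level are simply never found once that level is no longer in the path, so nothing else needs to be cleaned up for the site to return to normal.</p>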
<h3>Finding the right level of specialism</h3>
<p>Picking the right specialism level is not entirely straightforward; it needs careful consideration. It&#8217;s far too easy to specialise a template right to the bottom so that you ensure that there&#8217;s no impact elsewhere on the site. This approach isn&#8217;t ideal, and in the long term it proves to be more costly, despite being the simplest and safest way to implement one change.</p>
<p>Accurately identifying the appropriate level of specialism for a change takes knowledge of the available dimensions, what they are used for, and where. You need confidence that making the change at a higher level won&#8217;t break parts of the site you are not immediately looking at. That means understanding the implications of your change, and being in a good position to identify and test the affected areas of the site.</p>
<p>Focusing just on the primary locale of the site is only viable when developers are very confident of the scope and impact of their changes. Developers do need to consider the localisation implications of their changes. It&#8217;s not enough to get it working in your preferred locale and just assume everywhere else will be just fine. You need to know the impact, either before you make the change or by confirming it through thorough testing.</p>
<p>One of the weaknesses of dynamic and flexible templating systems is that it is hard to quickly identify which pages are affected by changing a template. Sometimes a grep isn&#8217;t going to be enough, if the template inclusion is something other than a static reference. A developer needs good tools that know about the templating framework and help identify affected pages.</p>
<h3>Open-source templating frameworks</h3>
<p>Although my experience in internationalisation-supporting frameworks is based on an internal Yahoo templating framework called Jake, it isn&#8217;t entirely Yahoo specific. Jake is written in Perl. When Yahoo adopted PHP as its framework of choice a team quickly got involved in creating a very flexible dimension-supporting <a href="http://developer.yahoo.com/r3/">templating system called r3</a>. This got released as open source a few years ago. It&#8217;s very powerful, but it really needs <a href="http://marknormanfrancis.com/">someone who understands r3</a> to write us a <a href="https://github.com/norm/content-usingr3">guide on how to use r3</a> and wield it properly.</p>
<p>In the meantime, start with a templating system that allows you to define a generic default localisation for your website, plus a specialisation level for each locale at which any template can be overridden for locale-specific customisation. This is the bare minimum for localisation of website templates. Though, if you build it right and allow independent dimensions, you have a very powerful and flexible templating system that will do amazing things.</p>
<h3>Footnotes</h3>
<p id="footnote-1">[1] At Yahoo we referred to the specialism path as the localisation path, although this path wasn&#8217;t strictly about localisation but also covered many mutually-exclusive dimensions. Within the European webdev team we understood when we were talking about localisation of templates, and when we were talking about localisation in an internationalisation context. But it was evident that using localisation for templates in this instance can be confusing, so instead calling this specific feature &#8220;specialism&#8221; and &#8220;specialising templates&#8221; would make the difference clearer. I&#8217;m adopting this nomenclature here.</p>
<p id="footnote-2">[2] The choice of country as the primary specialism level of localisation is a classic mistake of internationalisation: a locale or culture doesn&#8217;t necessarily map to a country. Yahoo&#8217;s Spanish News site contains stories in multiple languages, including Spanish, Catalan, Basque etc. But there&#8217;s only one Spanish news site. Instead of a country locale, Yahoo should have established a locale level &#8211; Spain-Spanish, Spain-Catalan, Spain-Basque, so three separate locales, not one country with articles in three different languages. The specialism for language is definitely needed, plus another extra level in case there&#8217;s a geographic/natural/cultural boundary between groups of people in the same country and language, but still are independent of each other. Yahoo&#8217;s insistence on countries defining locales is what limits their ability to accommodate citizens who do not use what Yahoo defines as that country&#8217;s acceptable language.</p>
<h3>Related resources:</h3>
<ul>
<li><a href="http://developer.yahoo.com/r3/">Easy localisation and templating with Yahoo r3</a></li>
<li><a href="http://alfonsojimenez.com/tag/yahoo-r3/">Integrating Yahoo r3 with Phing</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2011/08/24/localising-templates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Internationalisation Gotchas</title>
		<link>http://internationalisationtips.com/2010/03/29/internationalisation-gotchas/</link>
		<comments>http://internationalisationtips.com/2010/03/29/internationalisation-gotchas/#comments</comments>
		<pubDate>Mon, 29 Mar 2010 13:35:28 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Translations]]></category>
		<category><![CDATA[abbreviations]]></category>
		<category><![CDATA[calculations]]></category>
		<category><![CDATA[currencies]]></category>
		<category><![CDATA[date]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[formatting]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[numbers]]></category>
		<category><![CDATA[regulation]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[time]]></category>

		<guid isPermaLink="false">http://internationalisationtips.com/?p=70</guid>
		<description><![CDATA[The amount of work required to internationalise a website is woefully underestimated, and sometimes, the codebase is compromised to the extent that many features or capabilities are impossible to deliver with the existing code. In many cases, complete ground-up rewrites are needed to separate cleanly the business logic from the localisation requirements of countries. Mostly [...]]]></description>
			<content:encoded><![CDATA[<p>The amount of work required to internationalise a website is woefully underestimated, and sometimes, the codebase is compromised to the extent that many features or capabilities are impossible to deliver with the existing code. In many cases, complete ground-up rewrites are needed to separate cleanly the business logic from the localisation requirements of countries.</p>
<p>Mostly this requires developers and engineers of platforms to be aware of the internationalisation implications of their implementation choices. What may seem like an engineering best practice could very well be a massive barrier for localising sites to various countries.</p>
<h3 id="two_major_internationalisation_steps">Two major internationalisation steps</h3>
<p>The common explanation of the work required is covered by two basic proclamations:</p>
<ul>
<li>Everything must be in UTF-8</li>
<li>Static text strings must be replaced by translation references</li>
</ul>
<p>Yes, UTF-8 is essential to working in many countries, especially in Asian markets and non-Western European countries. Even today, new systems freshly built are failing on this essential step.</p>
<p>Being able to specify translations for text strings is essential for any site that needs more than one language. But does this even cover 80% of the work required to internationalise a site? </p>
<h3 id="head_first_internationalisation">Head-first internationalisation</h3>
<p>In my nine-month stint making a &#8220;global-ready&#8221; Finance codebase barely functional in Europe I encountered a number of internationalisation issues that should have already been dealt with, but hadn&#8217;t.</p>
<p>From the outset, it seemed fairly straightforward. Take the new quotes system code, which runs both the US Finance site and the Canadian Finance site, and launch it in Europe. So the UTF-8 work is already done, and I just needed to extract the text strings and replace them with translation lookups.</p>
<p>It took 3 months to get one page live in one non-English country. We lost a number of features on the way, and the end result is sub-par in a number of respects, including regulatory requirements. Not a resounding success, and a tough learning curve of how internationalisation can hurt when it isn&#8217;t considered properly in building global-ready platforms.</p>
<h3 id="complex_translations">Complex translations</h3>
<p>Translating static text sentences and phrases is bog standard. However, sometimes we need this static text to contain variable information (for example the text string <q>You are on page 1 of 5</q>). It would be madness to force a translation of every single combination of page number and total number of pages, so we use <a href="http://internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/">token replacement</a> to solve this tiny problem.</p>
<p>Be careful with token replacement when a string contains more than one token. Don&#8217;t rely on the order of the tokens to match values to placeholders. Take a simple sprintf example: <code>sprintf("Welcome, %s of %s", $title, $land);</code>. What happens when, in one cultural locale, <var>$land</var> needs to appear before <var>$title</var>? Use named replacements instead. If your chosen programming language doesn&#8217;t allow named token replacements, then it would be wise to change to a programming language that does.</p>
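<p>A minimal sketch of named token replacement in JavaScript follows. The <code>fillTokens</code> helper, the curly-brace token syntax and the locale keys are assumptions for illustration, and the second string is an invented word order rather than a real translation.</p>

```javascript
// Replace {name} placeholders with values from an object, whatever their order.
function fillTokens(template, values) {
	return template.replace(/\{(\w+)\}/g, function (match, name) {
		// Leave unknown tokens visible so a missing value is easy to spot.
		return Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match;
	});
}

// Hypothetical translation strings: each translator picks the token order.
const translations = {
	"en": "Welcome, {title} of {land}",
	"xx": "{land}: welcome, {title}",
};

const values = { title: "Duke", land: "Wellington" };
console.log(fillTokens(translations["en"], values));
console.log(fillTokens(translations["xx"], values));
```

<p>Because the values are matched by name, the calling code stays identical for every locale; only the translation string changes.</p>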
<p>Sometimes developers and engineers get too clever and create business logic that creates a sentence by adding one word at a time depending on a plethora of business logic. This becomes unlocalisable as grammar constructs vary across languages. I call this the <a href="http://internationalisationtips.com/2010/03/05/the-dynamic-sentence-creation-anti-pattern/">dynamic sentence creation anti-pattern</a>, and show a safer method of accomplishing this.</p>
<h3 id="other_forms_of_token_replacement">Other forms of token replacement</h3>
<p>Formatting currencies is an interesting variant of token replacement. The location of the currency symbol is either before or after the currency amount. But some currencies have tokens in the middle, too.</p>
<p>Plural forms also catch people out. In some languages zero is singular, in others it is a plural. Even making words plural (plural form) isn&#8217;t a case of checking whether it&#8217;s just one or more than one. Polish for example has at least five different orders until it settles down to a regular pattern. Some frameworks get this right, <a href="http://www.symfony-project.org/book/1_2/13-I18n-and-L10n#chapter_13_sub_handling_complex_translation_needs">symfony</a> for example.</p>
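<p>To see how plural categories differ between languages, the sketch below uses the standard <code>Intl.PluralRules</code> API, assuming a JavaScript runtime that ships locale data for English and Polish. The message table and the <code>countFiles</code> helper are hypothetical.</p>

```javascript
const english = new Intl.PluralRules("en");
const polish = new Intl.PluralRules("pl");

// Print the plural category each language assigns to the same numbers.
for (const n of [0, 1, 2, 5, 12, 22]) {
	console.log(n, english.select(n), polish.select(n));
}

// Hypothetical message table keyed by plural category, one entry per
// category the language actually uses.
const filesByCategory = { one: "{n} file", other: "{n} files" };

function countFiles(n) {
	return filesByCategory[english.select(n)].replace("{n}", n);
}

console.log(countFiles(1), "/", countFiles(0));
```

<p>English only needs two entries here, while a Polish message table would need one entry for each category that <code>polish.select</code> can return.</p>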
<p>Then come <a href="http://en.wikipedia.org/wiki/Ordinal_indicator">ordinal indicators</a> (1st, 2nd, 3rd &hellip;); the English pattern is fairly regular, but in Czech it requires a bit of thought to algorithmically calculate the right indicator.</p>
<h3 id="formatting_numbers">Formatting numbers</h3>
<p>This is one of those obvious matters. Numbers should be formatted using the preferred cultural approaches, using the appropriate separators. Piece of cake. Run all numbers through a locale-aware formatter, and you&#8217;re done.</p>
<table>
<caption>Simple number formatting for countries</caption>
<tbody>
<tr>
<th scope="row">United States</th>
<td>1,200.50</td>
</tr>
<tr>
<th scope="row">United Kingdom</th>
<td>1,200.50</td>
</tr>
<tr>
<th scope="row">Germany</th>
<td>1.200,50</td>
</tr>
<tr>
<th scope="row">France</th>
<td>1 200,50</td>
</tr>
</tbody>
</table>
<p>All this needs is a translation string for the thousands separator, and one for the decimal point character. Except, if you are using a translation system that strips leading spaces out of text (because of sloppy XML imports), then you run into the nasty problem that the French thousands separator becomes an empty string. You need to be able to trust your translation system.</p>
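<p>One way to avoid hand-maintained separator strings is to let a locale-aware formatter supply them. The sketch below uses the standard <code>Intl.NumberFormat</code> API and assumes a runtime with locale data for the four locales in the table; the helper names are hypothetical.</p>

```javascript
// Format an amount with exactly two decimal places for a given locale.
function formatAmount(value, locale) {
	return new Intl.NumberFormat(locale, {
		minimumFractionDigits: 2,
		maximumFractionDigits: 2,
	}).format(value);
}

// Ask the formatter which characters it uses, instead of hard-coding them.
function separators(locale) {
	const parts = new Intl.NumberFormat(locale).formatToParts(1234567.5);
	const find = (type) => parts.find((part) => part.type === type).value;
	return { group: find("group"), decimal: find("decimal") };
}

for (const locale of ["en-US", "en-GB", "de-DE", "fr-FR"]) {
	console.log(locale, formatAmount(1200.5, locale), JSON.stringify(separators(locale)));
}
```

<p>Note that the French group separator reported by the formatter may be a no-break space rather than a plain space character, which is one more reason not to compare or trim these strings naively.</p>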
<p>In financial reporting there&#8217;s lots of data and lots of long numbers, so to squeeze as much information as possible onto the page we need to shorten long numbers. Apple&#8217;s Market Cap of $209,380,000,000, for example, is shortened to <samp>$209.38B</samp> in the US, which just about fits into the space available.</p>
<p>But, for France, this is a little more tricky for two reasons:</p>
<ul>
<li>There isn&#8217;t a short form of Billions in France. The closest is &#8216;Md&#8217; which means thousand million.</li>
<li>The French Finance industry prefers a space between the number and its suffix.</li>
</ul>
<p>So to display Apple&#8217;s Market Cap in France we would print it as <samp>$209,38 Md</samp>. Again, if your translation system likes trimming off leading spaces, the French translation string for the shortened form of Billions is then incorrect.</p>
<h3 id="calculations">Calculations</h3>
<p>Numbers are numbers. Until they are formatted for your chosen locale. Then they are just strings. Adding up strings does one of three things:</p>
<ul>
<li>Addition is overloaded and the strings are concatenated together. So 1 + 1 equals 11.</li>
<li>The strings, not being clean numbers, are evaluated up to the first non-numeric character, and the rest is dropped. So 3,500 + 4,500 equals 7.</li>
<li>The strings, not being clean numbers, are evaluated to zero, so 3,500 + 4,500 equals 0.</li>
</ul>
<p>None of these do what you expect. Why would anyone want to do this? One feature of a finance site is tracking the value of your own share portfolio. As you add a new share to your portfolio and enter the number of shares you own, that number is multiplied by the current share price and added to the other share valuations to give you a portfolio value.</p>
<p>Then periodically, when the share prices tick up or down, your portfolio valuation changes accordingly. Unfortunately, when you use the DOM as the source of your data, and those numbers are locale-formatted, doing arithmetic on them produces incorrect values, because the underlying assumptions are broken.</p>
<p>As soon as you use the DOM as a data source, you are at the mercy of localisation formatting of the data. Either you need to unlocalise the data and get back the raw numbers, or you need to store the raw data somewhere else. The third possibility is to drop the feature.</p>
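<p>In JavaScript the first two failure modes are easy to reproduce, and the strictest conversion gives not-a-number rather than zero. The sketch below also shows one possible way to unlocalise a formatted number; <code>parseLocalisedNumber</code> is a hypothetical helper that assumes the text was formatted for the locale passed in, and it is not a complete parser.</p>

```javascript
// Arithmetic on locale-formatted strings goes wrong in different ways.
console.log("1" + "1");                                 // strings are concatenated
console.log(parseFloat("3,500") + parseFloat("4,500")); // parsing stops at the comma
console.log(Number("3,500") + Number("4,500"));         // not a number at all

// Hypothetical helper: turn locale-formatted text back into a raw number.
function parseLocalisedNumber(text, locale) {
	const parts = new Intl.NumberFormat(locale).formatToParts(1234567.5);
	const find = (type) => parts.find((part) => part.type === type).value;
	const raw = text
		.replace(/\s/g, "")             // spaces of any kind used for grouping
		.split(find("group")).join("")  // the locale's own group separator
		.replace(find("decimal"), "."); // normalise the decimal mark
	return Number(raw);
}

console.log(parseLocalisedNumber("3,500", "en-US") + parseLocalisedNumber("4,500", "en-US"));
console.log(parseLocalisedNumber("1.200,50", "de-DE"));
console.log(parseLocalisedNumber("1 200,50", "fr-FR"));
```

<p>Keeping the raw values alongside the formatted text, so that nothing ever has to be parsed back, is usually the less fragile of the two options.</p>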
<h3 id="regulations">Regulations</h3>
<p>If your site is focused on a particular industry, you need to be well aware of the regulations of that industry in the countries you choose to support. If a country specifies that all financial information is displayed to at least four decimal places, then it isn&#8217;t a great idea to base your entire system on the assumption that producing data to two decimal places will be sufficient.</p>
<p>Be aware of the implications of regulation in various countries, especially regulation aimed at how information is presented. This is still an important part of an internationalisable platform or site.</p>
<h3 id="assumptions_of_timezones">Assumptions of timezones</h3>
<p>Time calculations are impossible if you do not know what timezone your timestamps are in. They are still fraught with difficulty when the timezones are known, and are different.</p>
<p>Mostly if you know the timezones, then you can easily offset the hours. Except when the times coincide with <a href="http://en.wikipedia.org/wiki/Daylight_saving_time">Daylight Savings Time</a> adjustments. Not all countries move their clocks forward or back on the same day. That&#8217;s why we have <a href="http://en.wikipedia.org/wiki/GNU_C_Library">dedicated libraries on the server</a> to deal with these calculations, and why anything time and date related should be calculated there rather than left to the developer to do in the browser.</p>
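<p>The following sketch shows why a fixed hour offset between two timezones cannot be assumed. It uses the standard <code>Intl.DateTimeFormat</code> API and assumes a runtime with timezone data, such as Node.js on a server; the <code>hourIn</code> helper is hypothetical.</p>

```javascript
// Hour of the day (0-23) in a given IANA timezone for a given instant.
function hourIn(zone, instant) {
	const parts = new Intl.DateTimeFormat("en-GB", {
		timeZone: zone,
		hour: "2-digit",
		hourCycle: "h23",
	}).formatToParts(instant);
	return Number(parts.find((part) => part.type === "hour").value);
}

// Both instants are 12:00 UTC. In 2010 the US changed its clocks on 14 March
// and the UK on 28 March, so the gap between the two cities shrank in between.
const beforeEither = new Date(Date.UTC(2010, 2, 1, 12, 0));
const betweenChanges = new Date(Date.UTC(2010, 2, 20, 12, 0));

for (const instant of [beforeEither, betweenChanges]) {
	const gap = hourIn("Europe/London", instant) - hourIn("America/New_York", instant);
	console.log(instant.toISOString(), "London is", gap, "hours ahead of New York");
}
```

<p>The gap is five hours on the first date and four on the second, which is exactly the kind of detail a hand-rolled offset calculation gets wrong.</p>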
<h3 id="conclusion">Conclusion</h3>
<p>The last thing you want to do for an internationalised site is to turn off features that cannot be localised, or knowingly produce something that lands in a legal quagmire of regulation. Unfortunately many of the above problems are directly related to taking a code base built for one country and adapting it as is to an international market. Certain aspects became uninternationalisable because of compromises on the server-side, and in the architecture.</p>
<p>Internationalisation of a code base involves cleanly separating the default locale assumptions from the business logic. This step is paramount in enabling the localisation of the platform to other countries. Skip this step at your peril, because the next step of localising it will be a painful and frustrating experience.</p>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2010/03/29/internationalisation-gotchas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The dynamic sentence creation anti-pattern</title>
		<link>http://internationalisationtips.com/2010/03/05/the-dynamic-sentence-creation-anti-pattern/</link>
		<comments>http://internationalisationtips.com/2010/03/05/the-dynamic-sentence-creation-anti-pattern/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 21:42:43 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Translations]]></category>
		<category><![CDATA[antipattern]]></category>
		<category><![CDATA[dynamic]]></category>
		<category><![CDATA[gotcha]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://internationalisationtips.com/?p=18</guid>
		<description><![CDATA[The natural internationalisation stumbling block, particularly for technical people, is that localisation isn&#8217;t just about translating static text strings. Surprisingly, many developers and programmers fail to consider that sentences in one natural language cannot be simply translated one word at a time to another language. Differences in grammar Every human language has its own grammatical [...]]]></description>
			<content:encoded><![CDATA[<p>The natural internationalisation stumbling block, particularly for technical people, is that localisation isn&#8217;t just about translating static text strings. Surprisingly, many developers and programmers fail to consider that sentences in one natural language cannot be simply translated one word at a time to another language.</p>
<h3>Differences in grammar</h3>
<p>Every human language has its own grammatical rules and style. The chance of a grammatical structure being the same across a range of natural languages is extremely low. So code written around a specific grammatical construct in one human language presents an impossible internationalisation barrier when needing to be translated into another language.</p>
<p>There are only two approaches that could work here:</p>
<ol>
<li>Write a custom replacement function per language</li>
<li>Throw away the code and try again</li>
</ol>
<h3>Drawbacks of language dependent code</h3>
<p>Unfortunately option 1 &#8211; writing a custom replacement function per language &#8211; would require a developer to be involved every time a new language needs to be supported. And that developer needs to know this new language well enough to make the necessary changes or additions to get his function returning the grammatically correct output each time.</p>
<p>For every natural language that needs to be supported, the developer has to supply a new function. That function most likely already contains business logic. Adding a new language means duplicating, or reimplementing existing business logic to meet the grammatical structures of the new language.</p>
<p>So what happens when the business logic needs to be updated? Yep, the developer now has to update every single copy of the function. Each time checking the natural language syntax is correct. That means that the developer maintaining this piece of code has to be broadly familiar with every language his code supports. And that is totally unrealistic.</p>
<p>Developer zugzwang.</p>
<h3>A real-world case</h3>
<p>I ran into a piece of dynamic sentence generation code about two years ago when I was tasked with localising some &#8220;global-ready&#8221; JavaScript for use in Europe. I was assured all that needed to be done is to replace the static strings in the code with a reference to a JavaScript translations lookup.</p>
<p>Here is a simplified JavaScript pseudo-code version of the code I uncovered. (Simplified so we can focus on the internationalisation issues without getting bogged down in convoluted business logic.)</p>
<pre><code>
function getMarketStatus(market) {
	// Anti-pattern: the sentence is built one English fragment at a time.
	var message = market.name;
	var now     = new Date().getTime();

	if (market.open) {

		message += &quot; open&quot;;

		if (market.open.early) {
			message += &quot; early&quot;;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}

		} else if (market.open.late) {
			message += &quot; late&quot;;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}
		}

		// formatTimeLeft() is a helper defined elsewhere in the original code.
		if (now &lt; market.open.time) {
			message += &quot; in &quot; +
				formatTimeLeft(now, market.open.time);
		}

	} else if (market.close) {

		message += &quot; close&quot;;

		if (market.close.early) {
			message += &quot; early&quot;;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}

		} else if (market.close.late) {
			message += &quot; late&quot;;

			if (market.reason) {
				message += &quot; for &quot; + market.reason;
			}
		}

		if (now &lt; market.close.time) {
			message += &quot; in &quot; +
				formatTimeLeft(now, market.close.time);
		}

	} else {
		message += &quot; closed&quot;;
	}

	return message;
}
</code></pre>
<p>This piece of code generates one sentence of text summarising the market status. Either the market is open or closed, opening soon or closing soon, maybe earlier or later than normal (perhaps for a specified reason).</p>
<h3>Identifying the possibilities</h3>
<p>The function returns one of the following patterns (variable data identified with curly braced place-holders):</p>
<ul>
<li><samp>{market} open</samp></li>
<li><samp>{market} open in {timePeriod}</samp></li>
<li><samp>{market} open early</samp></li>
<li><samp>{market} open early in {timePeriod}</samp></li>
<li><samp>{market} open early for {reason}</samp></li>
<li><samp>{market} open early for {reason} in {timePeriod}</samp></li>
<li><samp>{market} open late</samp></li>
<li><samp>{market} open late in {timePeriod}</samp></li>
<li><samp>{market} open late for {reason}</samp></li>
<li><samp>{market} open late for {reason} in {timePeriod}</samp></li>
<li><samp>{market} close</samp></li>
<li><samp>{market} close in {timePeriod}</samp></li>
<li><samp>{market} close early</samp></li>
<li><samp>{market} close early in {timePeriod}</samp></li>
<li><samp>{market} close early for {reason}</samp></li>
<li><samp>{market} close early for {reason} in {timePeriod}</samp></li>
<li><samp>{market} close late</samp></li>
<li><samp>{market} close late in {timePeriod}</samp></li>
<li><samp>{market} close late for {reason}</samp></li>
<li><samp>{market} close late for {reason} in {timePeriod}</samp></li>
<li><samp>{market} closed</samp></li>
</ul>
<p>Where:</p>
<ul>
<li><var>{market}</var> is the name of the market under scrutiny, e.g. <samp>UK markets</samp></li>
<li><var>{timePeriod}</var> is the number of minutes and hours before the market opens or closes</li>
<li><var>{reason}</var> is the stated reason why a market opened or closed early or late.</li>
</ul>
<h3>The simple difficulty</h3>
<p>That&#8217;s 21 different text strings. The likelihood of the word order remaining the same across different languages is close to zero.</p>
<p>The simple case is that this order of atomic elements works in English, but is unlikely to work consistently in other languages. For this piece of logic to be fit for internationalisation, the order cannot be assumed. A translator needs to be able to use the most appropriate and correct word order for the targeted language and culture.</p>
<p>Moreover, the difficulties don&#8217;t end there. </p>
<h3>The disguised change of meaning</h3>
<p>Perhaps the most insidious feature of the above code is that adding in an extra word significantly changes the meaning of previous words. Take for example these two generated sentences:</p>
<ol>
<li>UK markets open</li>
<li>UK markets open in 20 minutes</li>
</ol>
<p>The first sentence is a declaration that the market is currently open. The second is not; it states that the market <em>will</em> open after a defined period of time. So the sentence has changed from a present tense declaration to a future tense expectation.</p>
<p>The English grammar barely holds together in this change of tense, and it&#8217;s unlikely that more regular and refined languages could pull off this form of grammatical gymnastics.</p>
<h3>Factor out the natural language</h3>
<p>So how do we fix this? Rather simply, by not constructing sentences a fragment at a time. Figure out which pieces of information are needed, and then look up the most appropriate translatable sentence that matches the information.</p>
<p>We refactor the code above in a two step process:</p>
<ol>
<li>Replace the sentence-appending logic with something that keeps track of which bits of information need to be conveyed, and picks the most appropriate sentence.</li>
<li>Add the dynamic data into the sentence by means of token or place-holder replacement.</li>
</ol>
<p>Step 1 requires a rethink of the business logic implementation. We have to keep track of what pieces of information we need to display, and from that pick the most appropriate sentence. An obvious way of doing this is to keep a translations hash with all the possible combinations, keyed in a way that the code can calculate.</p>
<p>I&#8217;ve done this the same way as the original code builds up a sentence, except I&#8217;m building up a lookup key. The lookup key then maps to a full sentence. This level of abstraction divorces the actual sentence grammar from the business logic rather neatly. (This approach is analogous to combining bit flags, something familiar to most C developers.)</p>
<p>After that, it&#8217;s a simple case of retrieving the correct sentence, and replacing any data place-holders with the actual information.</p>
<p>After these refactoring steps the code looks like this:</p>
<pre><code>
function getMarketStatus(market) {
	var status;
	var now     = new Date().getTime();

	// Collect the pertinent pieces of information
	// So we can pick the right translation string
	var message = {
		market:   market.name,
		sentence: &quot;&quot;
	};

	if (market.open) {
		status = market.open;
		message.sentence = &quot;O&quot;;

	} else if (market.close) {
		status = market.close;
		message.sentence = &quot;C&quot;;

	} else {
		message.sentence = &quot;X&quot;;

	}

	if (status) {

		// Make a note of early/late status
		if (status.early) {
			message.sentence += &quot;E&quot;;

		} else if (status.late) {
			message.sentence += &quot;L&quot;;

		}

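		// Make a note of any reason; as in the original code, a reason
		// only applies to an early or late open/close
		if (market.reason &amp;&amp; (status.early || status.late)) {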
			message.reason    = market.reason;
			message.sentence += &quot;R&quot;;

		}

		// Make a note of any time period
		if (status.time &amp;&amp; now &lt; status.time) {
			message.timePeriod =
				formatTimeLeft(now, status.time);
			message.sentence += &quot;T&quot;;

		}
	}

	// Pick the right sentence to display
	var sentence = TRANSLATIONS[message.sentence];

	// Replace dynamic data
	return YAHOO.lang.substitute(
		sentence, message
	);
}

// Mapping each combination into a sentence
var TRANSLATIONS = {
	O:    &quot;{market} open&quot;,
	OT:   &quot;{market} open in {timePeriod}&quot;,
	OE:   &quot;{market} open early&quot;,
	OET:  &quot;{market} open early in {timePeriod}&quot;,
	OER:  &quot;{market} open early for {reason}&quot;,
	OERT: &quot;{market} open early for {reason} in {timePeriod}&quot;,
	OL:   &quot;{market} open late&quot;,
	OLT:  &quot;{market} open late in {timePeriod}&quot;,
	OLR:  &quot;{market} open late for {reason}&quot;,
	OLRT: &quot;{market} open late for {reason} in {timePeriod}&quot;,
	C:    &quot;{market} close&quot;,
	CT:   &quot;{market} close in {timePeriod}&quot;,
	CE:   &quot;{market} close early&quot;,
	CET:  &quot;{market} close early in {timePeriod}&quot;,
	CER:  &quot;{market} close early for {reason}&quot;,
	CERT: &quot;{market} close early for {reason} in {timePeriod}&quot;,
	CL:   &quot;{market} close late&quot;,
	CLT:  &quot;{market} close late in {timePeriod}&quot;,
	CLR:  &quot;{market} close late for {reason}&quot;,
	CLRT: &quot;{market} close late for {reason} in {timePeriod}&quot;,
	X:    &quot;{market} closed&quot;
};
</code></pre>
<p>Once we move the <var>TRANSLATIONS</var> object to a language-specific file we can then allow translators to translate the sentences to the targeted language.</p>
<p>This technique is flexible enough even to tackle the irregular grammar when the <var>market.close</var> object exists but with no useful data, thus <q>{market} close</q> could easily be corrected to <q>{market} will close soon</q>.</p>
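<p>As a rough sketch of what such a language-specific table might look like, the snippet below keeps one table per locale and corrects the tense of the <q>open</q> and <q>close</q> sentences. The names <code>TRANSLATIONS_BY_LOCALE</code>, <code>fillTokens</code> and <code>renderStatus</code> are hypothetical, the wording is only illustrative, and <code>fillTokens</code> is a simple stand-in for <code>YAHOO.lang.substitute</code>:</p>
<pre><code>
// Hypothetical per-locale tables; in practice each would live in its own file.
// Only a handful of keys are shown, and the wording is illustrative.
var TRANSLATIONS_BY_LOCALE = {
	'en-GB': {
		O:  '{market} now open',
		OT: '{market} will open in {timePeriod}',
		C:  '{market} will close soon',
		CT: '{market} will close in {timePeriod}',
		X:  '{market} closed'
	}
};

// Stand-in for YAHOO.lang.substitute: swaps each {token} for its value.
function fillTokens(sentence, data) {
	var result = sentence;
	Object.keys(data).forEach(function (name) {
		result = result.split('{' + name + '}').join(String(data[name]));
	});
	return result;
}

// Look up the sentence for a locale and lookup key, then fill it in.
function renderStatus(locale, key, data) {
	var sentence = TRANSLATIONS_BY_LOCALE[locale][key];
	return fillTokens(sentence, data);
}
</code></pre>
<p>With this in place, <code>renderStatus('en-GB', 'OT', {market: 'UK markets', timePeriod: '20 minutes'})</code> gives <samp>UK markets will open in 20 minutes</samp>, and none of the business logic needs to know how the sentence is worded.</p>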
<p>Also the flexibility will make handling the <var>{reason}</var> wildcard a little easier, if not perfectly.</p>
<p>So we avoid the need to create dynamic sentences on the fly, and instead focus on what pieces of information we need to share. And then offer the translator sufficient flexibility, through the use of replaceable tokens, to set the most appropriate translation for the targeted language.</p>
<h3>Back in the real world&hellip;</h3>
<p>My refactored solution didn&#8217;t go live. Instead, the feature it powered was descoped in Europe. That was the cost of using this particular anti-pattern.</p>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2010/03/05/the-dynamic-sentence-creation-anti-pattern/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Translating dynamic text strings</title>
		<link>http://internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/</link>
		<comments>http://internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 13:24:15 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Translations]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[localisation]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[tokens]]></category>
		<category><![CDATA[variables]]></category>

		<guid isPermaLink="false">http://www.internationalisationtips.com/?p=17</guid>
		<description><![CDATA[Translating text that is generated dynamically at request time is more complicated than translating static, unchanging text. The right approach depends on why the text needs to be flexible at request time. In the last post we looked at simple static text replacement where the text doesn&#8217;t change from request to request. But as [...]]]></description>
			<content:encoded><![CDATA[<p>Translating text that is generated dynamically at request time is more complicated than translating static, unchanging text. The right approach depends on why the text needs to be flexible at request time.</p>
<p>In the last post we looked at <a href="http://www.internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/">simple static text replacement</a> where the text doesn&#8217;t change from request to request. But as soon as we use conditionals to build up a text string this simple lookup method is no longer sufficient.</p>
<h3>Simple data inclusion</h3>
<p>The simplest of dynamic text examples we have is a message telling us which page we are on.</p>
<pre><code>
&lt;?php
$pageNo = 6;
echo "You are currently on page ", $pageNo;
?&gt;
</code></pre>
<p>The first pitfall to avoid is creating a translation string for the static-looking text <q>You are currently on page</q>. Yes, this is a static piece of text, but it requires additional context for the message to make sense. The code supplies that context by tacking the page number on to the end of the string. The resulting string is <q>You are currently on page 6</q>, which makes sense without further context.</p>
<p>This works in English, but in other languages it may not be appropriate, or grammatically correct for the page number to be at the end. We need to ensure that the placement of the page number is as flexible as possible.</p>
<p>The most straightforward solution is to use a token to represent the <code>$pageNo</code> variable in the text string, and at request time replace the token with the variable&#8217;s value. This is a three-step process:</p>
<ul>
<li>Refactor the static looking text to use a text replacement</li>
<li>Process the refactored string at request time</li>
<li>Add the refactored string to the translation table</li>
</ul>
<p>The commonly accepted practice for tokens is to use a single word wrapped in curly brackets. I&#8217;d recommend this approach because the translation strings stay independent of any one programming language, and the replacement is easy to do in PHP. And if you need to use these translations in JavaScript then YUI&#8217;s <a href="http://developer.yahoo.com/yui/docs/YAHOO.lang.html#method_substitute"><code>YAHOO.lang.substitute()</code></a> handles curly bracket replacement right out of the box. (The alternative is <code>sprintf</code>&#8217;s <code>%s</code> syntax, but be careful: when there are multiple tokens, a translation may need them in a different order from the one your code supplies.)</p>
<pre><code>
&lt;?php
// Step 1: Refactor string
$pageNo = 6;
$pageMsg = "You are currently on page {page}";
echo $pageMsg;
?&gt;
</code></pre>
<p>With this refactoring we are basically just echoing out a static string that&#8217;s sitting in a new variable. Between defining the variable and writing it out we need to add in some extra code to replace <code>{page}</code> with the actual page number:</p>
<pre><code>
&lt;?php
// Step 2: Replace tokens
$pageNo = 6;
$pageMsg = "You are currently on page {page}";
$pageMsg = preg_replace('/{page}/', $pageNo, $pageMsg);
echo $pageMsg;
?&gt;
</code></pre>
<p>The <code>preg_replace</code> looks for an occurrence of <samp>{page}</samp> and replaces it with the contents of the variable <code>$pageNo</code>.</p>
<p>Now we can treat the starting string as a static text string and add it to our translations array:</p>
<pre><code>
&lt;?php
$translations = array(
	"pageMessage" =&gt; "You are currently on page {page}"
);
?&gt;
</code></pre>
<p>And update our main code to reference this translation:</p>
<pre><code>
&lt;?php
// Step 3: Add to the translations dictionary
$pageMsg = $translations['pageMessage'];
$pageMsg = preg_replace('/{page}/', $pageNo, $pageMsg);
echo $pageMsg;
?&gt;
</code></pre>
<p>The benefit of this approach is that the token can be anywhere in the string and now requires no code modifications if in one language the token needs to be somewhere other than at the end. We&#8217;ve removed the locale dependencies.</p>
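<p>To illustrate the point, here is a minimal JavaScript counterpart to the PHP above. The second wording is a contrived English stand-in for a language that needs the page number somewhere other than at the end, and the names <code>messages</code> and <code>showPage</code> are hypothetical:</p>
<pre><code>
// Two wordings of the same message; only the token position differs.
var messages = {
	trailing: 'You are currently on page {page}',
	leading:  'Page {page} is the one you are currently on'
};

// The same replacement code serves both wordings.
function showPage(template, pageNo) {
	return template.split('{page}').join(String(pageNo));
}
</code></pre>
<p>Calling <code>showPage(messages.leading, 6)</code> returns <samp>Page 6 is the one you are currently on</samp>, while <code>showPage(messages.trailing, 6)</code> returns the original sentence. The code path is identical in both cases.</p>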
<h3>Grammatical differences</h3>
<p>One of the difficult problems of creating text on the fly is grammatical differences between languages. We have to be careful where our code is implementing grammar rules. For example, here&#8217;s a PHP snippet that prints out the number of unread email items:</p>
<pre><code>
&lt;?php
$unread = 2;
echo $unread, " unread email", ($unread==1)?'':'s';
?&gt;
</code></pre>
<p>This just about works in English. But when translating for a different locale we can&#8217;t assume the same grammar rules apply. Pluralising a word isn&#8217;t always as straightforward as adding the letter &#8216;s&#8217; to the end.</p>
<p>This particular piece of code needs to be refactored to remove the grammar logic, and then it is in a state to be localised. The obvious first step, along with our text replacement token, is to remove the conditional logic and create two separate text strings:</p>
<pre><code>
&lt;?php
$unread = 2;

// Select the right message
$unreadMsg = "{total} unread emails";
if ($unread==1) {
	$unreadMsg = "{total} unread email";
}

// Replace the token
$unreadMsg = preg_replace('/{total}/', $unread, $unreadMsg);

echo $unreadMsg;
?&gt;
</code></pre>
<p>And we are back into the normal lines of replacing static text with a reference to a translation. (Although Polish pluralisation is more complicated, so would need more logic or perhaps a different approach &#8211; hat tip <a href="http://intranation.com/">Brad</a>).</p>
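<p>One possible direction for the more complicated cases, sketched here in JavaScript, is to let the locale decide which plural form applies and keep one translation string per form. This assumes a runtime that provides the standard <code>Intl.PluralRules</code> API; the <code>unreadMessages</code> table and the <code>unreadText</code> function are hypothetical names, and only an English table is shown:</p>
<pre><code>
// One string per plural category, per locale (English only in this sketch).
var unreadMessages = {
	en: {
		one:   '{total} unread email',
		other: '{total} unread emails'
	}
};

function unreadText(locale, total) {
	// Ask the locale which plural category this number falls into.
	var category = new Intl.PluralRules(locale).select(total);
	var table = unreadMessages[locale];

	// Fall back to the general form if a category has no string.
	var template = table[category] || table.other;
	return template.replace('{total}', String(total));
}
</code></pre>
<p>The grammar rule now lives with the locale data rather than in the code. English only uses the <samp>one</samp> and <samp>other</samp> categories; a table for a language such as Polish would supply strings for additional categories, without any change to <code>unreadText</code>.</p>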
<h3>String translation libraries</h3>
<p>It becomes a little tedious to create a regex rule for every token that needs to be replaced. And there are classes or libraries to do this for just about any web-facing programming language. A very elegant solution is <a href="http://ejeliot.com/">Ed Eliot&#8217;s</a> PHP 5 <a href="http://www.ejeliot.com/blog/114">text translation library</a>.</p>
<p>Ed&#8217;s class gets the basics right:</p>
<ul>
<li>Translations are in a separate file, one per language</li>
<li>You specify the language code and it grabs the right translations file</li>
<li>The <code>Get</code> method takes an array of token replacements, and applies them to the translations for you</li>
</ul>
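<p>To give a feel for the shape of such a helper, here is a small JavaScript sketch loosely modelled on the three points above. It is not the API of Ed&#8217;s PHP class; <code>createTranslator</code> and its <code>get</code> method are hypothetical, and the translation tables are passed in directly rather than loaded from files:</p>
<pre><code>
// Hypothetical translator factory; this is not the API of the PHP class.
function createTranslator(tablesByLanguage, language) {
	var table = tablesByLanguage[language] || {};

	return {
		// Fetch a translation and apply the token replacements to it.
		get: function (key, replacements) {
			var values = replacements || {};
			var text = table.hasOwnProperty(key) ? table[key] : key;

			// One rule covers every {token}; unknown tokens are left as-is.
			return text.replace(/\{(\w+)\}/g, function (match, name) {
				return values.hasOwnProperty(name) ? String(values[name]) : match;
			});
		}
	};
}

var translator = createTranslator({
	en: { pageMessage: 'You are currently on page {page}' }
}, 'en');
</code></pre>
<p>Here <code>translator.get('pageMessage', {page: 6})</code> returns <samp>You are currently on page 6</samp>. Falling back to the key itself when a translation is missing is a design choice made for this sketch; it makes gaps visible on the page without breaking it.</p>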
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Replacing static text strings with references</title>
		<link>http://internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/</link>
		<comments>http://internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/#comments</comments>
		<pubDate>Tue, 24 Mar 2009 11:55:56 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Translations]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[localisation]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[translation]]></category>

		<guid isPermaLink="false">http://www.internationalisationtips.com/?p=9</guid>
		<description><![CDATA[The first major step to localising a website or web application is to translate all the static text strings into the preferred language. Static text being text that doesn&#8217;t change from request to request. The typical approach to translating this text is to replace every piece of static text with a variable reference that points [...]]]></description>
			<content:encoded><![CDATA[<p>The first major step to localising a website or web application is to translate all the static text strings into the preferred language. Static text being text that doesn&#8217;t change from request to request.</p>
<p>The typical approach to translating this text is to replace every piece of static text with a variable reference that points to a translation dictionary. For example the simple code:</p>
<pre><code>
echo "Hello World";
</code></pre>
<p>The static text <samp>&#8220;Hello World&#8221;</samp> should be replaced with a variable reference. In this example, let&#8217;s create a simple lookup array in PHP (in its own external file, and <code>include</code> it in):</p>
<pre><code>
&lt;?php
$translations = array(
    'hello_world' =&gt; 'Hello World'
    // Other translation strings
);
?&gt;
</code></pre>
<p>We can now replace our original text with the reference <code>$translations['hello_world']</code>, like so:</p>
<pre><code>
echo $translations['hello_world'];
</code></pre>
<p>With the <code>$translations</code> array being in an external PHP file we can have one of these files per locale or language and include the right one at request time.</p>
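<p>The same idea carries over to client-side code. A minimal JavaScript sketch of picking the right table at request time might look like the following, where <code>translationsByLocale</code>, <code>DEFAULT_LOCALE</code> and <code>loadTranslations</code> are hypothetical names and both tables sit in one object purely to keep the example self-contained:</p>
<pre><code>
// In practice each table would live in its own per-locale file.
var translationsByLocale = {
	'en-GB': { favourite_colour: 'Favourite colour' },
	'en-US': { favourite_colour: 'Favorite color' }
};

var DEFAULT_LOCALE = 'en-GB';

// Pick the table for the requested locale, or fall back to the default.
function loadTranslations(locale) {
	if (translationsByLocale.hasOwnProperty(locale)) {
		return translationsByLocale[locale];
	}
	return translationsByLocale[DEFAULT_LOCALE];
}
</code></pre>
<p>The calling code only ever refers to the key: <code>loadTranslations('en-US').favourite_colour</code> gives <samp>Favorite color</samp>, and an unknown locale falls back to the default table.</p>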
<p>This is the first step towards separating the locale-specific information from our code. Adding a new locale means creating a new translations file for that locale and, more importantly, requires no code changes.</p>
<p>A large chunk of internationalising code for localisation involves just replacing existing text with translated versions. It is an important step, and the first one to get right. The next step will involve dealing with strings that are <a href="http://www.internationalisationtips.com/2009/03/24/translating-dynamic-text-strings/">dynamically altered at request time</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2009/03/24/replacing-static-text-strings-with-references/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Avoid text image buttons</title>
		<link>http://internationalisationtips.com/2009/01/27/avoid-text-image-buttons/</link>
		<comments>http://internationalisationtips.com/2009/01/27/avoid-text-image-buttons/#comments</comments>
		<pubDate>Tue, 27 Jan 2009 16:20:50 +0000</pubDate>
		<dc:creator>Isofarro</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[buttons]]></category>
		<category><![CDATA[CSS]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[forms]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[sliding doors]]></category>
		<category><![CDATA[style]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[Translations]]></category>

		<guid isPermaLink="false">http://www.internationalisationtips.com/?p=3</guid>
		<description><![CDATA[Localising sites should not be about recreating the same images over and over with different text and then spending days fixing layout bugs as each locale needs a different width to comfortably fit each text label.]]></description>
			<content:encoded><![CDATA[<p>Styling buttons is a nightmare for web developers, thanks to the inconsistent cross-browser and cross-platform handling. So we take the easy way out and cut out the button image from the design mocks, save it as an image, and replace the default button with our new image.</p>
<p>Using images for buttons is nothing new. Accessibility wise, all you really need is some way of specifying a text-equivalent to the image, and you are essentially done.</p>
<p>The two main techniques of using imaged buttons are using the input type of image, or a button element styled with image replacement.</p>
<p>Typically the size of the image button comes directly from the design specification, which means that the text on the button has actually been a design decision, and the page design has taken that into consideration.</p>
<p>The result is a button that cannot be internationalised, for the following reasons:</p>
<ul>
<li>translating the button text means creating a new image for each translation</li>
<li>the size of each image now has to change to accommodate the translated text.</li>
</ul>
<p>There are two sane approaches to dealing with this situation, and it depends on the willingness of your designers to compromise in favour of international-friendly designs. But both solutions involve reimplementing the button markup, and if there are JavaScript event-handlers registered, there&#8217;s a strong possibility that these would need to be updated to take into account the changed markup.</p>
<p>The first solution is to get rid of the button image entirely, and use a proper submit input button. Then use CSS to style the button away from the default set into something closer to the original design, compromising on exacting details such as text shadows and rounded corners.</p>
<p>Designs that require a particular font, with specific kerning and other font metrics are unfit for internationalising websites, unless you can get your designer to guarantee that they will create and supply all the button images you need for every localisation from now on.</p>
<p>The second solution is to get the design team to create a tile-able button background big enough to cater for any reasonable width of text. Then using a sliding doors technique, a button element (or a cleverly marked-up submit button) can be styled to produce an elastic button background. All that&#8217;s left is to style the font of the real text as closely as reasonably possible to the original design specification.</p>
<p>By making the actual text back into real text your existing ways of translating text can be used, and localising forms becomes a matter of translations and other unique locale-specific requirements, not laboriously dealing with image manipulation.</p>
<p>Text inside images causes headaches when localising sites. It is far better to spend more time upfront coming up with a reasonable compromise that is based on text being real text, not an image.</p>
]]></content:encoded>
			<wfw:commentRss>http://internationalisationtips.com/2009/01/27/avoid-text-image-buttons/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>


