Internationalisation Tips

practical tips on building an international presence

Translating dynamic text strings

By Isofarro on March 24th, 2009

Translating text that is generated dynamically at request time is more complicated than translating static, unchanging text. The right approach depends on why the text needs to vary at request time.

In the last post we looked at simple static text replacement where the text doesn’t change from request to request. But as soon as we use conditionals to build up a text string this simple lookup method is no longer sufficient.

Simple data inclusion

The simplest of dynamic text examples we have is a message telling us which page we are on.


<?php
$pageNo = 6;
echo "You are currently on page ", $pageNo;
?>

The first pitfall to avoid is creating a translation string for the static-looking text You are currently on page. Yes, this is a static piece of text, but it needs additional context for the message to make sense. The code supplies that context by tacking the page number onto the end of the string. The resulting string, You are currently on page 6, makes sense without further context.

This works in English, but in other languages it may not be appropriate, or grammatically correct for the page number to be at the end. We need to ensure that the placement of the page number is as flexible as possible.

The most straightforward solution is to use a token to represent the $pageNo variable in the text string, and then replace that token with the variable's value at request time. This is a three-step process:

  • Refactor the static-looking text to use a replacement token
  • Process the refactored string at request time
  • Add the refactored string to the translation table

The commonly accepted practice for tokens is to use a single word wrapped in curly brackets. I’d recommend this approach because the translation strings are not tied to any one programming language’s syntax, and the replacement is easy to do in PHP. If you need to use these translations in JavaScript, YUI’s YAHOO.lang.substitute() handles curly bracket replacement right out of the box. (The alternative is sprintf’s %s syntax, but be careful: plain %s placeholders are filled in argument order, so a translation that needs several tokens in a different order will receive the wrong values unless it uses numbered placeholders.)
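To make the ordering caveat concrete, here is a small sketch. The two-token message and its reordered variant are invented for illustration (the reordered one is plain English standing in for a translation); strtr() and sprintf() are standard PHP functions.

```php
<?php
// Hypothetical values and wording, for illustration only.
$values = array('{page}' => 6, '{total}' => 20);

// Named tokens: each translation is free to place them in any order.
$english   = "You are on page {page} of {total}";
$reordered = "Of {total} pages, you are on page {page}"; // stand-in for a translation

echo strtr($english, $values), "\n";
echo strtr($reordered, $values), "\n";

// Plain %s placeholders are filled strictly in argument order,
// so the reordered string silently swaps the two numbers.
$wrong = sprintf("Of %s pages, you are on page %s", 6, 20);

// Numbered placeholders (%1$s, %2$s) avoid the swap, but are harder
// for a translator to read than named tokens.
$right = sprintf('Of %2$s pages, you are on page %1$s', 6, 20);

echo $wrong, "\n";
echo $right, "\n";
```

With named tokens the translator decides the order; with sprintf the order has to be encoded in the placeholder numbers.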


<?php
// Step 1: Refactor string
$pageNo = 6;
$pageMsg = "You are currently on page {page}";
echo $pageMsg;
?>

With this refactoring we are basically just echoing out a static string that’s sitting in a new variable. Between defining the variable and writing it out we need to add in some extra code to replace {page} with the actual page number:


<?php
// Step 2: Replace tokens
$pageNo = 6;
$pageMsg = "You are currently on page {page}";
$pageMsg = preg_replace('/{page}/', $pageNo, $pageMsg);
echo $pageMsg;
?>

The preg_replace looks for an occurrence of {page} and replaces it with the contents of the variable $pageNo.

Now we can treat the starting string as a static text string and add it to our translations array:


<?php
$translations = array(
	"pageMessage" => "You are currently on page {page}",
);
?>

And update our main code to reference this translation:


<?php
// Step 3: Add to the translations dictionary
$pageMsg = $translations['pageMessage'];
$pageMsg = preg_replace('/{page}/', $pageNo, $pageMsg);
echo $pageMsg;
?>

The benefit of this approach is that the token can appear anywhere in the string, so no code changes are needed when a language requires it somewhere other than at the end. We’ve removed the locale dependency from the code.
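As a rough sketch of that benefit, the same code can serve two strings that place the token differently. The fillTokens() helper and the 'en-test' locale are hypothetical names invented here, and the second string is reordered English standing in for a real translation.

```php
<?php
// Hypothetical helper: fills every {name} token in $message from $values.
function fillTokens($message, array $values) {
	$pairs = array();
	foreach ($values as $name => $value) {
		$pairs['{' . $name . '}'] = (string) $value;
	}
	// strtr does plain string replacement, so no regex escaping is needed.
	return strtr($message, $pairs);
}

// Two stand-in "locales"; 'en-test' is reordered English, not a real translation.
$translations = array(
	'en'      => array('pageMessage' => "You are currently on page {page}"),
	'en-test' => array('pageMessage' => "Page {page} is where you currently are"),
);

$pageNo = 6;
foreach (array('en', 'en-test') as $locale) {
	// The calling code is identical for both locales.
	echo fillTokens($translations[$locale]['pageMessage'], array('page' => $pageNo)), "\n";
}
```

Only the strings differ between locales; the loop body never needs to know where the token sits.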

Grammatical differences

One of the difficult problems of creating text on the fly is grammatical differences between languages. We have to be careful where our code is implementing grammar rules. For example, here’s a PHP snippet that prints out the number of unread email items:


<?php
$unread = 2;
echo $unread, " unread email", ($unread==1)?'':'s';
?>

This just about works in English. But when translating for a different locale we can’t assume the same grammar rules apply. Pluralising a word isn’t always as straightforward as adding the letter ‘s’ to the end.

This particular piece of code needs to be refactored to remove the grammar logic, and then it is in a state to be localised. The obvious first step, along with our text replacement token, is to remove the conditional logic and create two separate text strings:


<?php
$unread = 2;

// Select the right message
$unreadMsg = "{total} unread emails";
if ($unread==1) {
	$unreadMsg = "{total} unread email";
}

// Replace the token
$unreadMsg = preg_replace('/{total}/', $unread, $unreadMsg);

echo $unreadMsg;
?>

And we are back on familiar ground: replacing static text with a reference to a translation. (Polish pluralisation is more complicated, though, so it would need more logic or perhaps a different approach – hat tip Brad.)
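For the Polish case, one possible shape for that extra logic is to pick a plural category first and then look up a string for it. This is only a sketch: polishPluralForm() is a hypothetical function following the three-form rule commonly used in gettext catalogues for Polish, and the Polish strings are illustrative wording that a translator should confirm.

```php
<?php
// Hypothetical helper: "one" for 1, "few" for numbers ending in 2-4
// (except those ending in 12-14), and "many" for everything else.
function polishPluralForm($n) {
	if ($n == 1) {
		return 'one';
	}
	$lastDigit = $n % 10;
	$lastTwo = $n % 100;
	if ($lastDigit >= 2 && $lastDigit <= 4 && ($lastTwo < 10 || $lastTwo >= 20)) {
		return 'few';
	}
	return 'many';
}

// Illustrative strings only; a translator should confirm the wording.
$unreadMessages = array(
	'one'  => "{total} nieprzeczytana wiadomość",
	'few'  => "{total} nieprzeczytane wiadomości",
	'many' => "{total} nieprzeczytanych wiadomości",
);

foreach (array(1, 2, 5, 12, 22) as $unread) {
	// Select the message by plural category, then replace the token.
	$message = $unreadMessages[polishPluralForm($unread)];
	echo str_replace('{total}', (string) $unread, $message), "\n";
}
```

The grammar rule now lives in one function per language, and the strings themselves stay in the translation table.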

String translation libraries

It becomes a little tedious to create a regex rule for every token that needs to be replaced. Fortunately, there are classes or libraries that do this for just about any web-facing programming language. A very elegant solution is Ed Eliot’s PHP 5 text translation library.

Ed’s class gets the basics right:

  • Translations are in a separate file, one per language
  • Specify the language code and it grabs the right translations file
  • The Get method takes an array of token replacements and applies them to the translation for you





Copyright © 2007 - 2009, isolani