More and more applications today are being used in more than one country and more than one language. For many companies that are looking to move their software products into foreign markets, internationalization (i18n) and localization (l10n) can turn into an ugly and expensive process. The trap that many fall into is thinking that modern operating systems and programming languages will do most of the work for you. Microsoft Windows® comes in multiple languages, and programming languages like Java and C# have built-in support for Unicode and locale-aware functions. The problem is that there are still plenty of ways to get into trouble and create an application that is difficult to internationalize and to translate. Making the necessary changes to the code after the fact is always more expensive than doing it right from the beginning. The good news is that with modern software development environments, i18n is not rocket science. Keeping a few simple rules in mind when designing software will go a long way toward ensuring that deploying software in multiple locales and markets is as painless as possible.
When designing software for international use, there are three primary areas to keep in mind: data, locale, and user interface (UI). For each one, the most important question to keep asking is, “What happens when this runs in a non-English environment?” A basic familiarity with the issues that arise across languages, plus a few simple rules, should enable any modern software architect to put together a system that can run in any language, anywhere.
Ten years ago, dealing with text data in multiple languages could be a bit of a nightmare, especially for languages such as Chinese, Japanese, and Korean, which require multi-byte encodings. These days, Unicode has become nearly ubiquitous, which leads many people to say things like, “My application uses Unicode, so I’m fine, right?” Unfortunately, that is not entirely true. Not all systems or modules use the same Unicode encoding (UTF-8 and UTF-16 are both common), and many file formats still do not use Unicode by default. The rule of thumb is that the encoding can change at any program boundary. When designing the application, know the expected encoding wherever it interfaces with external modules or systems, serializes data to or from the network, communicates with a database, or reads from or writes to a file system. Not all file systems or database engines use Unicode by default, and wherever a conversion is necessary there is potential for data loss. This happens when the target encoding cannot represent every character in the source. For example, converting from Unicode to ASCII loses every character outside the 128-character ASCII range, including accented Latin letters and all Chinese, Japanese, and Korean text. However, most modern programming languages include conversion libraries for dealing with Unicode, and as long as the engineers know the character encoding at each program boundary, this is a relatively easy obstacle to avoid.
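As a minimal sketch of the boundary rule, the Java snippet below names the charset explicitly every time text is turned into bytes or back, and shows what a lossy conversion looks like. The class and method names are hypothetical and exist only for illustration; the Japanese sample text is written with Unicode escapes so that the source file itself stays ASCII.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBoundaryDemo {

    // Encode with an explicit charset rather than the platform default,
    // so the bytes are the same on every machine.
    public static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset on the other side of the boundary.
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Round-trip through US-ASCII to illustrate data loss: characters that
    // ASCII cannot represent are replaced with '?'.
    public static String roundTripAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Three Japanese characters followed by " text".
        String original = "\u65E5\u672C\u8A9E text";

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(fromUtf8(toUtf8(original)).equals(original)); // true

        // ASCII cannot, so the three Japanese characters are lost.
        System.out.println(roundTripAscii(original)); // ??? text
    }
}
```

The same idea applies to readers, writers, sockets, and database connections: wherever an API accepts a charset argument, passing it explicitly removes the dependency on the machine's default.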
Every software application that runs on a modern operating system inherits a locale from the environment. A locale is a combination of a language and a region designation, which also usually includes basic default format information for dates and times, numbers, and currency. Most modern programming languages include a set of libraries and functions that use this information to produce and interpret strings for display to users. In C#, these include most ToString() and Parse() methods; in Java, they include String.format() and the java.text classes such as NumberFormat and DateFormat. The important rule during design and development is to be aware of what the locale is and how it should affect the program’s behavior.
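To make the effect of locale concrete, here is a small Java sketch that formats the same number under two locales. The class and method names are hypothetical; `String.format(Locale, ...)` is the standard library call.

```java
import java.util.Locale;

public class LocaleFormatDemo {

    // Format an amount with grouping separators and two decimals,
    // following the conventions of the given locale.
    public static String formatAmount(double amount, Locale locale) {
        return String.format(locale, "%,.2f", amount);
    }

    public static void main(String[] args) {
        System.out.println(formatAmount(1234.5, Locale.US));      // 1,234.50
        System.out.println(formatAmount(1234.5, Locale.GERMANY)); // 1.234,50

        // With no locale argument, the result depends on the default locale
        // inherited from the environment, so it varies between machines.
        System.out.println(String.format("%,.2f", 1234.5));
    }
}
```

The last call is the one to watch for in code review: it is correct for text shown to the user and a likely bug anywhere else.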
One common source of bugs is the use of these locale-aware functions in contexts where they are unnecessary. For example, strings are often used to encode data that is only ever used internally in an application. If locale-aware functions are used to produce or parse these strings, the application can behave differently in different language environments, sometimes to disastrous effect. Also, in a client-server system, the server often has to deal with data from multiple locales, so it must track each client's locale separately from the locale of the machine on which the server is running. The best example of this is a single web application that must serve pages in multiple languages. In this situation, the data written for display must follow the locale of the incoming request rather than that of the server.
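One way to keep the two concerns apart, sketched here in Java under the assumption that internal values are stored as strings: encode and decode internal data with a fixed, locale-neutral format, and pass the request's locale explicitly only when producing text for display. All names below are hypothetical.

```java
import java.util.Locale;

public class InternalDataDemo {

    // Internal encoding: always locale-neutral, so the stored string
    // is identical no matter where the program runs.
    public static String encodeInternal(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // Internal decoding: Double.parseDouble ignores the default locale
    // and always expects '.' as the decimal separator.
    public static double decodeInternal(String encoded) {
        return Double.parseDouble(encoded);
    }

    // Display formatting: the locale comes from the request, not the server.
    public static String renderForUser(double value, Locale requestLocale) {
        return String.format(requestLocale, "%.2f", value);
    }

    public static void main(String[] args) {
        String stored = encodeInternal(3.14159);
        System.out.println(stored);                                  // 3.14
        System.out.println(decodeInternal(stored));                  // 3.14
        System.out.println(renderForUser(3.14159, Locale.GERMANY));  // 3,14

        // Feeding display text back into the internal parser fails,
        // which is the kind of bug described above.
        try {
            decodeInternal(renderForUser(3.14159, Locale.GERMANY));
        } catch (NumberFormatException e) {
            System.out.println("Cannot parse display text as internal data");
        }
    }
}
```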
The user interface is, of course, the most obvious place where internationalization and localization are important. The general goal is to design an application whose interface can adapt to the current locale without changing or impairing the underlying functionality. Creating good, usable interfaces in one language can be difficult enough, and adding support for multiple languages only complicates matters. There are many resources available to address specific issues involved in designing interfaces for different cultures. Depending on the target audience of the application, there can be many complicating factors for visual presentation. The most basic include date, time, and currency formats, but things get more complicated with sort order, left-to-right versus right-to-left text direction, or even functional or data requirements specific to a locale. In general, the cardinal rule is to design a flexible UI whose major components can be changed based on locale. As long as the UI itself is built using a toolkit from one of the major modern operating systems, most of the major complications with foreign language display and input are taken care of automatically. The following is a list of basic rules to keep in mind.
- Externalize all interface strings into resource files
- Use insertion variables in string resources rather than relying on concatenated strings because word order does change across languages
- Make sure it is relatively easy to change major colors and icons based on locale if necessary
- Keep layouts simple and flexible with room for expansion (translations will most often result in 20-30% expansion in length)
- Avoid using culture-specific metaphors as much as possible
- Avoid using layouts that rely on grammatical constructions of a specific language, e.g. making a text box or other UI element part of a sentence
- Use the default date, time, and number formats from the current locale and avoid relying on hard-coded formats where possible
- Be prepared for the possibility that additional fields or UI elements will need to be added for some languages
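The rule about insertion variables can be sketched in Java with `java.text.MessageFormat`. The two patterns below are hypothetical; in a real application they would be loaded from per-locale resource files rather than hard-coded. The German pattern places the arguments in the opposite order from the English one, which concatenated strings could not accommodate.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class MessagePatternDemo {

    // Hypothetical resource strings; normally read from a resource bundle.
    public static final String EN_PATTERN = "Found {0} files in {1}.";
    public static final String DE_PATTERN = "In {1} wurden {0} Dateien gefunden.";

    // Fill the pattern's numbered placeholders; each translation decides
    // where the arguments appear in the sentence.
    public static String render(String pattern, Locale locale, int count, String folder) {
        return new MessageFormat(pattern, locale).format(new Object[] { count, folder });
    }

    public static void main(String[] args) {
        System.out.println(render(EN_PATTERN, Locale.US, 3, "Reports"));
        // Found 3 files in Reports.
        System.out.println(render(DE_PATTERN, Locale.GERMANY, 3, "Reports"));
        // In Reports wurden 3 Dateien gefunden.
    }
}
```

This sketch does not handle plural forms (a count of 1 would still read "files"); that needs a separate mechanism and is left out here.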
Using the general rules above as a guide, it is possible to design and implement a system that will be much cheaper and easier to localize in the end. However, it is important to remember that it is impossible to anticipate everything, and software localization is always an interesting challenge.
Finally, to illustrate how easy it is for applications developed on modern systems to fail when run in a different language environment, consider the following example. A local company built a client-server application entirely in Java, using an embedded database technology. The company employed a number of developers who had worked on international applications before, and had a general policy that all strings and UI resources must be externalized. Unfortunately, all it took was a simple test installation on a Japanese system to prove that the application was not ready even to run in a Japanese environment, let alone to be translated. Upon startup the application immediately crashed because its license approval logic relied on a US date format different from the one supplied by the Japanese system. Beyond that, any Japanese data entered into the interface was immediately lost when saved to the database, because the default string type in the database used only the Latin-1 character set.
Further inspection of the code revealed that only roughly half of the strings in the interface had in fact been externalized, and the ones that were had been hopelessly chopped up into sentence fragments that were reassembled by the system at run-time. At least 35 separate issues were found in a matter of a few hours. In short, it was going to take a significant amount of work for the system to be usable in a Japanese environment. These kinds of issues are not really difficult to solve, but they can take a major investment of time and money, and are all easily avoided if the right considerations are made at the beginning of the development process.
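The startup crash in this example is the kind of failure a fixed internal date format avoids. The following Java sketch is not the company's code, whose details are not known; it assumes a license expiry date stored as a string and uses hypothetical names throughout. The stored form is always ISO 8601, and the locale is consulted only when showing the date to a user.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class LicenseDateDemo {

    // Stored form: ISO 8601 (yyyy-MM-dd), independent of any locale.
    public static String storeExpiry(LocalDate expiry) {
        return expiry.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Parsing the stored form gives the same result on every system.
    public static LocalDate parseExpiry(String stored) {
        return LocalDate.parse(stored, DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Display form: localized, and never parsed back by the program.
    public static String displayExpiry(LocalDate expiry, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.MEDIUM)
                .withLocale(locale)
                .format(expiry);
    }

    public static void main(String[] args) {
        LocalDate expiry = LocalDate.of(2030, 12, 31);
        String stored = storeExpiry(expiry);
        System.out.println(stored);                             // 2030-12-31
        System.out.println(parseExpiry(stored).equals(expiry)); // true

        // The exact text here depends on the runtime's locale data.
        System.out.println(displayExpiry(expiry, Locale.US));
        System.out.println(displayExpiry(expiry, Locale.JAPAN));
    }
}
```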