Most software developers consider translation and localization only late in the product development cycle, if they consider them at all. This tutorial explains how to build applications that are ready to localize and easier to maintain in the meantime.
Some Basic Terms
Translation, localization, and internationalization are related but refer to different tasks. Translation, which is what most people think of first, is the process of rendering longer-form content, such as documentation, online content, and messages, in another language. This is just one part of preparing an application or online service for use in multiple languages.
Localization is the process of translating and formatting your application’s interface for other languages. This overlaps a bit with translation, but it is a separate process and requires a different skill set. Navigation menus, for example, must be correctly translated in context, but also formatted so the layout is not broken by longer words. Many languages, such as German, have significantly longer words than their English equivalents. You have to get both the meaning and the layout just right to properly localize an application.
Internationalization is the process of building your application so that it uses the correct formats and units of measure for each country or region it is released in. Currency, number formatting, and date and time formats all vary by region. Most programming languages and web frameworks provide tools or templates that make this process pretty straightforward. (This is probably the easiest part of making an application ready for international use.)
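To make the regional differences concrete, here is a minimal sketch in Python. The CONVENTIONS table and the three helper functions are hypothetical and hard-coded for illustration; in a real project you would rely on the locale data that ships with your language or framework rather than maintain a table like this yourself. The sketch also assumes the amount is already in the locale's own currency.

```python
from datetime import date

# Hypothetical per-locale conventions, hard-coded for illustration only.
CONVENTIONS = {
    "en-US": {"date": "%m/%d/%Y", "thousands": ",", "decimal": ".", "currency": "${amount}"},
    "en-GB": {"date": "%d/%m/%Y", "thousands": ",", "decimal": ".", "currency": "£{amount}"},
    "de-DE": {"date": "%d.%m.%Y", "thousands": ".", "decimal": ",", "currency": "{amount} €"},
}


def format_number(value, locale_code):
    """Format a number with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale_code]
    text = f"{value:,.2f}"  # always uses "," and "." at this point
    # Swap in the locale's separators, using a temporary marker so the
    # two replacements do not interfere with each other.
    text = text.replace(",", "\x00").replace(".", conv["decimal"])
    return text.replace("\x00", conv["thousands"])


def format_currency(value, locale_code):
    """Place the formatted amount into the locale's currency pattern."""
    pattern = CONVENTIONS[locale_code]["currency"]
    return pattern.format(amount=format_number(value, locale_code))


def format_date(day, locale_code):
    """Format a date in the locale's usual day/month/year order."""
    return day.strftime(CONVENTIONS[locale_code]["date"])


print(format_currency(1234.5, "en-US"))         # $1,234.50
print(format_currency(1234.5, "de-DE"))         # 1.234,50 €
print(format_date(date(2024, 3, 5), "en-US"))   # 03/05/2024
print(format_date(date(2024, 3, 5), "en-GB"))   # 05/03/2024
```

The same value produces a different string in each region, which is why these formats should never be hard coded into the application.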
Make Life Easy For Designers And Marketers
Most applications contain a lot of human-readable messages, menus, and navigation elements, all of which need to be designed and maintained. Most developers have a bad habit of “hard coding” these into programs, especially early in the development cycle. Even if you never intend to offer your application in another language, this is a bad practice, because it prevents non-programmers from contributing to the project by, for example, fine-tuning the user interface, instructions, and other human-readable texts. Software developers have a reputation for lacking “user empathy” for a reason.
Each programming language or framework has a set of utilities for localization. Gettext, for example, is widely used in web development languages. With these tools, human-readable messages and prompts are stored and maintained separately from application code, and can be changed without requiring any changes to the software. In your application, you simply write statements like this:
print _("Hello World")
This statement prints a translation for the message Hello World. The gettext utility looks the message up in a prompt catalog, a text file that pairs each prompt with its translation. This prompt catalog, or PO file, contains entries like this:
msgid "Hello World"
msgstr "Hello World"
To display translations instead, you simply load a different prompt catalog, such as es.po for Spanish, which contains entries like this:
msgid "Hello World"
msgstr "Hola Mundo"
So by using your programming language’s localization utility, you can make your project easier to maintain, even if you never intend to release it in another language. Should you need to in the future, the transition will be much easier because your messages are already separated from your code. You can find an excellent introduction to Gettext from O’Reilly Media at onlamp.com/pub/a/php/2002/06/13/php.html
Managing Localization Catalogs
You will typically end up with a collection of localization catalogs for your project. Unless you have an extremely large number of prompts or multiple applications, the simplest thing to do is to have one prompt file per language or locale. What’s the difference between a language and a locale? Languages vary by region, for example, American English versus British English. You identify a locale with a combination of a standard language code plus a country or region code. American English is en-US, while British English is en-GB.
I strongly recommend that you look at Transifex (www.transifex.com) to manage your prompt catalogs. Transifex is a version control system for localization; think of it as GitHub for localization. With Transifex, you upload your master prompt file, which is usually maintained in English and is treated as the authoritative list of every prompt and message related to your project. Maintainers and translators then create child catalogs in other languages and locales as needed. The system makes it easy to keep track of changes in the master catalog, and to track what’s been translated and what needs to be updated in the child catalogs. You then export all of your catalogs whenever you release an update to your application, and know that you have the most up-to-date files at that point.
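With one file per language or locale, a common approach is to try the most specific catalog first and fall back to more general ones. The sketch below shows one way to do that in Python. The catalog_candidates function, the file naming scheme (en-GB.po, en.po), and the default language are all assumptions made for illustration.

```python
def catalog_candidates(locale_code, default="en"):
    """List catalog file names to try, from most to least specific."""
    # Accept both "en-GB" and "en_GB" spellings of the locale code.
    language, _sep, region = locale_code.replace("_", "-").partition("-")
    language = language.lower()

    candidates = []
    if region:
        candidates.append(f"{language}-{region.upper()}.po")  # e.g. en-GB.po
    candidates.append(f"{language}.po")                       # e.g. en.po
    if f"{default}.po" not in candidates:
        candidates.append(f"{default}.po")                    # last resort
    return candidates


print(catalog_candidates("en-GB"))  # ['en-GB.po', 'en.po']
print(catalog_candidates("es_mx"))  # ['es-MX.po', 'es.po', 'en.po']
```

Your application would then load the first file in the list that actually exists.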
Translating Dynamic Content
Translating dynamic content requires a different approach, because prompt catalogs are best suited to static content such as interface prompts. Fortunately, there are a variety of cloud-based translation services that enable you to request both human and machine translations on the fly via simple web APIs. Gengo, SpeakLike, Straker Translations, and One Hour Translation all provide versions of this service.
Typically what you’ll want to do is script a behavior like the following when looking up a translation for a text:
- Generate an MD5 hash or similar from the source text and target language/locale code, and use it as the lookup key
- Check your local cache or memcached to see if there is a cache hit for that key
- If not, check a persistent data store (e.g. SQL database) to see if there is a matching record. If there is, use it and push it to the cache to speed up subsequent reads
- If not, call out to the translation service of your choice. These will typically return a machine translation as a placeholder, and then a human translation once it is completed. (Some can also make an asynchronous callback to your service, so you don’t need to poll for updates.)
This pattern is easy to implement, and since most service providers offer easy-to-understand REST APIs, and in some cases client libraries, you should be able to have this working within a day or two of effort.
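The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: a plain dict stands in for the local cache or memcached, an in-memory SQLite table stands in for the persistent store, and fake_service is a hypothetical placeholder for whichever translation API client you choose. None of the names below come from a real provider's library.

```python
import hashlib
import sqlite3

cache = {}  # stands in for a local cache or memcached

db = sqlite3.connect(":memory:")  # stands in for the persistent data store
db.execute("CREATE TABLE translations (key TEXT PRIMARY KEY, text TEXT)")


def entity_key(source_text, target_locale):
    """Hash the source text together with the target locale code."""
    raw = f"{target_locale}:{source_text}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()


def translate(source_text, target_locale, request_translation):
    """Look up a translation: cache first, then database, then the service."""
    key = entity_key(source_text, target_locale)

    # 1. Cache hit: return immediately.
    if key in cache:
        return cache[key]

    # 2. Persistent store: on a hit, push the record into the cache.
    row = db.execute(
        "SELECT text FROM translations WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        cache[key] = row[0]
        return row[0]

    # 3. Call the external service and save whatever it returns.
    translated = request_translation(source_text, target_locale)
    db.execute(
        "INSERT OR REPLACE INTO translations VALUES (?, ?)", (key, translated)
    )
    db.commit()
    cache[key] = translated
    return translated


calls = []  # records every request that reaches the fake service


def fake_service(text, locale):
    """Hypothetical stand-in for a real translation API client."""
    calls.append((text, locale))
    return f"[{locale}] {text}"


first = translate("Hello World", "es", fake_service)   # reaches the service
second = translate("Hello World", "es", fake_service)  # answered from the cache
print(first, len(calls))  # [es] Hello World 1
```

When a human translation later replaces the machine placeholder, you would overwrite the stored record and the cached entry under the same key, which is why the sketch uses INSERT OR REPLACE.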
Optimizing For Cost And Quality
When you integrate translation into an application there are all sorts of things you can do to optimize for cost, quality and speed. This is a topic for a separate article, but in general, you can do things like:
- Machine translate content in bulk, to get instant but approximate translations for SEO and placeholder use
- Trigger human translation based on sales, page views, or the number of times a particular entity is requested
- Trigger higher quality human translation or QA based on an element’s visibility to users (for example, use the highest quality level for headlines and synopses, and lower cost translation for content that appears “below the fold”).
The point is that you can devise all sorts of logic to automate decisions like this, and maximize the effectiveness of the translation work you have a budget for. In an online store, for example, you might machine translate every product listing, but then trigger professional translation for the top-selling 10% of products or when a particular item starts getting a lot of page views. This allows you to build automated processes to maximize the return on investment for translation, while also preventing unnecessary translation work.
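As an illustration, rules like these can be captured in a small decision function. The Python sketch below is only one possible arrangement: the function name, the tier names, and the threshold values are all hypothetical and would need tuning to your own traffic and budget.

```python
def choose_translation_tier(page_views, sales_rank_percentile, is_headline,
                            view_threshold=1000, top_seller_cutoff=10):
    """Pick a translation tier for one piece of content.

    sales_rank_percentile is 1 for the best sellers and 100 for the worst.
    All thresholds are hypothetical defaults.
    """
    # Highly visible text gets the highest quality level.
    if is_headline:
        return "human_with_qa"
    # Top sellers and popular pages earn human translation.
    if sales_rank_percentile <= top_seller_cutoff or page_views >= view_threshold:
        return "human"
    # Everything else keeps its machine-translated placeholder.
    return "machine"


print(choose_translation_tier(50, 80, False))    # machine
print(choose_translation_tier(5000, 80, False))  # human
print(choose_translation_tier(50, 5, False))     # human
print(choose_translation_tier(0, 100, True))     # human_with_qa
```

Running a function like this periodically against your sales and traffic data lets the translation budget follow actual demand.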
Even if you don’t have short-term plans to make your application or service multilingual, it’s a good idea to build as if you do. First, you’ll make life easier for designers and non-technical project contributors, who can help you make your service easier to understand and use. Second, you may be surprised by where demand comes from. Web services are inherently accessible to a global audience, so you may discover that your application is most popular in a region you didn’t expect. Most countries also have important second and third languages, so if you are an online retailer, catering to those users can expand your market reach and sell-through. The bottom line: even if you don’t translate your app, you’ll save money and build a better product by incorporating localization and translation methodology into your projects from day one. Want to learn more? Give us a call. We’re software developers with over ten years of experience in this space and can show you how to get the best results for the environment you are working in.