Learning Gettext tools for Internationalization (i18n)

The world of Gettext is filled with useful tools and documentation. As you may already know, Gettext is an internationalization (i18n) and localization (l10n) system that was originally written by Sun Microsystems in the early 1990s. It's mainly targeted at Unix-like systems, but there are also portable binaries for Windows. In this tutorial, we are going to learn how to use the tools offered by this library to ease the process of internationalizing our applications.

When we talk about internationalization (i18n), we refer to the operation by which a program, or a set of programs turned into a package, is made aware of and able to support multiple languages. When we talk about localization (l10n) we mean the operation by which, in a program already internationalized, we make it adapt itself to handle its input and output in a fashion which is correct for some native language and cultural habits.

Internationalization is usually taken care of by programmers, and localization is usually taken care of by translators.

For the purposes of this article, we are going to use the available Gettext tools in practice with a small Python program. Along the way we will see what is needed to internationalize and localize it from the point of view of the programmer, the translator and the maintainer of the program.

All the tutorial files are also on GitHub.

Gettext Overview

The GNU Gettext package is a set of tools that handle the i18n and l10n aspects of computer programs, with the ultimate purpose of minimizing their impact and keeping them as unobtrusive as possible. With Gettext we mainly work with PO and MO files and variations of them.

PO files are meant to be read and edited by humans, while MO files are binary and meant to be read by programs. One other characteristic of MO files is that their format often differs from system to system, so they should not be assumed to be portable.

A single PO file is dedicated to a single target language. If a package supports many languages, there is one such PO file per language supported, and each package has its own set of PO files. Files ending with .pot are template files in PO format that serve as the base for new translations.

Before we start with our tutorial we need to make sure we have installed Gettext on our system:

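Fedora: sudo dnf install gettext

Ubuntu: sudo apt-get install gettext

Mac (with Homebrew): brew install gettext

Windows: precompiled binaries are available at https://mlocati.github.io/articles/gettext-iconv-windows.html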
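Once the package is installed, it can be handy to confirm that the command-line tools are actually reachable. The following is a minimal sketch that only relies on Python's standard shutil.which; the helper name find_gettext_tools and the tool list are our own choices for illustration, covering the programs mentioned in this tutorial.

```python
import shutil

# Command-line tools mentioned in this tutorial.
GETTEXT_TOOLS = [
    "xgettext", "msginit", "msgfmt", "msgunfmt",
    "msgmerge", "msguniq", "msgcomm", "msgcat",
]


def find_gettext_tools(tools=GETTEXT_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_gettext_tools().items():
        print(f"{tool:10} {path or 'NOT FOUND'}")
```

Any tool reported as NOT FOUND usually means the package is missing or its bin directory is not on your PATH.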

Example Project

Before we delve into the tooling, it’s important to start with an example project to which we would like to add i18n and l10n support using Gettext. We are using a small Python application that resembles an online bank, with operations such as creating a new bank account, printing your current statements, depositing money or transferring funds between existing accounts.

1. Create a main.py file and add the following code:

File: main.py

2. Run the program to see the result in the console:

So far so good.

Let’s say now that upper management asks for i18n capabilities so that the printed messages can support multiple locales. They have assigned this task to you, with the requirement that the result be easy to use for translators and maintainers.

The first thing you need to do is mark all strings that should be translated with Gettext and make them a bit easier to read. Python, like many other programming languages, supports Gettext. We only need to import the relevant library and wrap our strings in the message function:

3. Import the Gettext library and apply the relevant function to all translatable strings:

If you run the same program again, you will see that nothing has really changed, because no translations exist yet. You are now able to use the Gettext tools to extract and translate the marked strings.
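As an illustration of what marking a string looks like, here is a minimal sketch using Python's standard gettext module. The function account_created and its message text are hypothetical and are not taken from the tutorial's main.py; the point is the pattern of translating the template first and filling in named placeholders afterwards.

```python
import gettext

# With no catalog installed, gettext.gettext returns its argument unchanged,
# which is why the program's output does not change at this stage.
_ = gettext.gettext


def account_created(owner, balance):
    # Translate the template first, then fill in the named placeholders,
    # so translators see "{owner}" and "{balance}" and can reorder them.
    return _("Account for {owner} opened with balance {balance}").format(
        owner=owner, balance=balance
    )


print(account_created("Alice", 100))
```

Named placeholders are preferable to positional ones here because word order often differs between languages.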

Let’s start with creating the .pot file.

Extracting POT files with xgettext

The xgettext program finds and extracts all marked translatable strings and creates a PO template file out of them. By default it writes a file named messages.po, or domainname.po when a domain name is given. This file should be used as the basis for all later locale-specific translations, as it contains all the original program strings. Initially, every translation (msgstr) is empty and only the msgids, that is, the unique message keys, are filled in.

We would like to rename this file using a .pot extension (PO Template file) and place it in the locale folder:

The xgettext tool can be called with the following format:

and in that case, we use the -d flag to specify the text domain (which names the output file) and the -p flag to specify the output folder.

Let’s inspect the contents of this file:

File: locale/messages.pot

We won’t be touching this file for now because, for new translations, we need to make a copy of it and fill in the initial metadata strings. However, as you can see, the program was able to add extra information about the specific placeholder format using the #, python-brace-format comment.
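To get a feel for what the extraction step does, the sketch below approximates it in pure Python with the standard ast module. This is not how xgettext is implemented, and its output is only a rough imitation of real .pot entries (no header, simplified escaping, a crude test for the brace-format flag). The names extract_msgids, to_pot_entries and SAMPLE are invented for this illustration.

```python
import ast
import json
import re

# Hypothetical source code to scan; not the tutorial's actual main.py.
SAMPLE = '''
print(_("Welcome to the bank"))
print(_("Balance: {amount}").format(amount=10))
'''


def extract_msgids(source, filename="main.py"):
    """Return (msgid, lineno) pairs for every _("literal") call in source."""
    found = []
    for node in ast.walk(ast.parse(source, filename)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append((node.args[0].value, node.lineno))
    return sorted(found, key=lambda item: item[1])


def to_pot_entries(entries, filename="main.py"):
    """Render extracted strings as PO-like entries with empty translations."""
    out = []
    for msgid, lineno in entries:
        out.append(f"#: {filename}:{lineno}")
        if re.search(r"\{\w*\}", msgid):  # crude guess at a brace placeholder
            out.append("#, python-brace-format")
        out.append(f"msgid {json.dumps(msgid, ensure_ascii=False)}")
        out.append('msgstr ""')
        out.append("")
    return "\n".join(out)


print(to_pot_entries(extract_msgids(SAMPLE)))
```

Each entry carries a source reference, an optional format flag, the msgid and an empty msgstr, which is the shape a translator later fills in.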

Let’s see now how a translator can use that .pot file to provide translations for a new language.

Creating PO files with msginit

A new translator comes in and wants to start a new translation for a specific locale. The first thing we need to do is copy the .pot file that we created earlier and change the metadata to show the specific locale info.

The easiest way to do that is with the help of the msginit program. For example:

The first time you invoke that command you will have to specify an email address for giving feedback about the translations.

In that case, we use the -i flag to reference the .pot file that we created earlier as the base translation file, the --locale flag to specify the target locale and the -o flag to specify the output file.

Let’s inspect the contents of this file:

File: locale/el/LC_MESSAGES/messages.po

As you can see, the program added some locale-specific information about the target language and copied all the strings from the .pot input file. You could provide more info, but in most cases you are good to go. Just provide the relevant translations:

Now add the following 2 lines to main.py to activate the Greek translations:

File: main.py

Before we can actually run this program, we need to generate the .mo files from the .po files, as this is the only format in which our little Python program can read the translations.

Turning PO files into MO files with msgfmt

As we mentioned earlier, we need .mo files to use our translations, and there is a tool for that:

The msgfmt program generates a binary message catalog (.mo) from a textual message catalog (.po).

The calling format of this program is:

Let’s use it now to generate that file from the Greek translations:

In the command above we used the -o flag to specify the output file.

Note: There is also a tool that performs the inverse operation. The msgunfmt program attempts to convert a binary message catalog back to a .po file.
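To make the idea of a binary catalog more concrete, the sketch below builds a tiny GNU-style MO image by hand and loads it with Python's gettext.GNUTranslations. It is a hand-written stand-in for illustration only, not msgfmt itself: it skips the optional hash table, plural forms and message contexts. The function build_mo and the sample translation are hypothetical.

```python
import gettext
import io
import struct


def build_mo(catalog):
    """Serialize a {msgid: msgstr} dict into little-endian GNU MO bytes."""
    # The empty msgid holds the header; the charset is read from it.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **catalog}
    pairs = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in entries.items())
    count = len(pairs)

    orig_table_off = 28                       # right after the 7-word header
    trans_table_off = orig_table_off + 8 * count
    data_off = trans_table_off + 8 * count    # string data follows both tables

    blob = b""
    orig_table, trans_table = [], []
    for msgid, _msgstr in pairs:
        orig_table.append((len(msgid), data_off + len(blob)))
        blob += msgid + b"\0"
    for _msgid, msgstr in pairs:
        trans_table.append((len(msgstr), data_off + len(blob)))
        blob += msgstr + b"\0"

    # magic, version, string count, table offsets, hash table size and offset
    header = struct.pack(
        "<7I", 0x950412DE, 0, count, orig_table_off, trans_table_off, 0, 0
    )
    tables = b"".join(
        struct.pack("<2I", length, offset)
        for length, offset in orig_table + trans_table
    )
    return header + tables + blob


mo_bytes = build_mo({"Hello": "Γεια σου"})
translations = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(translations.gettext("Hello"))
```

The layout is simply a header, two tables of (length, offset) pairs, and the NUL-terminated strings they point to, which is why programs can look messages up without parsing any text.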

Now we are ready to see the correct translations. Execute the program to see the results:

As the application evolves, we need to intervene from time to time, because new untranslated entries are added, others are removed and some strings are no longer relevant.

Let’s see now how a translator can use the msgmerge tool to handle those translation needs.

Updating PO files with msgmerge

The msgmerge tool is mainly used on existing .po files, and especially on existing translations. If our application updates its base extracted messages in the .pot file, we need to be able to update the relevant entries in the .po files we have.

Let’s simulate that now to see how this tool does that in practice.

1. Modify main.py: add a new message string, remove some existing ones and change one of them:

File: main.py

We have removed the messages from the deposit and withdraw methods, added a new message to the print_bank_info method and modified the existing message in the BankAccount constructor.

2. Replace the old .pot file using the xgettext tool

3. Use the msgmerge tool to update the relevant .po files using the new .pot template

The calling format of this tool is:

In the example invocation, we used the -o flag to specify the output file.

Let’s inspect the messages.po file for the Greek translations to see what was changed:

File: locale/el/LC_MESSAGES/messages.po 

Here is the summary of the changes observed:

  • A fuzzy comment was added to the entry that was updated. Fuzzy entries are messages that need revision by the translator, because the program cannot decide whether the meaning has remained the same or has changed.
  • Deleted entries were commented out. This shows that those entries are no longer used by the program.
  • Newly created entries were placed in the right spot, based on their source line numbers.
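The three kinds of changes listed above can also be picked out programmatically. The sketch below scans PO text for fuzzy flags, obsolete entries (lines prefixed with #~) and empty translations. It is a deliberately simple reader that assumes single-line msgid and msgstr values; the function classify_entries and the sample entries are hypothetical and are not the tutorial's actual messages.po.

```python
# Hypothetical PO fragment; "..." stands in for an existing translation.
SAMPLE_PO = '''\
#: main.py:10
#, fuzzy, python-brace-format
msgid "Account {id} created"
msgstr "..."

#~ msgid "Deposit done"
#~ msgstr "..."

#: main.py:20
msgid "Bank info"
msgstr ""
'''


def classify_entries(po_text):
    """Group msgids into fuzzy, obsolete and untranslated buckets."""
    report = {"fuzzy": [], "obsolete": [], "untranslated": []}
    for block in po_text.strip().split("\n\n"):
        lines = block.splitlines()

        # Obsolete entries are commented out with the "#~" prefix.
        obsolete = [l for l in lines if l.startswith("#~ msgid ")]
        if obsolete:
            report["obsolete"].append(obsolete[0][len("#~ msgid "):][1:-1])
            continue

        msgid = next((l[len("msgid "):][1:-1] for l in lines if l.startswith("msgid ")), None)
        msgstr = next((l[len("msgstr "):][1:-1] for l in lines if l.startswith("msgstr ")), None)
        if not msgid:
            continue  # header entry or something this simple reader cannot handle

        is_fuzzy = any(l.startswith("#,") and "fuzzy" in l for l in lines)
        if is_fuzzy:
            report["fuzzy"].append(msgid)
        elif msgstr == "":
            report["untranslated"].append(msgid)
    return report


print(classify_entries(SAMPLE_PO))
```

A report like this gives the translator a quick to-do list: review the fuzzy entries, translate the empty ones and ignore the obsolete ones.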

The job of the translator is now much easier, as they have a lot of information about the status of the changes and what has to be done.

Let’s see now how we can find duplicate entries.

Finding duplicate entries with msguniq

Sometimes when merging or manipulating .po files using the above tools, you may find that some of the messages have the same id string. In order to find and highlight those keys, we can use the msguniq tool.

The calling format of this tool is:

Let’s add a duplicate key and run this tool.

1. Modify the locale/el/LC_MESSAGES/messages2.po file and add the following lines:

File: locale/el/LC_MESSAGES/messages2.po

2. Run the msguniq tool and inspect the output

We used the -d flag to indicate that we want only the duplicate keys to be printed.

Note: This will not work if you have the same message id but with different placeholder parameters. For example, if you have this msgid:

then it would be considered as having a different key and will not be picked up.

The benefit of this tool is that it gives a nice color output and it can work well with multiple domains.

There is also a tool called msgcomm that performs a similar job from a different perspective: it checks two or more .po files and finds their common messages. Either way, with these tools we should be able to identify duplicate entries.
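The idea of looking for repeated keys can be sketched in a few lines of Python. This is only an approximation of what a duplicate check does: it assumes single-line msgid values and ignores message contexts. The function duplicate_msgids and the sample messages are hypothetical. It also shows why two ids that differ only in their placeholder name are treated as different keys.

```python
from collections import Counter

# Hypothetical PO fragment with one repeated msgid.
SAMPLE_PO = '''\
msgid "Hello {name}"
msgstr ""

msgid "Goodbye"
msgstr ""

msgid "Hello {name}"
msgstr ""

msgid "Hello {user}"
msgstr ""
'''


def duplicate_msgids(po_text):
    """Return the msgids that occur more than once, ignoring the header."""
    ids = [
        line[len("msgid "):].strip()[1:-1]
        for line in po_text.splitlines()
        if line.startswith("msgid ")
    ]
    counts = Counter(ids)
    return sorted(msgid for msgid, n in counts.items() if n > 1 and msgid)


print(duplicate_msgids(SAMPLE_PO))
```

Here "Hello {name}" is reported, while "Hello {user}" is not, because the comparison is on the exact id string.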

The last tool for this article is the msgcat utility, which can help us concatenate 2 or more .po files into a single one.

Concatenating PO Files with msgcat

If we have a set of .po files in different packages or projects and we would like to combine them, we can use the msgcat tool. It first finds the messages the files have in common and removes the duplicates before creating the output file, so the resulting .po file is also valid.

Let’s use that in practice.

1. Create a new .po file in the locale/el/LC_MESSAGES folder called messages2.po and move half of the messages from the messages.po file into it. Also make sure you leave at least one common entry between those 2 files.

File: locale/el/LC_MESSAGES/messages2.po

2. Use the msgcat tool to concatenate those files together.

File: locale/el/LC_MESSAGES/messages3.po

As you can see the tool removed any duplicates and combined the messages in one file correctly.

You can also use the -u flag to specify that you want to keep only the unique entries, that is, messages that are not duplicated across the input files; in this respect it works much like the msguniq tool.
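The merging behaviour can be pictured with plain dictionaries. The sketch below is a simplification under our own assumptions, not msgcat's actual algorithm: it keeps the first translation seen for each msgid (to our understanding this resembles the tool's --use-first option, whereas by default conflicting translations are combined and flagged for review). The function names and the sample catalogs are hypothetical.

```python
from collections import Counter


def concat_catalogs(*catalogs):
    """Merge {msgid: msgstr} dicts, keeping the first translation seen."""
    merged = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog.items():
            merged.setdefault(msgid, msgstr)
    return merged


def unique_only(*catalogs):
    """Keep only msgids that appear in exactly one of the input catalogs."""
    counts = Counter(msgid for catalog in catalogs for msgid in catalog)
    return {
        msgid: msgstr
        for catalog in catalogs
        for msgid, msgstr in catalog.items()
        if counts[msgid] == 1
    }


# Hypothetical catalogs sharing one common entry ("Balance").
first = {"Welcome": "Καλώς ήρθατε", "Balance": "Υπόλοιπο"}
second = {"Balance": "Υπόλοιπο", "Transfer": "Μεταφορά"}

print(concat_catalogs(first, second))
print(unique_only(first, second))
```

The first call yields three entries with the shared one stored once; the second drops the shared entry and keeps only the two that are unique to a single file.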

Alas, most of the tools that we discussed in this article have a myriad of flags and options, so each use case is different. If you want to learn in detail about the full spectrum of this library, you can visit the official Gettext page. Hopefully, though, this article has shown the most practical applications of each tool, so you won’t have to invest much more time in that.

Use PhraseApp

PhraseApp supports many different languages and frameworks, including Gettext. It lets you easily import and export translation data and search for any missing translations, which is really convenient. On top of that, you can collaborate with translators, as it is much better to have professionally done localization for your website. If you’d like to learn more about PhraseApp, refer to the Getting Started guide. You can also get a 14-day trial. So what are you waiting for?

Conclusion

I hope this tutorial has piqued your interest and given you practical examples of how to use the Gettext collection of programs. Stay tuned for more articles related to this topic.

 
