How To Translate Python Applications With The GNU Gettext Module

GNU Gettext is an old but mature solution for i18n development. It can be used to localize any kind of application and it is quite flexible in terms of supporting different locale settings and rules. In this article, we will see how to translate our Python programs using the gettext module that is bundled with the official Python standard library.

Internationalization (i18n) refers to the operation by which a program is made aware of multiple languages. Localization (l10n) refers to the adaptation of your program, once internationalized, to the local language and cultural habits. In theory it looks simple to implement. In practice, though, it takes time and effort to provide the best internationalization and localization experience for your global audience. Python bundles a module for exactly this purpose, called gettext, which consists of a public API and a set of tools that help extract and generate message catalogs from the source code.

Gettext is a mature and battle-tested solution initially released by Sun Microsystems more than 25 years ago. Gettext provides a set of utilities that allow localizing various programs and even operating systems. In this article, we are going to use this module and walk through the process of localizing a small Python app while learning the different rules and options that it provides.

For the purposes of this demo, I will be using Python 3.6, but the gettext module is bundled with Python 2.7 as well. The code is hosted on GitHub.

Introduction To The GNU gettext Module

GNU Gettext is the de facto universal solution for localization, offering a set of tools that provide a framework to help other packages produce multilingual messages. It gives an opinionated way of how programs should be written to support translated message strings, along with a directory and file naming organization for the messages that need to be translated.

With regard to directory conventions, we need a place to put our localized translations for each supported locale. For example, let’s say we need to support two languages, English and Greek. Their language codes are en and el respectively.

We can create a folder named locales, and inside it a folder for each language code. Each of those folders contains another folder named LC_MESSAGES, which holds one or more .po files.

So, the file structure should consist of two files: locales/en/LC_MESSAGES/base.po and locales/el/LC_MESSAGES/base.po.
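As a rough sketch, the same layout can be created from Python with pathlib. The helper name create_locale_tree is made up for this illustration, and the example writes into a temporary directory so that nothing in your project is touched.

```python
import tempfile
from pathlib import Path


def create_locale_tree(root, languages, domain="base"):
    """Create <root>/locales/<lang>/LC_MESSAGES/<domain>.po for each language."""
    created = []
    for lang in languages:
        folder = Path(root) / "locales" / lang / "LC_MESSAGES"
        folder.mkdir(parents=True, exist_ok=True)
        po_file = folder / (domain + ".po")
        po_file.touch()  # empty placeholder; translations are added later
        created.append(po_file)
    return created


# Build the tree in a throwaway directory and list what was created.
project_root = tempfile.mkdtemp()
for path in create_locale_tree(project_root, ["en", "el"]):
    print(path.relative_to(project_root))
```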

Here we can see that the files have a .po extension. The PO format is a plain text format, written in files with the .po extension. A PO file contains a number of messages, partly independent text segments to be translated, which have been grouped into one file according to some logical division of what is being translated. Those groups are called domains. In the example above, we have only one domain, named base. The PO files themselves are also called message catalogs.

Apart from PO files, you might sometimes encounter .mo files. MO, or Machine Object, is a binary file format compiled from a PO file. It is what a program actually loads at runtime to look up translations, and it is the format the Python gettext module reads.

In addition, there are also .pot files. These are the template files for PO files: a POT file is essentially a PO file with just the original strings and all the translations left empty. In practice, .pot files are generated by tools and we should not modify them directly.

Using the Python gettext module

The gettext module ships with Python. It provides internationalization (I18N) and localization (L10N) services for your Python modules and applications. This module exposes two APIs. The first one is the basic API that supports the GNU gettext catalog API. The second one is a higher-level, class-based API that may be more appropriate for Python files. The class-based API offers more flexibility and greater convenience than the GNU gettext API, and it is the recommended way of localizing your Python applications and modules. This is also the API that we are going to use in this tutorial.

In order to provide multilingual messages for your Python programs, you need to take the following steps:

  1. Mark all translatable strings in your program with a wrapper function.
  2. Run a suite of tools over your marked files to generate raw messages catalogs or POT files.
  3. Duplicate the POT files into specific locale folders and write the translations.
  4. Import and use the gettext module so that message strings are properly translated.

Let’s create a sample application to see how we are going to do that in practice.

Example Application

In order to understand the whole process better, it’s important to have an example program that we want to localize. Let’s start with a function that prints some strings.
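The original listing is not reproduced here, so the following is only a minimal stand-in. The function name print_some_strings and the two strings are placeholders chosen for illustration.

```python
def print_some_strings():
    # Plain string literals: nothing here is marked for translation yet.
    print("Hello world")
    print("This is a translatable string")


if __name__ == "__main__":
    print_some_strings()
```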

As it stands, this program cannot be localized with gettext.

As we said earlier, the first step is to specially mark all translatable strings in the program. To do that we need to wrap all the translatable strings inside _().

Notice that we imported gettext and assigned _ to gettext.gettext. This ensures that the name _ is defined, so the program still runs.

If you run the program, you will see that nothing has changed:

However, we are now able to proceed to the next step, which is extracting the translatable messages into a POT file.

Generate raw translatable messages

To automate the process of generating raw translatable messages from wrapped strings throughout an application, the gettext library authors have provided a set of tools that parse the source files and extract the messages into a general message catalog.

Originally the GNU gettext only supported C or C++ source code but its extended version xgettext scans code written in a number of languages, including Python, to find strings marked as translatable.

The Python distribution includes its own scripts, pygettext.py and msgfmt.py. pygettext.py works like xgettext but only understands Python source code, while msgfmt.py compiles .po files into .mo files.

The location of those files depends mainly on the OS default installation of the Python library. In order to find it you can issue the following command:

This was on macOS. Generally, the scripts live in the Tools/i18n directory. You may need to run updatedb or /usr/libexec/locate.updatedb beforehand to update the search indexes.

Once you have found the tool, just call it, specifying the file you want to extract the strings from:

That will generate a base.pot file in the locales folder, taken from our main.py program. Remember that POT files are just templates and we should not touch them. Let us inspect the contents of the base.pot file:

In a bigger program, we would have many more translatable strings. Here we specified a single domain called base because the application is only one file. In bigger ones, I would use multiple domains in order to logically separate the different messages based on the application scope.

Notice that we have a simple convention for our translatable strings: msgid is the original string wrapped in _(), and msgstr is the translation we need to provide.

Now we are ready to create our translations. Because we have the template generated for us, the next step is to create the required directory structure and copy the template into the right spot. We’ve seen the recommended file structure before. We are going to create two additional folders inside the locales directory, following the pattern $localedir/$language/LC_MESSAGES/$domain.po.

Where:

  • $localedir is locales
  • $language is en or el
  • $domain is base

The .po files will contain the translations we need to provide.

Copy base.pot to locales/en/LC_MESSAGES/base.po and locales/el/LC_MESSAGES/base.po. Then modify their headers to include more information about the locale. For example, this is the Greek translation.

You can find the specification for these files on the gnu.org website. Every PO file starts with a header entry that contains information about the file, the author, the last revision date and the pluralization rules.

Although there is a lot of metadata in the header, it’s not mandatory to include all of it. Also note that everything in the header is supposed to be in English, so that it is understandable to people who do not speak the target language.

The catalog is built from the .po file using a tool called msgfmt.py. This tool parses the .po file and generates an equivalent .mo file. We mentioned before that MO files are binary data files that are parsed by the Python gettext module in order to be used in our program. This tool is usually located in the same folder as pygettext.py.

This command will generate a base.mo file in the same folder as the base.po file.

So, in the final file structure, each LC_MESSAGES folder contains both base.po and base.mo, next to the base.pot template in the locales folder.

Now that we have reached this step and translated our application, let’s glue everything together by adding the ability to install and switch locales.

Switching Locale

To be able to switch locales in our program we need to use the class-based gettext API. In this tutorial, I will explain only one function, gettext.translation. It accepts parameters that are used to load the .mo files of a particular language. If no .mo file is found, it raises an OSError (unless fallback=True is passed), so we need to be extra careful to provide the right path.

Add the following code to the program:
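The original listing is missing here, so this is a minimal sketch consistent with the description that follows. The try/except fallback is an addition for illustration, so that the snippet still runs when no compiled catalog exists yet.

```python
import gettext

try:
    # 'base' is the domain; the catalog is expected at
    # locales/el/LC_MESSAGES/base.mo relative to the working directory.
    el = gettext.translation("base", localedir="locales", languages=["el"])
except OSError:
    # No compiled catalog was found: fall back to the untranslated strings.
    el = gettext.NullTranslations()

el.install()    # binds _() in the builtins namespace
_ = el.gettext  # explicit module-level alias pointing at the same catalog

print(_("Hello world"))
```

If the Greek base.mo file is in place, the last line prints the translated string; otherwise it prints the original msgid.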

The first argument, base, is the domain, and the function will look for a .mo file with the same name in our locales folder. The domain is required here; the messages default applies only to the basic GNU-style API. The localedir parameter is the location of the locales folder you created. This can be either a relative or an absolute path. The languages parameter is the list of language codes to search for, in order of preference. For example, because we specified el, it will look for locales/el/LC_MESSAGES/base.mo, along with more specific variants of the language code that include a territory or codeset.

If you run the program again you will see the translations happening:

The install method binds _() in the built-in namespace, so every module in the application can call it without importing anything. Because we also assigned _ to the gettext method of the Greek translation object, all _() calls now return the Greek strings. To go back to English, just assign _ to the original gettext.gettext function, or use a lambda that returns the wrapped string unchanged.

Thus either of those commands will work:

Now that we know how to set up basic i18n functionality for our program, let’s explore some additional cases that we will encounter while translating our applications.

Finding Message Catalogs

When you need to locate translation files at runtime, you can use the gettext.find function. It takes a few parameters and returns the path, or a list of paths, of the .mo files available on disk.

You can pass a localedir, a domain and a list of languages. If you omit the optional ones, the module uses its defaults, which in most cases is not what you want. For example, if you don’t specify a localedir parameter, it will fall back to sys.prefix + '/share/locale', which is a global locale directory that can contain a lot of unrelated files.

The language portion of the path is taken from one of several environment variables that can be used to configure localization features (LANGUAGE, LC_ALL, LC_MESSAGES, and LANG). The first variable found to be set is used. Multiple languages can be selected by separating the values with a colon (:).

We can see an example of how this works in the program below. Start an interactive Python session inside the project base folder:

Now let’s see what happens when we set the LANGUAGE environment variable to el:

As you can see, it picks up the language from the environment and uses it as the languages parameter.

Let’s test passing the multiple languages in the environment:

To get all translations we need to pass all=True; otherwise the call returns only the first one found.
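The interactive session is not reproduced here, so the following sketch recreates the idea in a self-contained way. It assumes nothing about your project: it builds a temporary locales folder with empty placeholder base.mo files (find only checks that the files exist) and sets LANGUAGE from within Python instead of the shell.

```python
import gettext
import os
import tempfile


def make_empty_catalogs(localedir, languages, domain="base"):
    """Create empty <lang>/LC_MESSAGES/<domain>.mo placeholders."""
    for lang in languages:
        folder = os.path.join(localedir, lang, "LC_MESSAGES")
        os.makedirs(folder, exist_ok=True)
        open(os.path.join(folder, domain + ".mo"), "wb").close()


localedir = os.path.join(tempfile.mkdtemp(), "locales")
make_empty_catalogs(localedir, ["en", "el"])

# Language taken from the environment: only the first match is returned.
os.environ["LANGUAGE"] = "el"
first = gettext.find("base", localedir)

# Several languages separated by a colon, with all=True to collect every match.
os.environ["LANGUAGE"] = "el:en"
every = gettext.find("base", localedir, all=True)

# An explicit languages list takes precedence over the environment.
explicit = gettext.find("base", localedir, languages=["en"])

print(first)
print(every)
print(explicit)
```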

Plural Rules

So far we have handled simple cases of translatable strings. There are some other cases we need to be aware of, as gettext treats them specially. Pluralization, for example, is dependent on the language. Some languages have different rules for messages referring to one item or many items.

To make managing plurals easier (and possible), there is a separate set of functions for asking for the plural form of a message. One of them is the ngettext function. To understand how it works let’s add another function with a few messages containing plurals:

We used the ngettext function, which requires three parameters. The first is the singular message, the second is the plural message, and the third is the quantity that decides which form is used. The returned string is still unformatted, so printing it directly shows the raw placeholder:

When we run the program it will format the messages according to the number passed:
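A minimal sketch of the pattern described above. The function name notification_message and the message text are placeholders, not the article's original listing.

```python
import gettext

ngettext = gettext.ngettext


def notification_message(count):
    # ngettext picks the singular or plural msgid based on count;
    # filling in the {num} placeholder is a separate step.
    template = ngettext("{num} new notification", "{num} new notifications", count)
    return template.format(num=count)


for count in (1, 3):
    print(notification_message(count))
```

Without a loaded catalog, ngettext simply returns the first string when the count is 1 and the second string otherwise.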

Run the pygettext.py tool again to generate the new translatable strings. That will produce the following .pot file:

Note: If you don’t get the plural entries from the pygettext.py command, it’s most likely because you don’t have the latest version of it. To overcome this you can use the xgettext tool, which is bundled with the original gettext library, like this:

Now, in addition to filling in the translation strings, we will also need to describe the way plurals are formed so the library knows how to index into the array for any given count value.

We need to add a Plural-Forms line in the header section, which defines two values:

  • nplurals is an integer indicating the size of the array (the number of translations used)
  • plural is an expression for converting the incoming quantity to an index in the array when looking up the translation.

For our example, English and Greek both have two plural forms, so the header value is nplurals=2; plural=(n != 1).

The singular translation would then go in position 0, and the plural translation in position 1.
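To make the indexing concrete, here is a small sketch that mimics the lookup in plain Python. It assumes the common two-form rule nplurals=2; plural=(n != 1), and the Greek strings are illustrative translations rather than the article's originals.

```python
def plural_index(n):
    """Python equivalent of the header expression plural=(n != 1)."""
    return int(n != 1)


# Illustrative Greek translations, ordered like msgstr[0] and msgstr[1].
TRANSLATIONS = ["{num} νέα ειδοποίηση", "{num} νέες ειδοποιήσεις"]


def translate_plural(n):
    # The computed index selects which msgstr[] slot is used for this count.
    return TRANSLATIONS[plural_index(n)].format(num=n)


for n in (0, 1, 5):
    print(n, "uses msgstr[%d]" % plural_index(n))
```

Note that a count of 0 maps to index 1, so zero items use the plural form, just as in English.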

Modify the Greek translation to include the plural rules:

Notice the comment starting with #, python-brace-format. This flag records how the string is interpolated: because we used Python’s brace-style placeholders, the tool annotated the entry accordingly. Another way of interpolating the messages is by using the following format:

Then we would have to format the messages in our program like this:
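The format used in the original listing is not available here. As an assumption, the sketch below uses printf-style named placeholders, a common alternative that xgettext flags as python-format instead of python-brace-format. The function name and message text are placeholders.

```python
import gettext

ngettext = gettext.ngettext


def unread_message(count):
    # %(num)d is filled in with the % operator and a mapping,
    # instead of str.format as in the brace-style version.
    template = ngettext("%(num)d unread message", "%(num)d unread messages", count)
    return template % {"num": count}


for count in (1, 4):
    print(unread_message(count))
```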

Now generate the .mo file as before. If you did all the steps correctly and run the program again, you will see the translations happening:

There are a lot of caveats regarding plural rules. For more information, I suggest you head over to the official docs.

Manipulating PO files

Unfortunately, there is no built-in solution for loading PO files in your application and manipulating them. There is, however, a third-party library called polib.

Install it first using pip3 (pip3 install polib).

Let’s import it into our app and load the .po file.

Once loaded, you can inspect the .po entries.

To see the percentage of translated entries just call the percent_translated method:

Of course, if you were missing some translations you would have a lower percentage.

You can also create a new .po catalog and add entries to it. First, create a new .po file object and add the metadata header for it:

With the file created in memory, let’s add some entries:

Inspect the percentage again:

And save it to a specified path:

You can also save it as a .mo file:

If you look at the contents of the files written to disk, you will see that they follow the correct file formats. polib also supports iterating over all the entries. Check out its API documentation for more information.

PhraseApp

PhraseApp supports many different languages and frameworks, including Python. It lets you easily import and export translation data and search for any missing translations, which is really convenient. On top of that, you can collaborate with translators, as it is much better to have professionally done localization for your website. If you’d like to learn more about PhraseApp, refer to the Getting Started guide. You can also get a 14-day trial. So what are you waiting for?

Conclusion

In this article, we’ve seen how to translate Python applications with the GNU gettext module. We learned what gettext is and what the PO file format looks like. We saw how to add pluralization and interpolation rules. We also learned how to parse PO files with the third-party polib library.

I hope you enjoyed the article and that it helped you understand how to integrate i18n capabilities into your next Python app. Please stay tuned for more detailed articles on this subject.
