Chapter 8 Message Translations
This chapter covers internationalization in R, i.e., the display of messages in languages other than English. All output in R (such as messages emitted by message()) is eligible for translation, as are menu labels in the GUI. Depending on the version of R that you are using, some languages might already be available while others may need work. R leverages gettext to handle the conversion from English to arbitrary target languages.
Having messages available in other languages can be an important bridge for R learners not confident in English – rather than learning two things at once (coding in R and processing diagnostic information in English), they can focus on coding while getting more natural errors/warnings in their native tongue.
The gettext manual is the more canonical reference for a deep understanding of how gettext works. This chapter gives only a broad overview, with particular focus on how things work for R, with the goal of making it as low-friction as possible for developers and users to contribute translations.
8.1 How translations work
Each of the default packages distributed with R (i.e., those found in ./src/library, such as stats, and which have priority base) contains a po directory that is the central location for cataloguing/translating each package's messages.

A .pot file is a snapshot of the messages available in a given domain. A domain in R typically identifies a source package and a source language (either R or C/C++). For example, the stats po directory (found in the R sources in ./src/library/stats/po) holds one .pot file that is a catalogue of all messages produced by R code in the stats package, while stats.pot is a catalogue of all messages produced by C code in the stats package.
There are two exceptions to the basic pattern described above. The first is the domain for messages produced by the C code which is the fundamental backing of R itself (especially, but not exclusively, the C code under ./src/main). The associated .pot file is R.pot, and it is found in the po directory for base. R-base.pot, by contrast, follows the basic pattern, because base is otherwise laid out like a normal package.
The second is the domain for the Windows R GUI, i.e., the text in the menus and elsewhere in the R GUI program available for running R on Windows. These messages are stored in the RGui.pot domain, also in the po directory for base, and are most commonly derived from C code found in ./src/gnuwin32. One reason to keep this domain separate is that it is only relevant to one platform (Windows). In particular, Windows has historically used different character encodings, so it made more sense for Windows developers to produce translations specifically for Windows; it is non-trivial for non-Windows users to test their translations for the Windows GUI.
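The naming pattern and its first exception can be summarised in a few lines of R. This is only a sketch: pot_file() is a hypothetical helper, not part of R, and the "R-" prefix for R-level catalogues is an assumption inferred from the R-base.pot name mentioned above. The RGui.pot domain is deliberately left out because it does not correspond to a package and source-language pair.

```r
# Hypothetical helper (not part of R): which .pot file catalogues the messages
# of a given package and source language, under the pattern described above.
pot_file <- function(pkg, source = c("R", "C")) {
  source <- match.arg(source)
  if (source == "R") {
    # Assumed pattern for R-level messages, inferred from "R-base.pot"
    paste0("R-", pkg, ".pot")
  } else if (pkg == "base") {
    # Exception: the C code backing R itself uses the R.pot domain
    "R.pot"
  } else {
    paste0(pkg, ".pot")
  }
}

pot_file("stats", "C")
pot_file("base", "R")
pot_file("base", "C")
```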
For outside contributors, there's no need to update .pot files – translators will typically take the .pot files as given and generate .po files. These will be sent along to a language-specific translation maintainer, who then compiles them to send to the R Core developer responsible for translations, who finally applies them as a patch.
To emphasize, this section is almost always not needed for contributing translations – it is here for completeness and edification.
.po files are the most important artifacts for translators. They provide the (human-readable!) mapping between the messages as they appear in the source code and how the messages will appear to users in translated locales.
Singular messages
Most messages appear as msgid/msgstr pairs. The former gives the message as it appears in the code, while the latter shows how it should appear in translation. For example, here is an error in German (locale: de) informing the user that their input must be of class POSIXt:
msgid "'to' must be a \"POSIXt\" object"
msgstr "'to' muss ein \"POSIXt\" Objekt sein"
See this in context in the
R-de.po source file.
The same message can also be found in the corresponding Italian .po file, giving the translation to Italian:
msgid "'to' must be a \"POSIXt\" object"
msgstr "'to' dev'essere un oggetto \"POSIXt\""
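To make the msgid/msgstr mapping concrete, here is a minimal R sketch that reads the German pair shown earlier into a lookup vector. It is a toy illustration only: po_field() is a hypothetical helper, the parsing handles just single-line entries, and R itself does not look messages up this way at run time. The raw-string syntax r"(...)" assumes R 4.0.0 or later.

```r
# Two lines as they appear in the .po file (raw strings keep the backslashes)
po_lines <- c(
  r"(msgid "'to' must be a \"POSIXt\" object")",
  r"(msgstr "'to' muss ein \"POSIXt\" Objekt sein")"
)

# Hypothetical helper: strip the keyword and outer quotes, then unescape \"
po_field <- function(line, key) {
  quoted <- sub(paste0("^", key, " \"(.*)\"$"), "\\1", line)
  gsub("\\\"", "\"", quoted, fixed = TRUE)
}

msgid  <- po_field(po_lines[1], "msgid")
msgstr <- po_field(po_lines[2], "msgstr")

# Named vector: source-code message -> translated message
catalogue <- setNames(msgstr, msgid)
catalogue[["'to' must be a \"POSIXt\" object"]]
```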
Plural messages
Some messages will have different translations depending on some input determined at run time (e.g., the length of an input object or the nrow() of a data.frame). This presents a challenge for translation, because different languages have different rules for how to pluralize different quantities (see the relevant section of the gettext manual). For example, English typically adds s to any quantity of items besides 1 (1 dog, 2 dogs, 100 dogs, even 0 dogs); Chinese typically does not alter the word itself in similar situations (一只狗, 两只狗, 一百只狗, 零只狗); Arabic has six different ways to pluralize a quantity.
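In R code, quantity-dependent messages are usually requested with base R's ngettext(), which takes the count plus the English singular and plural forms. The sketch below uses the dog example from the text; the strings are illustrative and are assumed not to appear in any translation catalogue, so R falls back to the English rule (singular only when n is 1).

```r
counts <- c(0, 1, 2, 100)

# ngettext() picks a form based on n; with no matching catalogue entry it
# returns the singular for n == 1 and the plural otherwise.
forms <- vapply(counts, function(n) ngettext(n, "dog", "dogs"), character(1))

paste(counts, forms)
```

When a translation is available for the message, the same call would instead return whichever of the translated forms the target language's plural rule selects.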
In .po files, this shows up in the form of msgid_plural entries, followed by several ordered msgstr entries. Here's an example from the German translations:
msgid "Warning message:\n"
msgid_plural "Warning messages:\n"
msgstr[0] "Warnmeldung:\n"
msgstr[1] "Warnmeldungen:\n"
The two entries in English correspond to the singular and plural messages; the two entries in German correspond similarly, because
pluralization rules in German are similar to those in English. The situation in Lithuanian
is more divergent:
msgid "Warning message:\n"
msgid_plural "Warning messages:\n"
msgstr[0] "Įspėjantis pranešimas:\n"
msgstr[1] "Įspėjantys pranešimai:\n"
msgstr[2] "Įspėjančių pranešimų:\n"
This corresponds to the 3 different ways to pluralize words in Lithuanian.
What do 0, 1, and 2 correspond to, exactly? Ideally, this will be clear to native speakers of the language, but for clarity, it is the solution to a small arithmetic problem that can be found in the language's metadata entry. Look for the Plural-Forms entry in the metadata at the top of the .po file; here it is for Lithuanian:
"Plural-Forms: nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n"
"%100<10 || n%100>=20) ? 1 : 2);\n"
nplurals tells us how many msgstr entries correspond to each msgid_plural for this language. plural tells us, for a given n, which entry to use. The arithmetic is C code; if you want to parse it and are only familiar with R, the most important piece is C's ternary operator: test ? valueIfTrue : valueIfFalse is a compact way to write R's if (test) valueIfTrue else valueIfFalse. C's % is the modulo operator, written %% in R.
Parsing, we get the following associations:
The 0 entry corresponds to when a number equals 1 modulo 10 (i.e., 1, 11, 21, 31, …) except numbers equaling 11 modulo 100 (i.e., 11, 111, 211, 311, …). Combining, that's 1, 21, 31, …, 91, 101, 121, 131, …, 191, …
The 1 entry corresponds to numbers at least 2 modulo 10 (2, 3, …, 8, 9, 12, 13, 14, …) and either below 10 modulo 100 (0, 1, …, 9, 100, 101, …, 109, …) or at least 20 modulo 100 (20, 21, …, 99, 120, …). Combining, that's 2, 3, …, 9, 22, 23, …, 29, 32, 33, …, 39, …, 102, 103, …, 109, 122, 123, …
The 2 entry corresponds to all other numbers, i.e. 0, 10, 11, 12, …, 19, 20, 30, …, 90, 100, 110, 111, 112, …
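The associations above can be checked by transcribing the Lithuanian plural expression into R. This is a sketch for verification only: lt_plural_index() is a hypothetical name, and at run time gettext evaluates the C expression itself rather than anything written in R.

```r
# R transcription of the C expression
#   n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2
# (C's % becomes %% in R; each ternary becomes if/else)
lt_plural_index <- function(n) {
  if (n %% 10 == 1 && n %% 100 != 11) {
    0L
  } else if (n %% 10 >= 2 && (n %% 100 < 10 || n %% 100 >= 20)) {
    1L
  } else {
    2L
  }
}

n <- c(0, 1, 2, 9, 10, 11, 12, 20, 21, 22, 101, 111, 122)
idx <- vapply(n, lt_plural_index, integer(1))

# Show which entry each quantity selects
setNames(idx, n)
```

Because the entries are numbered from 0 while R vectors are indexed from 1, looking up the matching translation in an R vector of forms would use lt_plural_index(n) + 1.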
.po files are plain text, but while helpful for human readers, this is inefficient for consumption by computers.
The .mo format is a “compiled” version of the .po file optimized for retrieving messages when R is running.
In R-devel, the conversion from .po to .mo is done by R Core – you don’t need to compile these files yourself.
They are stored in the R sources at
./src/library/translations/inst in various language-specific subdirectories.