Customized Suzuki Intruder in French village
Photo: Copyright © 2014 Eelke Blok

The tale of the mistranslated plurals

Recently, I ran into a strange issue with a Drupal site. Phrases that depend on a count, and that use the formatPlural() method to pick the correct wording for that count, were all translated with the phrase meant for a count of 1. All of them said e.g. 1 vraag or 1 reactie (Dutch for 1 question and 1 comment), even when there was more than one, or in some cases none at all. Another symptom was that the translation interface no longer showed the translations for plural strings. The problem seemed to have been caused by a release we had done minutes before. It also existed on the acceptance environment, where it had slipped through acceptance testing. With this site being a large community, it was definitely not something that would have gone unnoticed for very long, so it was unlikely to have been there since the previous release.

Luckily, I had an inkling of where to look, but the issue meant there was suddenly an urgent need to get to the bottom of it.

Some background

Drupal has a nifty mechanism to translate - or just display, when your site uses the default English language - a phrase based on a certain count. For example, imagine a site that allows users to ask questions in certain categories. Each category contains a certain number of questions, ranging between 0 and any arbitrary number. You wish to show the number of questions in each category in an overview. Up front, you do not know how many questions will be in each category. This is where formatPlural() comes in. Assuming you added the StringTranslationTrait to your class, you can do the following:

// ...
$this->formatPlural($count, '1 question', '@count questions');
// ... 

Like with the t() function, it is also possible to pass further placeholders, but that is not relevant to this story.

The signature of this function is based on how the English language treats this kind of pattern; using these two phrases, it is possible to format a correct sentence based on the count alone. The phrasing only really differs for a count of 1. For 0, the plural phrasing can be used (0 questions works fine).

This isn't true for all languages, though. This is why PO files (the files Drupal uses to define translations for strings contained in code) contain a plural formula. It determines, based on the count, which of an arbitrary number of translation strings should be used. For example, for Dutch (which basically works the same as English), the correct entry in the PO file is "Plural-Forms: nplurals=2; plural=(n!=1);\n".

This tells Drupal that there are two plural strings (nplurals = 2) and that the index to the string to use is 0 when n (count) is 1, and index 1 otherwise (the formula evaluates to boolean true or false, which in turn gets converted to 1 or 0). Translations look like this (this example is from the Pathauto translation file):

msgid "Generated 1 URL alias."
msgid_plural "Generated @count URL aliases."
msgstr[0] "1 URL-alias aangemaakt."
msgstr[1] "@count URL-aliassen aangemaakt." 
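
To make the index lookup concrete, here is a minimal sketch in plain PHP. It is not how Drupal implements this internally; the function names dutch_plural_index() and translate_plural_sketch() are made up for illustration, and the strings are the Pathauto ones from above.

```php
<?php
// Dutch formula from the PO header: plural=(n!=1).
// The comparison yields a boolean, which is cast to index 0 or 1.
function dutch_plural_index(int $n): int {
  return (int) ($n != 1);
}

// The msgstr[] entries from the PO file, keyed by plural index.
$msgstr = [
  0 => '1 URL-alias aangemaakt.',
  1 => '@count URL-aliassen aangemaakt.',
];

// Hypothetical helper: pick the variant for $n and fill in @count.
function translate_plural_sketch(int $n, array $msgstr): string {
  $variant = $msgstr[dutch_plural_index($n)];
  return str_replace('@count', (string) $n, $variant);
}

echo translate_plural_sketch(1, $msgstr), "\n";
echo translate_plural_sketch(3, $msgstr), "\n";
```

If the formula somehow always produced index 0, every count would get the first string, which is exactly the symptom described at the start of this post.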

However, taking another example, Polish has three variants, and its plural formula looks like this: Plural-Forms: nplurals=3; plural=((n==1)?(0):(((((n%10)>=2)&&((n%10)<=4))&&(((n%100)<10)||((n%100)>=20)))?(1):2));. When n equals 1, translation string 0 is used. Otherwise, if the remainder of dividing n by 10 is 2, 3 or 4 (((n%10)>=2)&&((n%10)<=4)) and the remainder of dividing n by 100 is either smaller than 10 or at least 20 (i.e. it does not apply to 12, 13 and 14, nor to those same numbers following a multiple of 100, such as 112), variant 1 is used. In all other cases, variant 2 is used. Phew, I hope I interpreted that correctly.
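
To double-check that reading, the Polish formula can be written out as a small plain PHP function. This is only a sketch for trying out a few counts; the name polish_plural_index() is my own and is not part of Drupal or gettext.

```php
<?php
// Plain PHP transcription of the Polish Plural-Forms expression:
// plural=((n==1)?(0):((((n%10)>=2)&&((n%10)<=4))&&(((n%100)<10)||((n%100)>=20)))?(1):2)
function polish_plural_index(int $n): int {
  if ($n == 1) {
    return 0;
  }
  $mod10 = $n % 10;
  $mod100 = $n % 100;
  // Ends in 2, 3 or 4, but not in 12, 13 or 14.
  if ($mod10 >= 2 && $mod10 <= 4 && ($mod100 < 10 || $mod100 >= 20)) {
    return 1;
  }
  return 2;
}

// Print the variant index for a handful of counts.
foreach ([0, 1, 2, 5, 12, 22, 112, 122] as $n) {
  printf("%d => variant %d\n", $n, polish_plural_index($n));
}
```

Counts such as 2, 22 and 122 end up with variant 1, while 12 and 112 fall through to variant 2, just like 0 and 5.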

Each translation file contains the plural formula for its language. So, which formula applies to the strings in any given translation file? You might guess that the formula in a file applies to the strings contained in that same file. Not unreasonable. But the problem we encountered suggests this is not the case. The affected strings did not all come from the same translation file; as far as we could tell, all translations were affected, including translations the customer had entered through the translation interface (i.e. strings that were never imported from a translation file).

The problem

To determine the plural formula to use for any given language, Drupal actually uses a state variable (meaning it can be found in the key_value table, in the state collection) called locale.translation.formulae. It is managed by a service in the Locale module named locale.plural.formula, implemented by the class \Drupal\locale\PluralFormula. The variable contains an array with the following structure:

[
  'langcode1' => [
    'plurals' => $plural_count,
    'formula' => $formula,
  ],
  'langcode2' => [
    'plurals' => $plural_count,
    'formula' => $formula,
  ],
  // ...
];

This is only relevant if you were to attempt to write this variable directly, though, because the formulae service has convenient setter and getter functions to access the individual values, as you would expect. 

If you look for the places where the setter is actually called, you'd find that at least one of them is the import of a PO file. So, basically, the formula stored for any given language is the one that was in the last PO file imported for that language.
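
The "last import wins" behaviour can be mimicked in a few lines of plain PHP. This is a simplified sketch and not the Locale module's code: the $formulae array stands in for the state variable, the helper names are hypothetical, the header parser is deliberately naive, and the second header is a made-up value that only serves to show the overwrite.

```php
<?php
// In-memory stand-in for the locale.translation.formulae state value.
$formulae = [];

// Naive, illustrative parser: extracts the raw nplurals and plural values.
function parse_plural_forms_sketch(string $header): ?array {
  $pattern = '/nplurals\s*=\s*([^;]+);\s*plural\s*=\s*([^;]+);/';
  if (preg_match($pattern, $header, $matches)) {
    return ['plurals' => trim($matches[1]), 'formula' => trim($matches[2])];
  }
  return NULL;
}

// Hypothetical import step: whatever the header says replaces the stored entry.
function import_po_sketch(array &$formulae, string $langcode, string $header): void {
  $parsed = parse_plural_forms_sketch($header);
  if ($parsed !== NULL) {
    $formulae[$langcode] = $parsed;
  }
}

// First file: the regular Dutch header.
import_po_sketch($formulae, 'nl', 'Plural-Forms: nplurals=2; plural=(n!=1);');
// Second file for the same language, with a different (made-up) header.
import_po_sketch($formulae, 'nl', 'Plural-Forms: nplurals=1; plural=0;');

// Only the values from the second file are left.
var_export($formulae['nl']);
```

The point is that there is one entry per language, not one per file, so a single file with a bad header affects every plural translation for that language.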

The cause

Now we're getting somewhere. When investigating the actual value of the state variable, we found it had some puzzling values. The plural count for the Dutch language was INTEGER (so not an integer, but that actual character string). The formula was EXPRESSION. Oh dear...

By now, you have probably guessed what was going on. We had added a new translation file for a new custom module. The PO file was based on a template, and the template did not contain a proper plural count or a proper formula (actually, you can probably guess what the placeholders were). Whoever creates a file from the template is supposed to fill in proper values. We didn't, and we didn't catch it in review either.
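
A check like the following could catch such a header before it reaches a release, for instance in a test or a pre-commit script. It is a rough sketch under my own assumptions: the function name is hypothetical, and the rules (nplurals must be a positive integer, the formula may only contain n, digits, whitespace, parentheses and C-style operators) are a heuristic of mine, not an official validation routine.

```php
<?php
// Returns a list of problems found in a Plural-Forms header (empty if none).
function plural_forms_problems_sketch(string $header): array {
  $pattern = '/nplurals\s*=\s*([^;]*);\s*plural\s*=\s*([^;]*);/';
  if (!preg_match($pattern, $header, $matches)) {
    return ['Plural-Forms header not found or malformed'];
  }
  $nplurals = trim($matches[1]);
  $formula = trim($matches[2]);
  $problems = [];
  // nplurals must be a positive integer such as 2 or 3.
  if (!preg_match('/^[1-9][0-9]*$/', $nplurals)) {
    $problems[] = "nplurals is not a positive integer: $nplurals";
  }
  // Heuristic: only n, digits, whitespace, parentheses and operators are allowed.
  if (!preg_match('/^[n0-9\s()?:%!=<>&|+\-*\/]+$/', $formula)) {
    $problems[] = "plural formula contains unexpected characters: $formula";
  }
  return $problems;
}

// An unedited template header is flagged on both counts.
print_r(plural_forms_problems_sketch('Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;'));
```

The regular Dutch and Polish headers shown earlier pass this check without any reported problems.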

The solution

Of course, we had to fix the erroneous content in the PO file, so a colleague prepared a hotfix for that. I did not want to do another release, though, and I also suspected the updated file might not get re-imported, in which case the issue would not be solved. We still merged the hotfix to make sure the master branch was in a releasable state, in case another issue came along that did need an actual hotfix release.

Actually, considering that the translation import mechanism uses the file's timestamp to determine whether a file has been updated, releasing the fixed PO file has a good chance of solving the issue. In hindsight, this would be my recommendation: fix the PO file and re-release. If you do try this, leave a comment to let us know whether it worked.

In the interest of speed, I made a bold move and copied the value of the variable from another site, which also only used Dutch as a translation language, directly into the database. I definitely do not recommend this course of action. It did solve the issue, probably only after a cache rebuild (e.g. drush cr).

If re-releasing the fixed PO file does not work, you could also create a custom hook_update_N and explicitly set the correct value. For example, for Dutch:

/**
 * Restores the plural count and formula for Dutch.
 */
function my_module_update_8100() {
  $formulae = \Drupal::state()->get('locale.translation.formulae', []);
  // Only touch the entry if Dutch already has one.
  if (isset($formulae['nl'])) {
    $formulae['nl'] = [
      'plurals' => 2,
      'formula' => '(n!=1)',
    ];
    \Drupal::state()->set('locale.translation.formulae', $formulae);
  }
}

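Note that this code is untested, so YMMV. In particular, compare the result with the value stored on a site that works correctly, because I have not verified the exact format in which the formula is stored. Again, if you tried it, leave a comment about your experience.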

If you ever run into this issue, hopefully this blog post gives you enough clues to get a handle on things. Happy translating!

Comments

That sounds like a good idea. It was not immediately apparent to me what that would look like, but nplurals should be an integer, and the formula should be based on n (or possibly be 0, although that would be an edge case and I'm not sure there are any known languages that use the same form regardless of the number reported).
