pseudo-localization variants

Wed Dec 19 22:30:17 CET 2012

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Michael Everson

>> When Microsoft uses pseudo-loc, AFAIK it's always English.
> If it doesn't differ in any way from English, then it is English.

True. That has led me to wonder whether using "en" as the primary language subtag might make more sense than "qps". But I don't really have a problem with "qps". In fact, one benefit of "qps" is that if a product ever got deployed with pseudo strings (derived from English source), those strings would not be used as a match in a partial-loc scenario when the user's preference is for some variety of English.
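To illustrate the partial-loc point, here is a minimal sketch of RFC 4647 "Lookup" matching, simplified (no "*" wildcard handling). The resource lists and the tag "qps-ploc" are hypothetical examples. Lookup only ever truncates subtags from the end of a range, so an English preference can fall back to "en" but can never land on anything under "qps".

```python
def lookup(priority_list, available, default=None):
    """Simplified RFC 4647 Lookup: return the first available tag matched by
    progressively truncating each language range, in priority order."""
    # Tags compare case-insensitively; keep the original spelling for the result.
    avail = {tag.lower(): tag for tag in available}
    for language_range in priority_list:
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in avail:
                return avail[candidate]
            subtags.pop()
            # Never leave a single-character subtag (e.g. "x") dangling at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


# Hypothetical product resources: real English strings plus pseudo strings.
print(lookup(["en-US"], ["en", "qps-ploc"]))          # en
print(lookup(["en-GB"], ["qps-ploc"], default="und"))  # und
```

With only pseudo strings available, the English user gets the default rather than pseudo-loc text, which is the behaviour described above.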

>>> What *is* pseudo-localized text? How does it differ from Lorem ipsum?
>> 
>> I answered that in my original mail: it's text that is typically in the original development language for a product but in an orthography that exercises the localizability of that product.

> By exhibiting a range of ad-hoc misspellings?

By using a range of _different_ spellings, including:
- look-alike characters from other scripts (in Unicode parlance, confusables)
- filler characters that make the string longer (generally at the start or end of the whole string)
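The two techniques listed above can be sketched as follows. This is an illustration only: the small Latin-to-Cyrillic look-alike table, the 30% growth ratio and the "~" filler are assumptions for the example, not the rules any particular vendor's pseudo-loc tooling uses.

```python
import math

# Assumed, illustrative subset of Latin -> Cyrillic look-alikes (confusables).
CONFUSABLES = str.maketrans({
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    "A": "\u0410", "B": "\u0412", "C": "\u0421", "E": "\u0415",
    "H": "\u041d", "K": "\u041a", "M": "\u041c", "O": "\u041e",
    "P": "\u0420", "T": "\u0422", "X": "\u0425",
})


def pseudo_localize(text, ratio=0.3, filler="~"):
    """Swap Latin letters for look-alikes from another script, then pad both
    ends of the whole string with filler so it grows by about `ratio`."""
    swapped = text.translate(CONFUSABLES)
    pad = math.ceil(len(text) * ratio)
    left = pad // 2
    return filler * left + swapped + filler * (pad - left)


print(pseudo_localize("Open File"))
```

The result still reads as English, but hard-coded strings (which come through unchanged) and UI that truncates longer text both become easy to spot.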

>> I mentioned that MS uses the ISO 639 private use ("local code") 'qps'. There is nothing in BCP 47 that prevents such a private use subtag from being used in combination with a variant subtag.
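As a sketch of how such tags parse, here is a simplified BCP 47 (RFC 5646) syntax check; it omits extlang, extensions, private-use-only and grandfathered tags, and it checks well-formedness only, not registry validity. "qps" falls in the ISO 639 local-use range qaa..qtz. The example tags are hypothetical. Note the subtag-length rules: a 4-letter subtag right after the language is read as a script, so a variant must be 5-8 characters (or a digit plus 3 characters).

```python
import re

# Simplified BCP 47 langtag grammar; syntax only, no registry lookup.
LANGTAG = re.compile(
    r"(?P<language>[a-z]{2,3}|[a-z]{5,8})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)"
    r"(?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?",
    re.IGNORECASE,
)


def parse_tag(tag):
    """Split a tag into its subtag fields, or return None if not well-formed."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    fields = m.groupdict()
    fields["variants"] = [v for v in fields["variants"].split("-") if v]
    return fields


for t in ["qps-ploca", "qps-ploc", "qps-x-ploc", "qps-pseudobidi"]:
    print(t, parse_tag(t))
```

Here "ploca" comes out as a variant, "ploc" as a (syntactically well-formed but unregistered) script, and a 10-letter subtag is rejected outright.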

> Why do you need a subtag in addition to the private-use code? 

There are different forms of pseudo that get used. For instance, we have a second pseudo variant that is intended to exercise bidi-related issues. (So, for instance, the filler characters would be Arabic or Hebrew, which inherently makes the string as a whole bidirectional and can ensure exercising issues like how neutral characters such as ")" get displayed when they occur at the boundaries.)
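A minimal sketch of the bidi variant described above, assuming Hebrew alef (U+05D0, bidi class "R") as the filler; Arabic letters (class "AL") would serve the same purpose. The function names and the example string are hypothetical. The helper lists the Unicode bidi classes present, showing that the padded string mixes strong left-to-right, strong right-to-left and neutral characters, with ")" sitting right next to the RTL filler.

```python
import unicodedata

# Assumed filler: HEBREW LETTER ALEF, bidi class "R".
RTL_FILLER = "\u05d0"


def pseudo_bidi(text, count=2, filler=RTL_FILLER):
    """Wrap text in right-to-left filler so the whole string is bidirectional."""
    pad = filler * count
    return pad + text + pad


def bidi_classes(text):
    """Return the set of Unicode bidi classes that occur in text."""
    return {unicodedata.bidirectional(ch) for ch in text}


s = pseudo_bidi("Save (Ctrl+S)")
print(sorted(bidi_classes(s)))
```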

>> There is no specific proposal yet. I raised the discussion with a view to a possible proposal. I think there should be some way to denote pseudo-loc content in a valid BCP 47 tag.

> Why, if "pseudo-localization" content isn't any kind of stable entity?

It's something that software companies increasingly use. So while there is no common dictionary for pseudo-loc content, it is a stable concept that is widely used.

...

>> (E.g., a user can add one of these in their language profile in Windows 8 and then create documents in which these might get used, or if they browse on the web they would be included in an http accept-language header.)
>
> Why should a private-use scenario be of concern?

In well-designed systems, it might not be a concern. But there's a risk that a user, or the user's data, might encounter a process that chokes on an invalid tag.
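For the Accept-Language case, a consumer can avoid choking by skipping entries it cannot parse instead of rejecting the whole header. A simplified sketch, assuming the basic language-range syntax from RFC 4647; the function name and header value are hypothetical.

```python
import re

# Basic language-range syntax (RFC 4647): 1*8ALPHA *("-" 1*8alphanum) / "*".
LANGUAGE_RANGE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*")


def parse_accept_language(header):
    """Return (range, q) pairs sorted by descending q, skipping malformed
    entries rather than raising on them."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        language_range = parts[0]
        if not LANGUAGE_RANGE.fullmatch(language_range):
            continue  # tolerate junk instead of failing the whole header
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = None
        if q is None or not 0.0 <= q <= 1.0:
            continue
        entries.append((language_range, q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(entries, key=lambda e: -e[1])


print(parse_accept_language("en-US,qps-ploc;q=0.5,@@bad,en;q=0.8"))
```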

Peter