pseudo-localization variants

Wed Dec 19 18:30:51 CET 2012

> 
> > When Microsoft uses pseudo-loc, AFAIK it's always English.
> 
> If it doesn't differ in any way from English, then it is English.

Amazon currently has five pseudo-translators, not all of which produce "English" output:

- "zenkaku" (full-width English)
- "traditional" letter replacement
- transliterated to kana
- transliterated to Cyrillic
- replacement with characters from specific Unicode blocks (generally Han ideographs)

Of these, only the first two are "readable" as English.
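A minimal Python sketch of the two "readable" flavors follows. The full-width transform uses the fixed offset of 0xFEE0 between printable ASCII (U+0021..U+007E) and the Halfwidth and Fullwidth Forms block (U+FF01..U+FF5E). The accent table is a hypothetical example, chosen only to reproduce the "zèñkàkù" sample below. It is not Amazon's actual mapping.

```python
# Full-width ("zenkaku") forms of printable ASCII sit at a fixed offset:
# U+0021..U+007E map to U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(text: str) -> str:
    """Map printable ASCII to full-width forms; leave other characters alone."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            out.append(chr(code + FULLWIDTH_OFFSET))
        elif ch == " ":
            out.append("\u3000")  # IDEOGRAPHIC SPACE as the full-width space
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical "traditional" letter-replacement table (illustrative only).
ACCENT_MAP = str.maketrans({
    "a": "à", "e": "è", "i": "ì", "o": "ò", "u": "ù", "n": "ñ", "c": "ç",
    "A": "À", "E": "È", "I": "Ì", "O": "Ò", "U": "Ù", "N": "Ñ", "C": "Ç",
})


def accent_replace(text: str) -> str:
    """Swap selected Latin letters for accented look-alikes."""
    return text.translate(ACCENT_MAP)


if __name__ == "__main__":
    print(to_fullwidth("Zenkaku"))
    print(accent_replace("zenkaku"))
```

Both outputs stay legible to an English reader. However, they exercise non-ASCII handling, font coverage, and (for full-width) text width.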

> 
> >> What *is* pseudo-localized text? How does it differ from Lorem ipsum?
> >
> > I answered that in my original mail: it's text that is typically in the original
> > development language for a product but in an orthography that exercises the
> > localizability of that product.
> 
> By exhibiting a range of ad-hoc misspellings?

Ｚｅｎｋａｋｕ
zèñkàkù
ゼエカク
Ѕенкаку
怎佧哭素色嘚去

These are not all misspellings. At least one doesn't actually "say" anything meaningful.

> 
> > I mentioned that MS uses the ISO 639 private use (“local code”) ‘qps’. There
> > is nothing in BCP 47 that prevents such a private use subtag from being used in
> > combination with a variant subtag.
> 
> Why do you need a subtag in addition to the private-use code?

People (localizers in particular) wish to interchange the pseudo-localized text. A private use code by private agreement is fine as long as the text stays within a given company or organization. But there is a desire to use this in a public context, particularly in translation memories, which calls for an assigned code.

> 
> > There is no specific proposal yet. I raised the discussion with a view to a
> > possible proposal. I think there should be some way to denote pseudo-loc
> > content in a valid BCP 47 tag.
> 
> Why, if "pseudo-localization" content isn't any kind of stable entity?

To tag content so that, for example, the spell checker doesn't choke on it, or so that your translation company can recognize that the text is a placeholder that needs to be replaced with real text.

> 
> > Currently, MS is using tags such as “qps-ploc” or “qps-plocm”, neither of
> > which are valid, but which can leak into the wild.
> 
> Why not just "qps"?

There are different flavors (see the examples above, which don't even include things like prefixing and suffixing the messages). It is useful to be able to identify which flavor was used.
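The tags quoted above, "qps-ploc" and "qps-plocm", are well-formed but not valid. A syntax check shows where each subtag lands. "ploc" is four letters, so BCP 47 parses it as a script subtag. "plocm" is five letters, so it parses as a variant. Neither appears in the IANA Language Subtag Registry. "qps" itself falls inside the registered private-use range qaa..qtz.

The following rough Python sketch covers only a subset of the RFC 5646 langtag syntax. It omits extlang, extensions, private use (x-...) and grandfathered tags, and it checks syntax only, not registration.

```python
import re

# Simplified subset of the RFC 5646 "langtag" production:
#   language: 2-3 letters (extlang omitted) or 5-8 letters
#   script:   4 letters
#   region:   2 letters or 3 digits
#   variant:  5-8 alphanumerics, or a digit followed by 3 alphanumerics
LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3}|[a-z]{5,8})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)$",
    re.IGNORECASE,
)


def parse(tag: str):
    """Return the subtags by position, or None if the tag is not well-formed
    under this simplified grammar. Registry validity is NOT checked."""
    m = LANGTAG.match(tag)
    if m is None:
        return None
    parts = m.groupdict()
    variants = parts.pop("variants")
    parts["variants"] = variants.split("-")[1:] if variants else []
    return parts


if __name__ == "__main__":
    for tag in ("qps-ploc", "qps-plocm"):
        print(tag, parse(tag))
```

Here parse("qps-ploc") puts "ploc" in the script slot, and parse("qps-plocm") puts "plocm" in the variants list. A full validator would then look up each subtag in the registry and reject both.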

Rich Gillam actually did a presentation on our implementation at IUC36 (http://www.unicodeconference.org/iuc36/program-d.htm#S11-3), but I don't have a link to the presentation handy.

> 
> > (E.g., a user can add one of these in their language profile in Windows 8 and
> > then create documents in which these might get used, or if they browse on the
> > web they would be included in an HTTP Accept-Language header.)
> 
> Why should a private-use scenario be of concern?
> 

Private use isn't, but general interchange of these texts *is* of concern.

Addison