New extension for transformed languages

Doug Ewell doug at ewellic.org
Mon Mar 5 01:22:32 CET 2012


(removing Unicode from the cc list, and adding ietf-languages)

Philippe Verdy wrote:

> This will work however only if there are [Suppress-Script]
> registrations for both languages.

Although Mark mentioned transliteration, the 't' extension can be useful
for other types of transformation. There might be value simply in
indicating that the text was originally in Italian (or Breton, etc.) and
was translated into Russian.  That might explain certain word choices,
for example.

> Many languages do not have such implicit script registered in the IANA
> database (e.g. Breton should be implicitly using the Latin script,
> other situations are almost impossible to find in actual usage or in
> publications, except maybe if there are some language courses of Breton
> in Japan or Russia; well the implied "Latn" script was not registered
> in the "br" language subtag the last time I checked it in the IANA
> registry).

If it's important to indicate to the end user that the original Breton
text was in Latin letters, but this is not clear to the user, one could
write "ru-t-br-Latn". Note that it may well be clear to the user. The
fact that Breton has no Suppress-Script field in the Registry does not
mean that nobody has any way of knowing what writing system is typically
used for Breton. Normally they will apply the knowledge they already
have.

> For the general case, a tag of the form "xx-t-yy" where xx and yy are
> only language subtags (2 or 3 letters), will not indicate that there
> was any transliteration applied as it is not possible to infer the
> pair of scripts.

"xx-t-yy" could mean text in language 'yy' that was adapted to the
writing conventions typically associated with language 'xx'. This might
or might not imply a transliteration. 'xx' and 'yy' might use the same
script (say, Latin) but very different orthographic conventions. Or it
might mean some other type of transformation from 'yy' to 'xx' that has
nothing to do with writing systems.
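To make the direction concrete, here is a minimal sketch of splitting a tag
such as "ru-t-br-Latn" into the language of the content and the source it was
transformed from. The function name is hypothetical, and the parsing is
deliberately simplified: it assumes a well-formed tag, ignores private-use
sequences, and treats any single-character subtag or letter-plus-digit subtag
after the 't' as the end of the source tag.

```python
def split_t_extension(tag):
    """Split 'xx-t-yy' into (content tag, source tag).

    Hypothetical helper for illustration; returns (tag, None) when there
    is no 't' extension. Not a full language-tag parser.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]

    # The first subtag is always the primary language, so look for the
    # 't' singleton only after it.
    if "t" not in lowered[1:]:
        return tag, None
    t_index = lowered.index("t", 1)

    source = []
    for subtag in subtags[t_index + 1:]:
        is_singleton = len(subtag) == 1
        # Assumption: a letter followed by a digit starts an extension field.
        is_field_key = (
            len(subtag) == 2 and subtag[0].isalpha() and subtag[1].isdigit()
        )
        if is_singleton or is_field_key:
            break
        source.append(subtag)

    content = "-".join(subtags[:t_index])
    return content, "-".join(source) or None


print(split_t_extension("ru-t-br-Latn"))
print(split_t_extension("ru-t-it"))
```

The split only says which tag describes the content and which describes the
source. It says nothing about whether the transformation was a
transliteration, a translation, or something else.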

> To get the most benefit of the "xx-t-yy" form, the IANA registry
> should contain much more information about implied scripts for the
> existing registered language subtags.

Suppress-Script is a bit like the Acknowledgments section of a book.
Once the author has started thanking people for assistance in producing
the work, it's often hard to know when to stop. You can't realistically
include the guy who sold you the coffee you were drinking while writing
a key chapter, or the guy who unloaded the coffee from the delivery
truck.

The existing Suppress-Script entries are a *subset* of those languages
for which the Registrar and members of the ietf-languages list have
identified a single "by far most common" script. There is no promise
that every language which is normally written in only one script has a
Suppress-Script. We've tried to go down that path, and it results in
either controversy over the measure of "by far most common," or
political posturing by an individual or group, or sheer lack of time and
knowledge to cover every imaginable case.
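A small sketch of what this means for software that consults the Registry.
The table below is a hand-copied, illustrative excerpt rather than the
Registry itself, and the function name is hypothetical. The point is that a
missing entry means "the Registry does not say," not "the script is unknown."

```python
# Illustrative excerpt only; the real Registry is much larger.
SUPPRESS_SCRIPT = {
    "en": "Latn",
    "it": "Latn",
    "ru": "Cyrl",
}


def registry_script(language, explicit_script=None):
    """Return the script stated by the tag or by Suppress-Script.

    Hypothetical helper. None means the Registry is silent, so the
    caller has to rely on other knowledge about the language.
    """
    if explicit_script:
        # An explicit script subtag, as in "br-Latn", always wins.
        return explicit_script
    return SUPPRESS_SCRIPT.get(language.lower())


print(registry_script("ru"))
print(registry_script("br"))
print(registry_script("br", "Latn"))
```

For Breton the lookup comes back empty unless the tag carries the script
explicitly, which is the case "ru-t-br-Latn" covers.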

You are welcome to propose a Suppress-Script for Breton and other
languages, one at a time, if you can provide solid and objective
evidence of "by far most common." Trying to fill in all the "gaps" in
one effort is not worthwhile.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell


