Phonetic orthographies

Doug Ewell dewell at adelphia.net
Wed Nov 29 18:34:38 CET 2006


Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:

>> If I personally were tagging content in these languages, I'd probably 
>> use "ro" to mean "ro-RO", and "mo" to mean "ro-MD", because I would 
>> expect existing content to be tagged like that and I'd want the 
>> searching/filtering process that finds that content to find mine too.
>
> The problem with this approach is that in Transnistria, a part of 
> Moldova that is an unrecognised independent republic, the Cyrillic 
> script is used. During the conflict that resulted in Transnistria, the 
> Moldovan government declared the use of Cyrillic script illegal; the 
> same happened for the Latin script in Transnistria. It is therefore 
> not safe to use country codes to indicate Cyrillic content as 
> Moldovan; Cyrillic is illegal in Moldova.

and Mark Davis replied:

> If Cyrillic is needed, that's simple: just add Cyrl.

Exactly.  It's not generally a good idea to indicate "language L written 
in non-obvious script S" by pretending that it's not really language L, 
but some other language.  Script subtags were added largely to solve 
problems like this, as well as to supplant the overloading of country 
codes as indicators of script (e.g. "zh-CN" vs. "zh-TW").

To the extent that "Moldavian" was identified in ISO 639 as a distinct 
language from Romanian on the basis of being written in Cyrillic, as 
opposed to genuine differences in vocabulary, grammar, pronunciation, 
etc., that is a historical fact that we cannot deny.  But script subtags 
are a more appropriate mechanism for RFC 4646 applications to identify 
script.
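
As a rough illustration of how an application might pull the script 
subtag out of a tag, here is a minimal Python sketch.  The function name 
split_tag and the pattern are my own assumptions, not anything defined 
by RFC 4646, and the pattern covers only language, optional script, and 
optional region; variants, extensions, private-use, and grandfathered 
tags are deliberately left out.

```python
import re

# Simplified pattern: language, then optional script, then optional region.
# Assumption: this is NOT the full RFC 4646 grammar; variants, extensions,
# private-use subtags and grandfathered tags are ignored.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)

def split_tag(tag):
    """Return (language, script, region) in conventional casing, or None
    if the tag does not fit the simplified pattern above."""
    m = _TAG.match(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    return (
        m.group("language").lower(),
        script.title() if script else None,   # e.g. "cyrl" -> "Cyrl"
        region.upper() if region else None,   # e.g. "md" -> "MD"
    )
```

With this sketch, split_tag("mo-Cyrl") gives the script directly as 
"Cyrl", whereas a tag like "zh-CN" yields no script at all, which is the 
point: the region alone does not say which script is in use.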

So some possible tagging options are:

1.  "ro" or "ro-RO" for Romanian/Moldavian as spoken and written in 
Romania
2.  "mo" or "ro-MD" for Romanian/Moldavian as spoken and written in most 
of Moldova
3.  "mo-Cyrl" or "ro-Cyrl-MD" for Romanian/Moldavian as spoken and 
written in Transnistria
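
To see how these options interact with the searching/filtering concern 
raised above, here is a small Python sketch of prefix-style matching in 
the spirit of RFC 4647 basic filtering.  The function name basic_filter 
and the sample list are hypothetical, and the Cyrillic tag is written 
with the script subtag before the region, as RFC 4646 syntax requires.

```python
def basic_filter(language_range, tags):
    """Return the tags matched by a basic language range: a tag matches
    if it equals the range, or starts with the range followed by '-'.
    Comparison is case-insensitive; '*' matches everything."""
    wanted = language_range.lower()
    if wanted == "*":
        return list(tags)
    return [
        t for t in tags
        if t.lower() == wanted or t.lower().startswith(wanted + "-")
    ]

# Hypothetical content tags covering the options listed above.
content = ["ro", "ro-RO", "mo", "ro-MD", "mo-Cyrl", "ro-Cyrl-MD"]

print(basic_filter("ro", content))
print(basic_filter("mo", content))
print(basic_filter("ro-Cyrl", content))
```

Under this kind of matching, a search for "ro" never finds content 
tagged "mo" or "mo-Cyrl", and a search for "ro-MD" does not find 
"ro-Cyrl-MD" because the script subtag sits between the two.  That is 
worth keeping in mind when choosing between the "mo" and "ro" forms.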

This discussion does not apply to ISO 639-6-type code elements, if and 
when they are supported by a future revision to RFC 4646, since some of 
them do naturally encode concepts like written vs. spoken and choice of 
script.

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages


