Phonetic orthographies
Doug Ewell
dewell at adelphia.net
Wed Nov 29 18:34:38 CET 2006
Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:
>> If I personally were tagging content in these languages, I'd probably
>> use "ro" to mean "ro-RO", and "mo" to mean "ro-MD", because I would
>> expect existing content to be tagged like that and I'd want the
>> searching/filtering process that finds that content to find mine too.
>
> The problem with this approach is that in Transnistria, a part of
> Moldova that is an unrecognised independent republic, the Cyrillic
> script is used. During the conflict that resulted in Transnistria, the
> Moldovan government declared the use of Cyrillic script illegal; the
> same happened for the Latin script in Transnistria. It is therefore
> not safe to use country codes to indicate Cyrillic content as
> Moldovan; Cyrillic is illegal in Moldova.
and Mark Davis replied:
> If Cyrillic is needed, that's simple: just add Cyrl.
Exactly. It's not generally a good idea to indicate "language L written
in non-obvious script S" by pretending that it's not really language L,
but some other language. Script subtags were added largely to solve
problems like this, as well as to supplant the overloading of country
codes as indicators of script (e.g. "zh-CN" vs. "zh-TW").
To the extent that "Moldavian" was identified in ISO 639 as a distinct
language from Romanian on the basis of being written in Cyrillic, as
opposed to genuine differences in vocabulary, grammar, pronunciation,
etc., that is a historical fact that we cannot deny. But script subtags
are a more appropriate mechanism for RFC 4646 applications to identify
script.
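To make the "script subtag vs. overloaded country code" point concrete,
here is a minimal sketch that splits a tag into language, script, and
region. It assumes a simplified subset of the RFC 4646 langtag shape
(language, then optional 4-letter script, then optional region) and
ignores extlang, variant, extension, and private-use subtags. The
function name parse_tag is hypothetical, not part of any library.

```python
import re

# Simplified subset of the RFC 4646 langtag shape (assumption for this
# sketch): language, optional script, optional region, in that order.
LANGTAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag):
    """Return (language, script, region) with conventional casing,
    or None if the tag does not fit the simplified shape."""
    m = LANGTAG.match(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    return (
        m.group("language").lower(),
        script.title() if script else None,   # e.g. "Cyrl"
        region.upper() if region else None,   # e.g. "MD"
    )


print(parse_tag("ro-MD"))       # ('ro', None, 'MD')
print(parse_tag("ro-Cyrl-MD"))  # ('ro', 'Cyrl', 'MD')
print(parse_tag("ro-MD-Cyrl"))  # None: the script must precede the region
```

The script is read from its own subtag rather than guessed from the
region, which is the point of using "Cyrl" instead of relying on "MD".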
So some possible tagging options are:
1. "ro" or "ro-RO" for Romanian/Moldavian as spoken and written in
Romania
2. "mo" or "ro-MD" for Romanian/Moldavian as spoken and written in most
of Moldova
3. "mo-Cyrl" or "ro-Cyrl-MD" for Romanian/Moldavian as spoken and
written in Transnistria
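Since the choice between "mo" and "ro-MD" matters mainly for whether a
search or filter finds the content, here is a rough sketch of RFC 4647
basic filtering applied to the options above (with the Cyrillic Moldova
tag written in language-script-region order). It is illustrative only:
the function name is hypothetical, and it performs no mapping between
"mo" and "ro" that a particular application might add on its own.

```python
def basic_filter_matches(language_range, tag):
    """RFC 4647 basic filtering: the range matches if it equals the tag
    (case-insensitively) or is a prefix of it followed by '-'."""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True  # the wildcard range matches every tag
    return t == r or t.startswith(r + "-")


tags = ["ro", "ro-RO", "mo", "ro-MD", "mo-Cyrl", "ro-Cyrl-MD"]

print([t for t in tags if basic_filter_matches("ro", t)])
# ['ro', 'ro-RO', 'ro-MD', 'ro-Cyrl-MD']
print([t for t in tags if basic_filter_matches("mo", t)])
# ['mo', 'mo-Cyrl']
```

A filter on "ro" picks up every "ro"-based option, including the
Cyrillic one, but none of the "mo" tags, and vice versa.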
This discussion does not apply to ISO 639-6-type code elements, if and
when they are supported by a future revision to RFC 4646, since some of
them do naturally encode concepts like written vs. spoken and choice of
script.
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages