Phonetic orthographies
Gerard Meijssen
gerardm at wiktionaryz.org
Sat Nov 11 10:48:35 CET 2006
Hoi,
When you consider Cantonese, it has many characters that are not part of
what is considered the Chinese character set. It is therefore wrong to
assume that written Chinese is a single application that covers all the
characters of the languages it is supposed to include. There is not even
one Chinese character set; there are at least two. ISO-639-3 defines the
languages that are included in the zho macrolanguage individually. To
make things more interesting, how would you indicate the dialects of
languages like Cantonese or Min-Nan? Can dialects have dialects?
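To make the macrolanguage relationship concrete, here is a minimal
sketch in Python. The table is a small hand-picked excerpt, not the full
ISO-639-3 mapping, and the names MACROLANGUAGE_MEMBERS and
macrolanguage_of are invented for this illustration.

```python
# Hand-picked excerpt of ISO-639-3 macrolanguage membership.
# This is NOT the complete table; it only illustrates the idea.
MACROLANGUAGE_MEMBERS = {
    # Mandarin, Yue (Cantonese), Min Nan, Wu, Hakka
    "zho": {"cmn", "yue", "nan", "wuu", "hak"},
}


def macrolanguage_of(code):
    """Return the macrolanguage that contains `code`, or None if the
    code is not listed in the excerpt above."""
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            return macro
    return None
```

A lookup such as macrolanguage_of("yue") finds zho, but nothing in a
table like this says how to label a dialect of yue itself.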
The problem with RFC 4646 is, imho, that it does not appreciate that
ISO-639-3 turns what were considered languages under the previous codes
into macrolanguages. The consequence is that it is not safe to treat
what a macrolanguage includes as anything other than a language. Chinese
is not the only macrolanguage. There are also codes like bat (Baltic
(Other)) in ISO-639-2 that are not part of ISO-639-3. The consequence is
that, for instance, people in the Wikimedia Foundation start a project
and call their language bat-ltg because they do not want to be
associated with Latvian, the language that it is a dialect of according
to Ethnologue.
My reading of RFC 4646 is very much that it aims to preserve backwards
compatibility. However, many of the old codes are outdated. They have
been dropped from ISO-639-3 with good reason, and the insistence on
preserving the outdated codes will imho prove to be more of a hindrance
than a benefit when you want to make the Internet more multilingual. It
would have been better to allow the use of the old codes and to advise,
as best practice, moving to the newer codes when and where practical.
RFC 4646 indicates that specific indications of languages are also
needed for things like spell checking, and maybe even for CAT (Computer
Aided Translation) programs. For this to work, there is a need to build
upon the existing standardisation work, because how else do you safely
indicate dialects or orthographies? There are no public lists I know of
that help indicate what possibilities are recognised, let alone which
ones exist for which languages. Indicating orthographies by date is not
safe because in Dutch, for instance, we have an official orthography,
"het groene boekje", and an unofficial one, "de witte lijst"; they are
both from 2006 and both have powerful factions using them. Similar
situations exist for several other languages that I know of.
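The date problem can be sketched in a few lines of Python. The registry
below is purely hypothetical (the names orthographies, register and
is_ambiguous are invented for this illustration); it only shows that a
(language, year) key cannot tell the two Dutch orthographies apart.

```python
# Hypothetical registry keyed by (language code, year).
# It exists only to illustrate the collision; it is not a real list.
orthographies = {}


def register(language, year, name):
    """Record an orthography under its (language, year) key."""
    orthographies.setdefault((language, year), []).append(name)


def is_ambiguous(language, year):
    """True when more than one orthography shares the same key."""
    return len(orthographies.get((language, year), [])) > 1


# The two Dutch orthographies mentioned above, both from 2006.
register("nl", 2006, "het groene boekje")
register("nl", 2006, "de witte lijst")
```

With these two entries, is_ambiguous("nl", 2006) is true, so a year on
its own does not identify which orthography a text follows.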
One further argument I would like to add concerns languages that do not
have such a rich history on the Internet: people will use the ISO-639-3
code that is specific to their language. Following the logic of RFC 4646
they should, however, use a different code, something that has been and
will be roundly rejected by the people who want to use their own code
for their language. In practice, many of the applications that make use
of content on the Internet already have to check what language a
website's content is in, because this information is often incorrectly
indicated according to RFC 4646 or its predecessors.
Thanks,
Gerard
FYI, I am a member of the Wikimedia Foundation language subcommittee. I
am particularly active at http://wiktionaryz.org
Doug Ewell wrote:
> Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:
>
>> Actually, zh is only to be used within the confines of ISO-639-1 and
>> ISO-639-2. The new standard has zh and consequently zho marked as a
>> macro language. Making assumptions on the basis of the zh code is
>> only useful for the hopefully short period until the use of zh is
>> only to be used for historical reasons.
>>
>> Consequently using zh as an example for other use is not a great idea.
>
> In fact, the concept of "Chinese" as a language continues to be a
> valuable one for those applications that do not distinguish between
> the various Chinese languages/dialects. (Written Chinese, in either
> traditional or simplified script, is such an application.)
> Consequently, the ISO 639 codes "zh" and "zho" and the RFC 4646
> language subtag "zh" are not going away any time soon.
>
> A similar situation applies to other macrolanguages.
>
> --
> Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages