Phonetic orthographies

Gerard Meijssen gerardm at
Sat Nov 11 10:48:35 CET 2006

Consider Cantonese: it has many characters that are not part of what is 
considered the Chinese character set. It is therefore wrong to assume 
that written Chinese is a single application that covers all the 
characters of what is supposed to be one language. There is not even one 
Chinese character set; there are at least two. ISO 639-3 defines the 
languages that are included in the zho macrolanguage individually. To 
make things more interesting, how would you indicate the dialects of 
languages like Cantonese or Min Nan? Can dialects have dialects?

The problem with RFC 4646, imho, is that it does not appreciate that 
ISO 639-3 turns what were considered languages under the previous codes 
into macrolanguages. As a consequence, it is not safe to treat the 
members of a macrolanguage as anything but languages. Chinese is not the 
only macrolanguage. There are also codes like bat (Baltic (Other)) in 
ISO 639-2 that are not part of ISO 639-3. The consequence is that, for 
instance, people in the Wikimedia Foundation start a project and call 
their language bat-ltg because they do not want their language to be 
associated with Latvian, of which, according to Ethnologue, it is a 
dialect.
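The macrolanguage relationship described above can be sketched as a small lookup. This is a minimal sketch, assuming a hand-picked excerpt of the ISO 639-3 macrolanguage table (the full table is maintained by SIL International as the ISO 639-3 registration authority); the names MACROLANGUAGE_OF and fallback_chain are hypothetical, and the sketch does not check tags against the IANA registry. It keeps a specific code such as yue first and only offers zh as a fallback for applications that do not distinguish the Chinese languages.

```python
# Hypothetical excerpt of the ISO 639-3 macrolanguage table: only a few of
# the individual languages grouped under "zho" (ISO 639-1 "zh") are listed.
MACROLANGUAGE_OF = {
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Yue Chinese (Cantonese)
    "nan": "zh",  # Min Nan Chinese
    "wuu": "zh",  # Wu Chinese
    "hak": "zh",  # Hakka Chinese
}


def fallback_chain(tag: str) -> list[str]:
    """Return candidate tags from most to least specific.

    The original tag is always kept first; the macrolanguage code is only
    appended as a fallback for applications that do not distinguish the
    individual languages.
    """
    chain = [tag]
    primary = tag.split("-", 1)[0].lower()  # primary language subtag
    for candidate in (primary, MACROLANGUAGE_OF.get(primary)):
        if candidate is not None and candidate not in chain:
            chain.append(candidate)
    return chain


print(fallback_chain("yue-HK"))  # ['yue-HK', 'yue', 'zh']
print(fallback_chain("nl"))      # ['nl']
```

The design choice here is that the specific code is never replaced, only supplemented, so a Cantonese page stays identifiable as Cantonese while still matching a generic "Chinese" request.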

My reading of RFC 4646 is very much that it aims to preserve backward 
compatibility. However, many of the old codes are outdated. They were 
dropped from ISO 639-3 for good reason, and the insistence on preserving 
them will imho prove more of a hindrance than a benefit when you want to 
make the Internet more multilingual. It would have been better to allow 
the use of the old codes while advising, as best practice, a move to the 
newer codes when and where practical.

RFC 4646 indicates that specific language identification is also needed 
for things like spell checking, and maybe even for CAT (Computer Aided 
Translation) programs. To make this work, there is a need to build upon 
the existing standardisation work, because how do you safely indicate 
dialects or orthographies? There are no public lists that I know of 
showing which possibilities are recognised, let alone which exist for 
which languages. Indicating orthographies by date is not safe. In Dutch, 
for instance, we have an official orthography, "het groene boekje" (the 
green booklet), and an unofficial one, "de witte lijst" (the white 
list); both are from 2006 and both have powerful factions using them. 
Similar situations exist for several other languages that I know of.
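The dating problem can be made concrete with the RFC 4646 variant subtag, which may be four characters starting with a digit; that is how year-based variants such as the registered de-1996 (German orthography of 1996) are written. Below is a minimal sketch of a simplified tag parser, assuming only language, script, region and variant subtags; parse_tag is a hypothetical name, and nl-2006 is used purely as an illustration, not as a registered subtag. Because both Dutch orthographies date from 2006, a year-only variant yields the same tag for either of them.

```python
import re

# Simplified subtag patterns following the RFC 4646 ABNF (section 2.1).
# Deliberately NOT handled: extlang, extensions, private use, grandfathered tags.
LANGUAGE = re.compile(r"[a-z]{2,3}|[a-z]{5,8}")
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")


def parse_tag(tag: str) -> dict:
    """Split a tag into language, script, region and variant subtags.

    Tags are case-insensitive, so subtags are lowercased before matching.
    Raises ValueError for anything outside the simplified grammar.
    """
    parts = tag.lower().split("-")
    if not LANGUAGE.fullmatch(parts[0]):
        raise ValueError(f"unsupported language subtag in {tag!r}")
    result = {"language": parts[0], "script": None, "region": None, "variants": []}
    i = 1
    if i < len(parts) and SCRIPT.fullmatch(parts[i]):
        result["script"] = parts[i]
        i += 1
    if i < len(parts) and REGION.fullmatch(parts[i]):
        result["region"] = parts[i]
        i += 1
    # A year such as "1996" fits the "digit + 3 alphanumerics" variant form.
    while i < len(parts) and VARIANT.fullmatch(parts[i]):
        result["variants"].append(parts[i])
        i += 1
    if i != len(parts):
        raise ValueError(f"unsupported subtag {parts[i]!r} in {tag!r}")
    return result


print(parse_tag("de-1996"))
# {'language': 'de', 'script': None, 'region': None, 'variants': ['1996']}
print(parse_tag("nl-2006")["variants"])  # ['2006'] for either Dutch orthography
```

The year is only syntax: nothing in the tag itself says which of two same-year orthographies is meant, which is why a public list of recognised orthographies would be needed.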

One further argument I would like to add concerns languages that do not 
have such a rich history on the Internet: people will use the ISO 639-3 
code that is specific to their language. Following the logic of RFC 
4646, however, they should use a different code, something that has 
been, and will be, roundly rejected by people who want to use their own 
code for their language. In practice, many applications that make use 
of Internet content already have to check what language a website's 
content is in, because it is often incorrectly labelled according to 
RFC 4646 or its predecessors.


FYI, I am a member of the Wikimedia Foundation language subcommittee. I 
am particularly active at

Doug Ewell wrote:
> Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:
>> Actually, zh is only to be used within the confines of ISO-639-1 and 
>> ISO-639-2. The new standard has zh and consequently zho marked as a 
>> macro language. Making assumptions on the basis of the zh code is 
>> only useful for the hopefully short period until zh is used only 
>> for historical reasons.
>> Consequently using zh as an example for other use is not a great idea.
> In fact, the concept of "Chinese" as a language continues to be a 
> valuable one for those applications that do not distinguish between 
> the various Chinese languages/dialects.  (Written Chinese, in either 
> traditional or simplified script, is such an application.) 
> Consequently, the ISO 639 codes "zh" and "zho" and the RFC 4646 
> language subtag "zh" are not going away any time soon.
> A similar situation applies to other macrolanguages.
> -- 
> Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14

More information about the Ietf-languages mailing list