Phonetic orthographies

Randy Presuhn randy_presuhn at mindspring.com
Sat Nov 11 20:48:37 CET 2006


Hi -

Whether you have problems with the procedures or syntax prescribed by
RFC 4646, or are simply interested in how its successor will be
produced, the correct mailing list for such discussion is ltru at ietf.org.
ietf-languages at iana.org is for the discussion of specific registration
requests.

I might also add that the situation with the many languages called
Chinese has been discussed at some length already, and was considered
in the writing of BCP 47.  For more details, the WG mailing list archive
is at http://www1.ietf.org/mail-archive/web/ltru/current/index.html

Randy, ltru WG co-chair

> From: "Gerard Meijssen" <gerardm at wiktionaryz.org>
> To: "Doug Ewell" <dewell at adelphia.net>
> Cc: <ietf-languages at iana.org>; "Sabine Cretella" <s.cretella at wordsandmore.it>
> Sent: Saturday, November 11, 2006 1:48 AM
> Subject: Re: Phonetic orthographies
>
> Hoi,
> When you consider Cantonese, it has many characters that are not part of 
> what is considered the Chinese character set. It is therefore wrong to 
> assume that written Chinese is indeed an application that covers all the 
> characters that are part of what is supposed to be a single application. 
> There is not even one Chinese character set; there are at least two. 
> ISO 639-3 defines the languages that are included in the zho 
> macrolanguage individually. To make things more interesting, how would 
> you indicate the dialects of languages like Cantonese or Min-Nan? Can 
> dialects have dialects?
> 
> The problem with RFC 4646, imho, is that it does not appreciate that 
> ISO 639-3 turns what were considered languages under the previous codes 
> into macrolanguages. The consequence is that it is not safe to consider 
> what is included in them as anything but a language. Chinese is not the 
> only macrolanguage. There are also codes like bat (Baltic (Other)) in 
> ISO 639-2 that are not part of ISO 639-3. The consequence is that people 
> in the Wikimedia Foundation, for instance, start a project and call 
> their language bat-ltg because they do not want to be associated with 
> Latvian, the language that it is a dialect of according to Ethnologue.
> 
> My appreciation of RFC 4646 is very much that it aims to preserve 
> backwards compatibility. However, many of the old codes are outdated. 
> They have been ditched with reason in ISO 639-3, and the insistence on 
> preserving the outdated codes will imho prove to be more of a hindrance 
> than a benefit when you want to make the Internet more multilingual. It 
> would have been better to allow for the use of the old codes and advise 
> as best practice to move to the later codes when and where practical.
> 
> RFC 4646 indicates that specific identification of languages is also 
> needed for things like spell checking, maybe even CAT (Computer Aided 
> Translation) programs. To make this work, there is a need to build 
> upon the existing standardisation work, because how else do you safely 
> indicate dialects or orthographies? There are no public lists I know of 
> that help indicate what possibilities are recognised, let alone exist, 
> for which languages. Indicating orthographies by date is not safe 
> because in Dutch, for instance, we have an official orthography, "het 
> groene boekje" (the green booklet), and an unofficial one, "de witte 
> lijst" (the white list); they are both from 2006 and both have powerful 
> factions using them. Similar situations exist for several other 
> languages that I know of.
> 
> One further argument I would like to add, for languages that do not have 
> such a rich history on the Internet: people will use the ISO 639-3 code 
> that is specific to their language. Following the logic of RFC 4646, they 
> should however use a different code, something that has been, and will 
> be, roundly rejected by the people who want to use their code for their 
> language. In practice, many of the applications that make use of 
> Internet content already have to check what language a website's content 
> is in, because the information is often incorrectly attributed according 
> to RFC 4646 or its predecessors.
> 
> Thanks,
>     Gerard
> 
> FYI, I am a member of the Wikimedia Foundation language subcommittee. I 
> am particularly active at http://wiktionaryz.org
> 
> Doug Ewell wrote:
> > Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:
> >
> >> Actually, zh is only to be used within the confines of ISO 639-1 and 
> >> ISO 639-2. The new standard has zh and consequently zho marked as a 
> >> macrolanguage. Making assumptions on the basis of the zh code is 
> >> only useful for the hopefully short period until zh is used only for 
> >> historical reasons.
> >>
> >> Consequently, using zh as an example for other uses is not a great idea.
> >
> > In fact, the concept of "Chinese" as a language continues to be a 
> > valuable one for those applications that do not distinguish between 
> > the various Chinese languages/dialects.  (Written Chinese, in either 
> > traditional or simplified script, is such an application.) 
> > Consequently, the ISO 639 codes "zh" and "zho" and the RFC 4646 
> > language subtag "zh" are not going away any time soon.
> >
> > A similar situation applies to other macrolanguages.
> >
> > -- 
> > Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
> > http://users.adelphia.net/~dewell/
> > http://www1.ietf.org/html.charters/ltru-charter.html
> > http://www.alvestrand.no/mailman/listinfo/ietf-languages


