Encoding scripts in tags: evil or just unpleasant?

Fri May 23 08:21:23 CEST 2003

It appears that the main issue is the 'default'. I really find it very
hard to understand Michael's objections.

1. RFC 3066 provides for differences in written form, and script is a
huge difference; far greater than the difference between British and
American spelling, or between pre-1996 and post-1996 German.

2. Michael keeps talking about duplicate encodings, but as many people
have pointed out, they are not duplicates. We have very much an
analogous situation now:

en means any English
en-US means English as used in the US
en-CA means English as used in Canada
etc.

All of these are already in RFC 3066 right now, even as we speak. The
vast majority of software uses US English as the default for en. By
Michael's logic, that would mean that "en" equals "en-US". But of
course, they are not duplicates: "any English" is not equal to
"English as used in the US", and nobody considers these duplicates.
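To make the distinction concrete, here is a minimal sketch of the
prefix matching that RFC 3066 describes for language ranges. The
function name matches_range is my own, purely for illustration, and
the lowercasing is an assumption based on tags being case-insensitive.

```python
def matches_range(tag: str, language_range: str) -> bool:
    """Return True if language_range matches tag by prefix matching.

    A range matches a tag if the two are equal, or if the range is a
    prefix of the tag and the next character in the tag is "-".
    Comparison is case-insensitive (assumption: tags are lowercased).
    """
    tag = tag.lower()
    language_range = language_range.lower()
    return tag == language_range or tag.startswith(language_range + "-")


# The range "en" (any English) covers the more specific tags...
print(matches_range("en-US", "en"))
print(matches_range("en-CA", "en"))
# ...but the reverse does not hold, so the tags are not duplicates.
print(matches_range("en", "en-US"))
```

The asymmetry is the point: "en" covers "en-US", but "en-US" does not
cover "en", whatever default a given piece of software happens to pick.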

3. Moreover, there are many circumstances for the use of language
codes where there is *no* notion of a default (see
http://eikenes.alvestrand.no/pipermail/ietf-languages/2003-May/000986.html).

4. And in many cases any default is arbitrary: Azeri has no obvious
default, as in
http://eikenes.alvestrand.no/pipermail/ietf-languages/2003-May/000987.html.

Mark
__________________________________
http://www.macchiato.com
►  “Eppur si muove” ◄

----- Original Message ----- 
From: <Peter_Constable at sil.org>
To: <ietf-languages at iana.org>
Sent: Friday, May 23, 2003 06:27
Subject: Re: Encoding scripts in tags: evil or just unpleasant?

>
> John Cowan <cowan at mercury.ccil.org> wrote on 05/23/2003 08:03:38 AM:
>
> > > I wouldn't be able to comment on whether the idea of including script
> > > IDs is being abused for inappropriate purposes in any requests until I
> > > had a chance to review what's been happening.
> >
> > In a nutshell: if a language is written predominantly in one script, but
> > also in others, should there be a lang-script registration for the
> > dominant as well as the occasional scripts?
>
> Without having had a chance to review the arguments that have been
> presented, my thinking at the time I wrote the paper to which you referred
> is that we do *not* need to register tags like "en-Latn" because "en" could
> be assumed to imply "Latn" unless specified otherwise (and, effectively is
> already used this way).
>
> I think Peter Edberg suggested that we should document cases in which we
> consider a language tag to imply a default script, and I think I'd agree.
> This should be done in a stable way: if "en" implies "latn" today, and a
> hundred years from now the English-speaking world is suddenly swept by
> Hellenistic revival and changes to Greek script, that should not mean that
> "en" should immediately imply Greek rather than Latin script.
>
> As I say, that has been my thinking. But your recent discussions here on
> Serbian etc. are the first open debate I'm aware of on those ideas, and
> perhaps somebody has found serious flaws.
>
>
> > (There is also a question whether Cyrillic is dominant in this sense for
> > Serbian, or whether Latin and Cyrillic are equally significant scripts.)
>
> I don't know the situation well enough to know how dominant either script
> is, but perhaps if there's any debate about whether Cyrillic is dominant
> enough, maybe that suggests it isn't dominant enough ?? I.e. perhaps we
> should only consider a default script to be implied by a language ID if
> there is no reasonable doubt that it is appropriate -- if it's completely
> obvious ??
>
>
>
> - Peter
>
>
> ---------------------------------------------------------------------------
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
>
>
>