Encoding scripts in tags: evil or just unpleasant?

Mark Davis mark.davis at jtcsv.com
Fri May 23 08:21:23 CEST 2003

It appears that the main issue is the 'default'. I really find it very
hard to understand Michael's objections.

1. RFC 3066 provides for differences in written form, and script is a
huge difference: far, far greater than the difference between British
and American spelling, or between German spelling before and after the
1996 reform.

2. Michael keeps talking about duplicate encodings, but as many people
have pointed out, they are not duplicates. We already have a closely
analogous situation:

en means any English
en-US means English as used in the US
en-CA means English as used in Canada

All of these are already valid under RFC 3066, right now. The vast
majority of software uses US English as the default for "en". By
Michael's logic, that would mean that "en" equals "en-US". But of
course they are not duplicates: "any English" is not the same as
"English as used in the US", and nobody treats them as such.

3. Moreover, there are many contexts in which language codes are used
where there is *no* notion of a default (see

4. And in many cases any default is arbitrary: Azeri has no obvious
default, as in
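The distinction in point 2 can be made concrete with the language-range matching rule that RFC 3066 defines: a range matches a tag if it equals the tag, or if it equals a prefix of the tag such that the next character is "-"; the range "*" matches any tag. The Python below is only a sketch of that rule; the function name `range_matches` is hypothetical, and the comparison is done case-insensitively because language tags are case-insensitive.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Sketch of RFC 3066-style range matching (hypothetical helper).

    A range matches a tag if it equals the tag, or is a prefix of the tag
    ending just before a '-'. The special range '*' matches any tag.
    Comparison ignores case, since language tags are case-insensitive.
    """
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        return True
    # Require a '-' boundary so that "en" does not match e.g. "eng".
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    tags = ["en", "en-US", "en-CA"]
    for rng in ["en", "en-US"]:
        print(rng, "->", [t for t in tags if range_matches(rng, t)])
```

Under this rule the range "en" selects all three tags, while "en-US" selects only itself. That asymmetry is why "en" and "en-US" are not duplicates, even when an application happens to fall back to US English for plain "en".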

►  “Eppur si muove” ◄

----- Original Message ----- 
From: <Peter_Constable at sil.org>
To: <ietf-languages at iana.org>
Sent: Friday, May 23, 2003 06:27
Subject: Re: Encoding scripts in tags: evil or just unpleasant?

> John Cowan <cowan at mercury.ccil.org> wrote on 05/23/2003 08:03:38 AM:
> > > I wouldn't be able to comment on whether the idea of including IDs
> > > is being abused for inappropriate purposes in any requests until I had
> > > a chance to review what's been happening.
> >
> > In a nutshell: if a language is written predominantly in one script, but
> > also in others, should there be a lang-script registration for the dominant
> > as well as the occasional scripts?
>
> Without having had a chance to review the arguments that have been
> presented, my thinking at the time I wrote the paper to which you referred
> is that we do *not* need to register tags like "en-Latn" because "en" could
> be assumed to imply "Latn" unless specified otherwise (and, effectively, is
> already used this way).
>
> I think Peter Edberg suggested that we should document cases in which we
> consider a language tag to imply a default script, and I think I'd agree.
> This should be done in a stable way: if "en" implies "latn" today, and a
> hundred years from now the English-speaking world is suddenly swept by a
> Hellenistic revival and changes to Greek script, that should not mean that
> "en" should immediately imply Greek rather than Latin script.
>
> As I say, that has been my thinking. But your recent discussions here on
> Serbian etc. are the first open debate I'm aware of on those ideas, and
> perhaps somebody has found serious flaws.
>
> > (There is also a question whether Cyrillic is dominant in this sense for
> > Serbian, or whether Latin and Cyrillic are equally significant.)
>
> I don't know the situation well enough to know how dominant either script
> is, but if there's any debate about whether Cyrillic is dominant enough,
> maybe that suggests it isn't dominant enough?? I.e. perhaps we should
> only consider a default script to be implied by a language ID when there
> is no reasonable doubt that it is appropriate -- if it's obvious??
> - Peter
> --------------------------------------------------------------------
> Peter Constable
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
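
The suggestion quoted above (document, in a stable way, the cases where a tag implies a default script, and imply one only when there is no reasonable doubt) could be modelled as a small read-only lookup table. This Python sketch is hypothetical throughout: the table, the function `implied_script`, and the assumption that a four-letter second subtag (as in "sr-Cyrl") names a script are illustrative choices, not anything RFC 3066 defines. Only "en" implying "Latn" comes from the thread; Serbian and Azeri are deliberately left without a default, reflecting the doubts raised in this exchange.

```python
from types import MappingProxyType
from typing import Optional

# Hypothetical registry of implied default scripts. MappingProxyType gives
# a read-only view, loosely modelling the policy that a published entry
# ("en" implies "Latn") is never changed later. Serbian ("sr") and Azeri
# ("az") are intentionally absent: their default is disputed or arbitrary.
_DEFAULT_SCRIPTS = MappingProxyType({
    "en": "Latn",
})


def implied_script(tag: str) -> Optional[str]:
    """Return the script a tag names or implies, or None if there is none.

    Assumption: a tag whose first subtag is a 2- or 3-letter language code
    and whose second subtag is four letters (e.g. "sr-Cyrl") names its
    script explicitly, which overrides any documented default.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    if (len(subtags) > 1 and len(language) in (2, 3)
            and len(subtags[1]) == 4 and subtags[1].isalpha()):
        return subtags[1].title()  # normalise case, e.g. "cyrl" -> "Cyrl"
    # No explicit script: fall back to the documented default, if any.
    return _DEFAULT_SCRIPTS.get(language)


if __name__ == "__main__":
    for t in ["en", "en-US", "sr", "sr-Cyrl", "az"]:
        print(t, "->", implied_script(t))
```

Returning None rather than guessing keeps a tag like "sr" or "az" honest: it says nothing about script, which is the position argued for languages without an obvious default.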
