Problems deciding if az- should have multiple registrations...
Peter_Constable at sil.org
Mon Apr 14 20:24:30 CEST 2003
Addison Phillips wrote on 04/14/2003 05:40:15 PM:
> Actually, I note that IE makes a script distinction ["Azeri (Latin)" and
> "Azeri (Cyrillic)"] in its languages dialog, but these both result in the
> RFC3066 tag "az". The GUI implies a distinction that cannot be represented
> in a standard tag.
>
> My concern about this hypothetical is that it implies that if I *do* find a
> system that makes the arbitrarily deeper (generative) distinction, I won't
> be able to set the behavior on that system correctly. You can already see
> this with the IE example.
But in that IE example, there's obvious motivation for wanting to support
the distinction; not so with the hypothetical az versus az-AZ.
> I also know for a fact that many more common locales are fully generated. My
> Solaris 2.8 system has:
>
> th and th_TH
> ko and ko_KR
> ja and ja_JP
> it and it_IT
> sv and sv_SE
But what difference can th versus th_TH possibly correspond to?
*If* we want RFC3066(bis) tags (th-TH, etc.) that can be mapped to this
legacy locale cruft, perhaps we should explicitly register a complete list
of them, documenting exactly why they exist and, wherever no system makes
any significant difference between the locale that specifies a country and
the one that doesn't, stating that the longer tag is equivalent# to the
shorter form.
#Harald raised the issue of a counterpart to Unicode's canonical
equivalence / normalisation. I wonder if we might even need something like
the distinction between canonical and compatibility equivalence.
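A minimal sketch of how such a registered equivalence could be applied, assuming a hypothetical table of long-form tags declared equivalent to their short forms. The table entries and the names `EQUIVALENT_SHORT_FORM`, `locale_to_tag` and `canonical_tag` are illustrative assumptions, not actual registrations or an existing API:

```python
# Hypothetical equivalence table: long-form tags that a registration might
# declare equivalent to their short forms. The entries are illustrative
# assumptions, not actual registrations.
EQUIVALENT_SHORT_FORM = {
    "th-th": "th",
    "it-it": "it",
}


def locale_to_tag(locale_name):
    """Turn a legacy locale name such as 'th_TH' into a hyphenated tag."""
    return locale_name.replace("_", "-")


def canonical_tag(tag):
    """Return the registered short form if one exists, else the tag unchanged."""
    # RFC 3066 tags compare case-insensitively, so look up in lower case.
    return EQUIVALENT_SHORT_FORM.get(tag.lower(), tag)


print(canonical_tag(locale_to_tag("th_TH")))  # th
print(canonical_tag(locale_to_tag("sv_SE")))  # sv-SE
```

A tag with no entry in the table (sv-SE here) passes through unchanged, so a real distinction is never collapsed by accident.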
> This is in addition to the usual complement of en, es, and fr locales. But
> these latter have actual language or locale differences to distinguish them.
> Of the foregoing, only Korean is really a candidate for indeterminate
> behavior. But there is a very real distinction between (for example) the sv
> and sv_SE locales or the ja and ja_JP locales. Not the least of which is: if
> I want a Unicode (UTF-8) locale, I need the long form and not the short one
> in both cases, at least on my machine.
You want us to have RFC3066(bis) tags to distinguish character set
encodings? I'm somewhat open to considering your suggestion that we allow
"language" tags to support the distinctions in legacy locales so as to be
able to deal with the legacy encoding problems and to move on to better
solutions. But when the distinction is one of encoding, I think that's
going too far. In HTML and XML, there are already mechanisms distinct from
those that use RFC3066 tags to handle charset/encoding distinctions.
(Indeed, between HTML and HTTP, there are multiple mechanisms for handling
charset/encoding distinctions.) I really think it would be a bad idea to
get RFC3066(bis) mixed up in any way with character set encoding issues.
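To illustrate keeping the two concerns apart, here is a rough sketch that splits a POSIX-style locale name of the form language[_territory][.codeset][@modifier] into a language tag and a separate codeset. The function name `split_locale` is hypothetical, and the sketch assumes the input is well-formed:

```python
def split_locale(name):
    """Split a POSIX-style locale name into (tag, codeset, modifier).

    The codeset and modifier are returned separately (None when absent)
    so that the language tag never carries encoding information.
    """
    base, _, modifier = name.partition("@")
    language_territory, _, codeset = base.partition(".")
    tag = language_territory.replace("_", "-")
    return tag, (codeset or None), (modifier or None)


print(split_locale("ja_JP.UTF-8"))  # ('ja-JP', 'UTF-8', None)
print(split_locale("sv"))           # ('sv', None, None)
```

The codeset would then go to whatever charset mechanism the protocol already provides, and only the first element would be used as a language tag.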
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
>
> >
> > > 1. *Are* we in agreement that RFC3066bis needs writing?
> >
> > My vote is yes.
>
> Me too, but I'm waiting for the Official Poll from Harald before wading much
> deeper into the discussion.
>
> Best Regards,
>
> Addison
>
> Addison P. Phillips
> Director, Globalization Architecture
> webMethods, Inc.
>
> +1 408.962.5487 (phone) +1 408.210.3569 (mobile)
> -------------------------------------------------
> Internationalization is an architecture.
> It is not a feature.
>
> Chair, W3C-I18N-WG Web Services Task Force
> To participate see http://www.w3.org/International/ws