Distinguishing Greek and Greek
mark.davis at jtcsv.com
Wed Mar 9 02:04:35 CET 2005
I was simplifying the example, but it is also a simplification to say that
"language matching" does X; you have to specify the domain. HTTP
Accept-Language matching behaves the way you say, but it is quite common to
use most-significant-field-first (MSFF) matching rules in other areas, both
for locale matching and for language matching. For the latter, what you would
typically have is a hierarchy of information, something like the following,
with some holes.
----- Original Message -----
From: "Addison Phillips" <addison.phillips at quest.com>
To: "Mark Davis" <mark.davis at jtcsv.com>; "IETF Languages Discussion"
<ietf-languages at iana.org>; "Michael Everson" <everson at evertype.com>
Cc: <cldr at unicode.org>
Sent: Tuesday, March 08, 2005 16:34
Subject: RE: Distinguishing Greek and Greek
> Not exactly.
> Language matching using the left-matching rule is the opposite of locale
> matching. If you specify "el-Grkp-GR", you aren't supposed to get anything
> "less granular" than that (e.g. "el-grkp-GR-boont" matches your request but
> "el-grkp" does not).
> In the second example you would get "el-GR-polyton" from A and nothing (or
> the local default language) from B.
> In other words, with language matching you must specify the least specific
> content you'll accept. With locale matching you specify the most specific
> content you want (and fall back).
> Variants *do* have a significant impact on how the matching proceeds.
> Addison P. Phillips
> Globalization Architect, Quest Software
> Chair, W3C Internationalization Core Working Group
> Internationalization is not a feature.
> It is an architecture.
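The two behaviours Addison contrasts can be sketched in Python. This is a minimal illustration under stated assumptions, not the HTTP or RFC 3066 reference procedure: the function names `prefix_matches` and `fallback_lookup` are hypothetical, and tags are simply split on hyphens and compared case-insensitively, with no registry lookups.

```python
from typing import List, Optional

# Hypothetical helpers for illustration only; tags are compared
# case-insensitively and split on "-" (no registry lookups).


def prefix_matches(requested: str, available: List[str]) -> List[str]:
    """Language matching: keep every available tag that equals the
    requested tag or extends it with more subtags (never less granular)."""
    req = requested.lower()
    return [tag for tag in available
            if tag.lower() == req or tag.lower().startswith(req + "-")]


def fallback_lookup(requested: str, available: List[str]) -> Optional[str]:
    """Locale matching: start from the most specific tag and drop
    trailing subtags until an available tag is found."""
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.split("-")
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()  # fall back one level
    return None


available = ["el-grkp-GR-boont", "el-grkp"]
print(prefix_matches("el-Grkp-GR", available))   # ['el-grkp-GR-boont']
print(fallback_lookup("el-Grkp-GR", available))  # el-grkp
```

The prefix test appends "-" before comparing so that only whole subtags count; without it, a request for "el" would wrongly accept a tag such as "ell".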
> > -----Original Message-----
> > From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> > bounces at alvestrand.no] On Behalf Of Mark Davis
> > Sent: Tuesday, 8 March 2005 16:26
> > To: IETF Languages Discussion; Michael Everson
> > Cc: cldr at unicode.org
> > Subject: Re: Distinguishing Greek and Greek
> > That is a possibility, but it is sub-optimal. That is because (a)
> > country differences are generally far less important than script
> > differences (including major orthographic variants like monotonic vs
> > polytonic), and (b) when language tags are matched, they are treated
> > most-significant-field first.
> > This is very similar in that respect to Hans vs Hant, which is a choice
> > of which subset of the Han characters encoded in Unicode is used to
> > represent Chinese, and the same reasoning applies.
> > Thus for example, suppose I have a web page drawing together different
> > sources of information. (I am simplifying the following example for
> > illustration.)
> > 1. The desired text is el-Grkp-GR. I am drawing data from two sources, A
> > and B:
> > A has data for el-Grkp-GR, el-Grkm-GR
> > B has data for el-Grkm-GR and el-Grkp
> > What I get is then el-Grkp-GR from A and el-Grkp from B. That is, under
> > the normal use of most-significant-field first, the best match in B for
> > el-Grkp-GR is el-Grkp.
> > 2. Now suppose we coded it with a variant instead. In that notation, I
> > would be asking for el-GR-polyton, and
> > A has data for el-GR-polyton, el-GR-monoton
> > B has data for el-GR-monoton and el-polyton
> > What I get is then el-GR-polyton from A, but from B I get the wrong
> > result: the page ends up mixing in el-GR-monoton. That is, under the
> > normal use of most-significant-field first, the best match in B for
> > el-GR-polyton is the one that matches the first two fields, el-GR,
> > namely el-GR-monoton.
> > Option #2 (using a variant) presents a mixture of monotonic and polytonic
> > text to the user, which is not very satisfactory at all. Now, the one
> > difference from Han would be if someone objected that Greek is only ever
> > spoken/written in a single country, so there would never, ever be any
> > need for a country variant. If that were the case, then encoding as a
> > variant would not be as bad. But not being omniscient, I am reluctant to
> > make such a strong claim about the use of Greek!
> > Mark
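Mark's two examples can be replayed with a simplified most-significant-field-first scorer. This sketch assumes that "best match" means the available tag sharing the longest run of leading subtags with the request; that scoring rule and the function names `common_leading_fields` and `msff_best_match` are illustrative assumptions, not the CLDR or HTTP algorithm.

```python
from typing import List, Optional

# Assumed scoring rule (illustration only): the best match is the
# available tag sharing the most leading subtags with the request.


def common_leading_fields(a: str, b: str) -> int:
    """Count how many subtags, from the left, two tags share."""
    count = 0
    for x, y in zip(a.lower().split("-"), b.lower().split("-")):
        if x != y:
            break
        count += 1
    return count


def msff_best_match(requested: str, available: List[str]) -> Optional[str]:
    """Pick the tag with the longest shared leading run; ties go to the
    earlier tag. Returns None if not even the language subtag matches."""
    best, best_score = None, 0
    for tag in available:
        score = common_leading_fields(requested, tag)
        if score > best_score:
            best, best_score = tag, score
    return best


# Example 1: script subtags (sources A and B)
print(msff_best_match("el-Grkp-GR", ["el-Grkp-GR", "el-Grkm-GR"]))  # el-Grkp-GR
print(msff_best_match("el-Grkp-GR", ["el-Grkm-GR", "el-Grkp"]))     # el-Grkp
# Example 2: variant subtags (sources A and B)
print(msff_best_match("el-GR-polyton", ["el-GR-polyton", "el-GR-monoton"]))  # el-GR-polyton
print(msff_best_match("el-GR-polyton", ["el-GR-monoton", "el-polyton"]))     # el-GR-monoton
```

Under this rule the script encoding keeps source B polytonic (el-Grkp), while the variant encoding lets the shared "el-GR" prefix outrank the polytonic data, which is the mixing Mark describes.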
> > ----- Original Message -----
> > From: "Michael Everson" <everson at evertype.com>
> > To: "IETF Languages Discussion" <ietf-languages at iana.org>
> > Cc: "Erkki Kolehmainen" <eik at iki.fi>
> > Sent: Tuesday, March 08, 2005 11:00
> > Subject: RE: Distinguishing Greek and Greek
> > > At 10:57 -0800 2005-03-08, Addison Phillips wrote:
> > > >8 characters is the maximum per RFC 3066:
> > > >
> > > > The syntax of this tag in ABNF [RFC 2234] is:
> > > >
> > > > Language-Tag = Primary-subtag *( "-" Subtag )
> > > >
> > > > Primary-subtag = 1*8ALPHA
> > > >
> > > > Subtag = 1*8(ALPHA / DIGIT)
> > >
> > > Grand so.
> > > --
> > > Michael Everson * * Everson Typography * * http://www.evertype.com
> > > _______________________________________________
> > > Ietf-languages mailing list
> > > Ietf-languages at alvestrand.no
> > > http://www.alvestrand.no/mailman/listinfo/ietf-languages
> > >
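The RFC 3066 ABNF quoted above translates directly into a regular expression. This is a syntax check only (it says nothing about whether subtags are registered), and `is_rfc3066_syntax` is a hypothetical name. It also shows that a variant spelled out as "polytonic" (nine letters) would exceed the 8-character limit, while "polyton" fits.

```python
import re

# RFC 3066: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT).
# ABNF ALPHA and DIGIT are ASCII-only, hence the explicit classes.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_syntax(tag: str) -> bool:
    """True if the whole string matches the Language-Tag production."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


for tag in ["el-Grkp-GR", "el-GR-polyton", "el-GR-polytonic"]:
    print(tag, is_rfc3066_syntax(tag))
```

The loop prints True, True and False in that order.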