Distinguishing Greek and Greek

Mark Davis mark.davis at jtcsv.com
Wed Mar 9 01:25:32 CET 2005

That is a possibility, but it is sub-optimal, because (a) country
differences are generally far less important than script differences
(including major orthographic variants like monotonic vs polytonic), and
(b) when language tags are matched, the fields are compared
most-significant-field first.

This is very similar in that respect to Hans vs Hant, which is a choice of
which subset of the Han characters encoded in Unicode is used to represent
Chinese, and the same reasoning applies.

Thus for example, suppose I have a web page drawing together different
sources of information. (I am simplifying the following example for

1. The desired text is el-Grkp-GR. I am drawing data from two sources, A
and B:
A has data for el-Grkp-GR, el-Grkm-GR
B has data for el-Grkm-GR and el-Grkp

What I get is then el-Grkp-GR from A and el-Grkp from B. That is, under the
normal use of most-significant-field first, the best match in B for
el-Grkp-GR is el-Grkp.

2. Consider if we coded it with a variant. In that notation, I would be
asking for el-GR-polyton, and
A has data for el-GR-polyton, el-GR-monoton
B has data for el-GR-monoton and el-polyton

What I get is then el-GR-polyton from A, but from B we get the wrong
result -- we mix it with el-GR-monoton. That is, under the normal use of
most-significant-field first, the best match in B for el-GR-polyton is the
one that matches the first two fields, el-GR.
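
The two cases above can be reproduced with a minimal sketch of
most-significant-field-first matching. This is only an illustration under
stated assumptions, not the matching procedure of any particular
implementation: the helper names common_prefix_len and best_match are
hypothetical, and "best match" is assumed to mean the available tag that
shares the longest run of leading subtags with the requested tag.

```python
def common_prefix_len(requested, candidate):
    """Count the leading subtags two tags share, ignoring case."""
    count = 0
    pairs = zip(requested.lower().split("-"), candidate.lower().split("-"))
    for want, have in pairs:
        if want != have:
            break
        count += 1
    return count


def best_match(requested, available):
    """Return the available tag with the longest shared leading run of
    subtags, or None if not even the primary language subtag matches.
    (Assumption: ties go to the first tag listed.)"""
    best = max(available,
               key=lambda tag: common_prefix_len(requested, tag),
               default=None)
    if best is None or common_prefix_len(requested, best) == 0:
        return None
    return best


# Case 1: script subtag placed before the country.
print(best_match("el-Grkp-GR", ["el-Grkp-GR", "el-Grkm-GR"]))  # source A
print(best_match("el-Grkp-GR", ["el-Grkm-GR", "el-Grkp"]))     # source B

# Case 2: monotonic/polytonic coded as a trailing variant.
print(best_match("el-GR-polyton", ["el-GR-polyton", "el-GR-monoton"]))  # A
print(best_match("el-GR-polyton", ["el-GR-monoton", "el-polyton"]))     # B
```

Under these assumptions, source B yields el-Grkp in case 1 but
el-GR-monoton in case 2, because the variant is compared only after the
country field has already matched.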

Option #2 (using a variant) presents a mixture of monotonic and polytonic to
the user, which is not very satisfactory at all. Now, the one difference
with Han would be if someone objected that Greek is only ever spoken/written
in a single country, and there would never, ever, be any need to have a
country variant. If that were the case, then encoding as a variant would not
be as bad. But not being omniscient I am reluctant to make such a strong
claim about the use of Greek!


----- Original Message ----- 
From: "Michael Everson" <everson at evertype.com>
To: "IETF Languages Discussion" <ietf-languages at iana.org>
Cc: "Erkki Kolehmainen" <eik at iki.fi>
Sent: Tuesday, March 08, 2005 11:00
Subject: RE: Distinguishing Greek and Greek

> At 10:57 -0800 2005-03-08, Addison Phillips wrote:
> >8 characters is the maximum per RFC 3066:
> >
> >    The syntax of this tag in ABNF [RFC 2234] is:
> >
> >     Language-Tag = Primary-subtag *( "-" Subtag )
> >
> >     Primary-subtag = 1*8ALPHA
> >
> >     Subtag = 1*8(ALPHA / DIGIT)
> Grand so.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
