Jon Hanna wrote on 04/10/2003 06:26:56 AM:

> The second is that this orthogonal quality doesn't preclude "educated
> guesses". It's perfectly reasonable IMHO to assume Latin script for en-GB
> *as long as you remember that you are making an assumption*.

It has been suggested (perhaps before this thread was moved to this list)
that this should be taken beyond assumptions: that we should construct a
list of implicit relationships so that we know en can be universally
assumed to imply en-Latn (for contexts in which written form is relevant),
ar can be universally assumed to imply ar-Arab, etc.
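
As a rough illustration of what such a list might look like in software,
here is a minimal Python sketch. The table name IMPLICIT_SCRIPT and the
function with_implicit_script are hypothetical, only the en and ar entries
come from the suggestion above, and placing the script subtag directly after
the primary language subtag (en-Latn-GB) is an assumption about tag syntax,
not something RFC 3066 defines.

```python
# Hypothetical table of implied scripts. Only these two entries are taken
# from the discussion; a real list would need to be agreed on and maintained.
IMPLICIT_SCRIPT = {"en": "Latn", "ar": "Arab"}


def with_implicit_script(tag: str) -> str:
    """Return the tag with the implied script subtag inserted, if one is known.

    Assumes a four-letter alphabetic second subtag is an explicit script,
    in which case the tag is returned unchanged.
    """
    subtags = tag.split("-")
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        return tag  # script already stated explicitly
    script = IMPLICIT_SCRIPT.get(subtags[0].lower())
    if script is None:
        return tag  # no implied script known for this language
    return "-".join([subtags[0], script] + subtags[1:])


print(with_implicit_script("en-GB"))
print(with_implicit_script("ar"))
```

A language with no entry in the table is left untagged rather than guessed
at, which keeps the assumption visible to whoever consumes the result.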

> Currently the only method for deducing scripts is either heuristically (looking
> at the characters used and then deduce that the script used is whatever
> script uses those characters) or guessing from the language as in the
> point above. While we all agree that this is not ideal, we have to accept
> that software doing so will continue to exist for some time after a
> solution is available.

The need goes beyond deducing the script used in the content: users need to
be able to specify constraints on content they are searching for.
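
To make the search case concrete, here is a small Python sketch of a filter
that takes a language and a script constraint. Everything in it is
hypothetical: the helper names, the sample tags, and the rule that a
four-letter second subtag is a script are assumptions for illustration only.

```python
def parse(tag):
    """Split a tag into (language, script or None).

    Assumes a four-letter alphabetic second subtag is a script code.
    """
    subtags = tag.split("-")
    script = None
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1].title()
    return subtags[0].lower(), script


def matches(content_tag, want_language, want_script):
    """True only if the content states both the wanted language and script."""
    language, script = parse(content_tag)
    return language == want_language.lower() and script == want_script.title()


# Hypothetical content tags; the untagged "sr" item cannot satisfy a
# script constraint because nothing in the tag says how it is written.
tags = ["sr-Cyrl", "sr-Latn", "sr"]
selected = [t for t in tags if matches(t, "sr", "Cyrl")]
print(selected)
```

The item tagged with the language alone drops out of the results, which is
the gap a script constraint exposes when the script is not recorded.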

> Further a solution that places script codes into language codes has some
> strangeness. The hierarchy behind tags is imperfect...
> Whatever way I look at this I cannot find myself satisfied by anything that
> attempts to push script information into language tags.

Three+ years ago when we were drafting RFC 3066, I had a lot of reservations
about going in this direction, but after simmering over it for a couple of
years and writing a few papers related to the topic, I am convinced it is a
move we should make.

