Language Identifier List Criteria - Granularity

Wed Dec 22 09:04:41 CET 2004

Hi,

I am trying to decide what to do about the issues raised regarding the
appropriate level of granularity of languages to distinguish, the dependency on
application, and the criteria for inclusion and placement on the list.

There was a suggestion that languages be grouped into subsets or equivalence
classes.
e.g.

>{de-CH, de-LI}
>{de-DE, de-BE, de-DK, de-LU}

"equivalence" is not the right term...

Moving to the approach of organizing the languages by region makes the problem
more tractable, since it provides some context:

LI: de-LI, de-CH...

I suggest we need something like an ordered list and some measure of
acceptability or differentiation between each entry.

Someone looking at the list for LI will not know how different de-LI is from
de-CH, or how adequate a substitute de-CH is for de-LI. In this example, they
are (allegedly) very close. Other suggestions might be less than ideal but
still be typical substitutions in practice. For example:
US: en-US, en-CA, en-GB

(for the sake of argument).

Although en-GB is often considered an "international" English, it plays better
in some regions than others, and a simple list doesn't give the user the
information about how similar the languages are or how acceptable the list
entries are to a user community (hands waving wildly here).

I wonder if we could use something similar to the Accept-Language syntax, where
a "q" factor gives a rough indication of quality within a language group.

LI: de-LI;q=1.0, de-CH;q=0.9, de-DE;q=0.5
CH: (de-CH;q=1.0, de-LI;q=0.9, de-DE;q=0.5), fr-CH, it-CH
US: (en-US;q=1.0, en-CA;q=0.8, en-GB;q=0.5), (es-US;q=1.0, es-MX;q=0.9,
es-ES;q=0.5)
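To make the q-factor idea a bit more concrete, here is a minimal Python sketch of how such a list might be read and used, assuming the region-line syntax in the examples above: "REGION:" followed by comma-separated "tag;q=value" entries, optional parentheses around a language group, and q defaulting to 1.0 when absent (as it does in Accept-Language). The function names parse_region_line and best_match and the min_q cut-off are hypothetical, made up for illustration. The sketch treats the parentheses as cosmetic and recovers the language group from the primary subtag (de, en, es, ...).

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_region_line(line: str) -> Tuple[str, Dict[str, float]]:
    """Parse 'CH: (de-CH;q=1.0, de-LI;q=0.9), fr-CH' into ('CH', {tag: q}).

    Hypothetical format: parentheses only mark groups, so they are dropped;
    the group is recovered later from the primary language subtag.
    """
    region, _, rest = line.partition(":")
    rest = rest.replace("(", " ").replace(")", " ")
    weights: Dict[str, float] = {}
    for item in rest.split(","):
        if not item.strip():
            continue
        tag, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # as in Accept-Language, a missing q means 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        weights[tag.lower()] = q  # language tags compare case-insensitively
    return region.strip().upper(), weights


def best_match(weights: Dict[str, float], language: str,
               available: Iterable[str], min_q: float = 0.0) -> Optional[str]:
    """Return the available tag for `language` with the highest q, or None.

    Entries with q=0 are treated as "not acceptable" (as in Accept-Language);
    min_q is a hypothetical threshold for rejecting poor substitutes.
    """
    have = {t.lower(): t for t in available}  # lowercase -> original spelling
    candidates = [
        (q, tag) for tag, q in weights.items()
        if tag.split("-")[0] == language.lower()
        and tag in have and q > 0 and q >= min_q
    ]
    if not candidates:
        return None
    _, tag = max(candidates)  # highest q wins; ties break on the tag string
    return have[tag]
```

For example, with the CH line above and only de-DE and de-LI content available, `best_match(weights, "de", ["de-DE", "de-LI"])` picks de-LI (q=0.9), while raising `min_q` to 0.8 with only de-DE available yields no acceptable substitute. Unlike a bare ordered list, the q values let an application decide where "close enough" ends.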

I have some ideas on what the q factors might be based on, but others on the
list might have better suggestions, if we agree that this approach would be
useful.

However, I wouldn't want to list all the en, es, and fr entries, with q factors,
in every region that speaks one or more of those languages.

Does this approach make sense? Is it feasible to develop the information for
such a list? Is it useful/practical?
tex