Language tag too specific

Tue Mar 19 15:56:24 CET 2013

Mark Davis 🍥 <mark at macchiato dot com> wrote:

>>> Your formulation points out that the results depend heavily on the
>>> matching algorithm.
>>> Without context I don't know what that is.
>>>
>>> We, for example, identify en with the most populous country, eg the
>>> US.
>>
>> Well, that's because you use an ad-hoc proprietary algorithm rather
>> than any of the three standard ones.
>
> If you are referring to RFC 4647, I'm reasonably familiar with it,
> being one of the editors. It is well recognized that it often doesn't
> produce the best results. It is more a minimum bar than the be-all-
> and-end-all.

John's point may have been that other algorithms aren't necessarily the
best for all purposes either. In some usage environments, it might well
be appropriate to identify a language with the single most populous
country where it is spoken. Other environments might note that the U.S.
accounts for only 22% of English speakers worldwide (Wikipedia), and
might interpret the concept of "generic English" differently.

If the draft did allow both "en" and "en-US", it should specify how
matching and fallback are to be applied: using RFC 4647 as written or
with some tailoring, using the LDML approach, or whatever, rather than
leave the user to guess.
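To illustrate why the choice matters, here is a minimal Python sketch of two of the RFC 4647 schemes, Basic Filtering (section 3.3.1) and Lookup (section 3.4). With the same two tags, "en" and "en-US", they match in opposite directions: filtering with the range "en" finds "en-US" content, while Lookup only falls back from "en-US" to "en" and never the other way. The function names are hypothetical, and the sketch simplifies the RFC: it takes a single range instead of a priority list, returns None instead of a default value, and omits wildcard handling in Lookup.

```python
# Simplified sketch of two RFC 4647 matching schemes (assumption: single
# language range, no default value, no "*" handling in Lookup).

def basic_filter(language_range, tags):
    """Basic Filtering (RFC 4647, 3.3.1): a tag matches if it equals the
    range or starts with the range followed by '-', case-insensitively.
    The range '*' matches every tag."""
    r = language_range.lower()
    if r == "*":
        return list(tags)
    return [t for t in tags
            if t.lower() == r or t.lower().startswith(r + "-")]


def lookup(language_range, tags):
    """Lookup (RFC 4647, 3.4): progressively truncate the range from the
    end until it equals an available tag; return that tag or None."""
    available = {t.lower(): t for t in tags}  # case-insensitive index
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        subtags.pop()
        # RFC 4647 also drops a single-character subtag (an extension or
        # private-use singleton) left dangling at the end after truncation.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


if __name__ == "__main__":
    # Filtering: a request for "en" matches "en-US" and "en-GB" ...
    print(basic_filter("en", ["en-US", "en-GB", "fr"]))  # ['en-US', 'en-GB']
    # ... but a request for "en-US" does not match plain "en".
    print(basic_filter("en-US", ["en"]))                 # []
    # Lookup: a request for "en-US" falls back to plain "en" ...
    print(lookup("en-US", ["en", "en-GB", "fr"]))        # en
    # ... but a request for "en" never reaches "en-US".
    print(lookup("en", ["en-US", "en-GB"]))              # None
```

Neither result is "wrong"; they simply encode different assumptions about what "generic English" means, which is why a specification that permits both "en" and "en-US" has to name its scheme.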

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell