Criteria for languages?

Thu Dec 3 04:17:02 CET 2009

Hi -

> From: "Mark Davis ☕" <mark at macchiato.com>
> To: "Randy Presuhn" <randy_presuhn at mindspring.com>
> Cc: <ietf-languages at iana.org>
> Sent: Wednesday, December 02, 2009 1:37 PM
> Subject: Re: Criteria for languages?
>
> I have nothing against the macrolanguage concept per se.
>
> What I do find very troublesome is that the application of it seems fairly
> ad hoc, with little clarity as to why it is used in one case (Latvian) and
> not in another (Swiss German).

I agree.  What we have is the result of a painful compromise in LTRU.
I'd characterize where we left it as "employing macrolanguages in the
generation of language tags doesn't appear to be as generally useful as
we thought it would be when we did RFC 4646, but there is also the
reality of extensively used subtags like zh and ar.  So while we're not
going to *encourage* this pattern, we recognize that there are
situations where it may make sense (due to legacy data)."

Or, as RFC 5646 puts it: "To accommodate language tag forms used
prior to the adoption of this document, language tags provide a
special compatibility mechanism: the extended language subtag."
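To make the compatibility mechanism concrete, here is a minimal sketch of
the extlang part of canonicalization: a "prefix-extlang" pair such as
"zh-yue" collapses to the extlang used as the primary subtag ("yue").  The
EXTLANG_PREFIX table and the function name canonicalize_extlang are
hypothetical; the entries are assumed to mirror the IANA registry and
should be checked there.  Other canonicalization steps (deprecated or
preferred-value subtags, for example) are deliberately left out.

```python
# Hypothetical excerpt of extlang records: extlang subtag -> required Prefix.
# Assumed to mirror IANA Language Subtag Registry entries; verify before use.
EXTLANG_PREFIX = {
    "cmn": "zh",  # Mandarin Chinese
    "yue": "zh",  # Cantonese
    "arb": "ar",  # Standard Arabic
    "ltg": "lv",  # Latgalian
}


def canonicalize_extlang(tag: str) -> str:
    """Replace a 'prefix-extlang' pair with the extlang as primary subtag.

    Sketch of the extlang rule only; all other subtags are kept as given.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        primary, ext = subtags[0].lower(), subtags[1].lower()
        # Only collapse when the extlang's registered prefix matches.
        if EXTLANG_PREFIX.get(ext) == primary:
            return "-".join([ext] + subtags[2:])
    return tag


print(canonicalize_extlang("zh-yue-HK"))       # yue-HK
print(canonicalize_extlang("zh-cmn-Hans-CN"))  # cmn-Hans-CN
print(canonicalize_extlang("de-CH"))           # de-CH (no extlang involved)
```

Note that Swiss German ("gsw") never passes through this path: it is
tagged with its own primary subtag rather than as an extlang under 'de'.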

What gets Swiss German off the hook is that, despite its name, its
users and taggers are IMO quite aware that it's not 'de' and would be
unlikely to want to tag it as 'de' if any alternative at all is
available.  How inclined are users of Latgalian to consider their
language a variety of Latvian?

> If there were a consistent policy for it, then it could be usefully applied,

I'd be wary of trying to get too specific.  That's the domain of a successor
to RFC 5646, something I have absolutely no desire to work on.

> and implementations could anticipate what they need to do for now and the
> future. I'll throw out a strawman:
>
> A. If
>
>    1. X is being encoded,
>    2. A reasonable person, based on information in the registry, could have
>    tagged X-content as Y in the past
>    3. There is good evidence that a substantial amount of data has been so
>    tagged,

"reasonable person" and "substantial" are wonderful weasel words.  :-)

>    4. and X and the standard/predominant version of Y are not mutually
>    comprehensible (at least to the degree that, say, Scots English and
>    Mississippi English are)

FWIW, though I am a native speaker of (North Central) American English,
I understand neither of those varieties reliably.

> Then Y should be made into a macrolanguage, and a new Z should be encoded to
> represent the standard form of Y.

I think this is a fair summary of what we did with zh and ar, and what
we perhaps should have done with de if consistency really mattered.

> B. For matching, Y should match both X and Z. (X should match X, and Z
> should match Z).

ok.
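For what it's worth, rule B is easy to state as a filter.  The sketch below
assumes a hypothetical MACRO_MEMBERS table (macrolanguage Y mapped to its
members, which would include both X and the new Z) and compares primary
language subtags only.  That is a simplification of RFC 4647 basic
filtering, which matches ranges against whole-tag prefixes.

```python
# Hypothetical macrolanguage table: Y -> {X, Z, ...}.  Illustrative only.
MACRO_MEMBERS: dict[str, set[str]] = {
    "zh": {"cmn", "yue"},
    "lv": {"lvs", "ltg"},
}


def primary(tag: str) -> str:
    """Return the lowercased primary language subtag of a tag."""
    return tag.split("-")[0].lower()


def matches(lang_range: str, tag: str) -> bool:
    """Strawman rule B: a range Y matches Y and every member of Y;
    any other range (X or Z) matches only its own language."""
    r, t = primary(lang_range), primary(tag)
    return t == r or t in MACRO_MEMBERS.get(r, set())


def filter_tags(lang_range: str, tags: list[str]) -> list[str]:
    """Keep the tags that the given range matches, in their original order."""
    return [t for t in tags if matches(lang_range, t)]


tags = ["zh", "cmn-Hans", "yue-HK", "ja"]
print(filter_tags("zh", tags))   # ['zh', 'cmn-Hans', 'yue-HK']
print(filter_tags("cmn", tags))  # ['cmn-Hans']
```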

> C. For lookup, Y should fetch content marked with Z. (X should fetch X, and
> Z should fetch Z).

Uh, how about content marked with Y?
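One way to fold that question into rule C: at each step of an RFC 4647-style
truncation fallback, try the Y-form first (legacy content marked with Y),
then the same tag with Y replaced by Z.  This ordering is only one possible
reading; the STANDARD_FORM table (Y mapped to Z) and the function names are
hypothetical assumptions for illustration, including the choice of 'cmn' as
the "standard form" of 'zh'.

```python
from typing import Optional

# Hypothetical mapping: macrolanguage Y -> new subtag Z for its standard form.
STANDARD_FORM = {"zh": "cmn", "ar": "arb", "lv": "lvs"}


def truncations(tag: str) -> list[str]:
    """RFC 4647-style fallback: drop trailing subtags one at a time,
    also dropping a single-character subtag left exposed at the end."""
    subtags = tag.split("-")
    out = []
    while subtags:
        out.append("-".join(subtags))
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return out


def lookup(requested: str, available: list[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the best available tag for a request, or the default."""
    avail = {t.lower(): t for t in available}
    for cand in truncations(requested):
        # 1. Content marked with the requested form itself (e.g. Y).
        if cand.lower() in avail:
            return avail[cand.lower()]
        # 2. Strawman rule C: if the primary subtag is Y, try Z instead.
        subs = cand.split("-")
        z = STANDARD_FORM.get(subs[0].lower())
        if z:
            alt = "-".join([z] + subs[1:])
            if alt.lower() in avail:
                return avail[alt.lower()]
    return default


print(lookup("zh", ["zh", "cmn"]))                    # zh
print(lookup("zh", ["cmn", "yue"]))                   # cmn
print(lookup("zh-Hant-TW", ["cmn-Hant-TW", "zh"]))    # cmn-Hant-TW
```

Under this ordering a request for X (say 'yue') never falls back to Y or
Z, which keeps "X should fetch X" intact.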

Randy