Criteria for languages?

Peter Constable petercon at
Fri Dec 4 22:44:07 CET 2009

Mark: what is the objective here?


From: mark.edward.davis at [mailto:mark.edward.davis at] On Behalf Of Mark Davis
Sent: Friday, December 04, 2009 9:25 AM
To: Peter Constable
Cc: Randy Presuhn; ietf-languages at
Subject: Re: Criteria for languages?

We are starting to get somewhere. It would help me if you would look over the strawman criteria that I put out, just to see where we are agreeing or not. Below, I substituted what you appear to have as a criterion (and also fixed the omission that Randy noted). With these changes, is this what you are thinking of?


A. If

  1.  X is being encoded,
  2.  NEW: A major industry body has been tagging X as Y (rightly or wrongly)

     *   OLD: A reasonable person, based on information in the registry, could have tagged X-content as Y in the past

  3.  There is good evidence that a substantial amount of data has been so tagged,
  4.  and X and the standard/predominant version of Y are not mutually comprehensible (at least to the degree that, say, Scots English and Mississippi English are)
Then Y should be made into a macrolanguage, and a new Z should be encoded to represent the standard form of Y.

B. For matching, Y should match Y, X and Z. (X should match X, and Z should match Z).

C. For lookup, Y should fetch content marked with Z. (X should fetch X, and Z should fetch Z).
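
To make rules B and C concrete, here is a minimal Python sketch. It assumes the tags are the bare placeholders X, Y and Z from the strawman; the table and function names (ENCOMPASSES, STANDARD_FORM, matches, lookup) are hypothetical and not taken from any registry or library. It reads rule C literally: a request for Y is redirected to Z, and the strawman does not say whether legacy Y-tagged content should also be returned.

```python
# Hypothetical placeholders from the strawman:
#   Y = the tag promoted to a macrolanguage
#   X = the newly encoded language
#   Z = the new tag for the standard form of Y
ENCOMPASSES = {"Y": {"X", "Z"}}   # assumed macrolanguage table
STANDARD_FORM = {"Y": "Z"}        # assumed redirect used for lookup


def matches(requested, content_tag):
    """Rule B: Y matches Y, X and Z; X and Z match only themselves."""
    if content_tag == requested:
        return True
    return content_tag in ENCOMPASSES.get(requested, set())


def lookup(requested, available):
    """Rule C: Y fetches content marked Z; X fetches X; Z fetches Z.

    Returns the tag whose content would be served, or None if absent.
    """
    target = STANDARD_FORM.get(requested, requested)
    return target if target in available else None
```

Under these assumptions, matches("Y", "X") is true while matches("X", "Y") is not, so matching is deliberately asymmetric, and lookup("Y", {"X", "Z"}) selects the Z content rather than the X content.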


On Fri, Dec 4, 2009 at 08:41, Peter Constable <petercon at> wrote:
From: ietf-languages-bounces at [mailto:ietf-languages-bounces at] On Behalf Of Mark Davis

> A strict approach would be that if Latgalian is indeed a different
> language from (mutually incomprehensible with) Latvian, then it
> was incorrect to tag any Latgalian with "lav", and we just encode
> a new language and move on. Same for Walliserdeutsch.
That sounds entirely reasonable. It also sounded reasonable that Unicode should not encode any precomposed characters but rather use a dynamic-composition model. In both cases, legacy practice realistically keeps us from doing all the things that seem most reasonable. A major industry body has clearly been using "lav" for Latgalian (albeit this appears to have started only in the past 6 years); I'm not aware of indicators of any, let alone reasonably-widespread, use of either "de" or "gsw" for Walliserdeutsch, and so if Walliserdeutsch is deemed a separate language then I wouldn't saddle de or gsw with the hassles of a macrolanguage.

