Criteria for languages?
petercon at microsoft.com
Fri Dec 4 22:44:07 CET 2009
Mark: what is the objective here?
From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: Friday, December 04, 2009 9:25 AM
To: Peter Constable
Cc: Randy Presuhn; ietf-languages at iana.org
Subject: Re: Criteria for languages?
We are starting to get somewhere. It would help me if you would look over the strawman criteria I put out, just to see where we agree and where we don't. Below, I substituted what appears to be your criterion (and also fixed the omission that Randy noted). With these changes, is this what you have in mind?
A. If
1. X is being encoded,
2. NEW: A major industry body has been tagging X as Y (rightly or wrongly)
   * OLD: A reasonable person, based on information in the registry, could have tagged X-content as Y in the past
3. There is good evidence that a substantial amount of data has been so tagged,
4. and X and the standard/predominant version of Y are not mutually comprehensible (at least to the degree that, say, Scots English and Mississippi English are)
Then Y should be made into a macrolanguage, and a new Z should be encoded to represent the standard form of Y.
B. For matching, Y should match Y, X and Z. (X should match X, and Z should match Z).
C. For lookup, Y should fetch content marked with Z. (X should fetch X, and Z should fetch Z).
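Rules B and C can be sketched in Python as a hypothetical illustration (not part of the proposal). The placeholder tags "X", "Y", "Z" and the helpers `matches` and `lookup` are invented here. Matching is plain set membership rather than RFC 4647 prefix matching, and it is assumed that a lookup for Y prefers content tagged Y itself before falling back to Z.

```python
from typing import List, Optional

# Placeholder tags: Y becomes a macrolanguage, X is the newly encoded
# language, and Z is the new tag for the standard form of Y.
MACRO, ENCODED, STANDARD = "Y", "X", "Z"

# Rule B: the content tags that a request for each tag should match.
MATCHES = {
    MACRO: {MACRO, ENCODED, STANDARD},
    ENCODED: {ENCODED},
    STANDARD: {STANDARD},
}

# Rule C: lookup candidates in preference order.
# Assumption: an exact Y match is preferred, with Z as the fallback.
# X is deliberately absent, since a reader of Y may not understand X.
LOOKUP = {
    MACRO: [MACRO, STANDARD],
    ENCODED: [ENCODED],
    STANDARD: [STANDARD],
}


def matches(requested: str, content_tag: str) -> bool:
    """Rule B: does content tagged content_tag match the requested tag?"""
    # Tags outside this one macrolanguage fall back to exact matching.
    return content_tag in MATCHES.get(requested, {requested})


def lookup(requested: str, available: List[str]) -> Optional[str]:
    """Rule C: return the single best available tag, or None."""
    for candidate in LOOKUP.get(requested, [requested]):
        if candidate in available:
            return candidate
    return None
```

For example, `matches("Y", "X")` is `True` while `matches("X", "Y")` is `False`, and `lookup("Y", ["X", "Z"])` returns `"Z"` but `lookup("Y", ["X"])` returns `None`. Matching is asymmetric: a request for the macrolanguage casts a wide net, while a request for X or Z stays narrow.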
On Fri, Dec 4, 2009 at 08:41, Peter Constable <petercon at microsoft.com> wrote:
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Mark Davis
> A strict approach would be that if Latgalian is indeed a different
> language from (mutually incomprehensible with) Latvian, then it
> was incorrect to tag any Latgalian with "lav", and we just encode
> a new language and move on. Same for Walliserdeutsch.
That sounds entirely reasonable. It also sounded reasonable that Unicode should not encode any precomposed characters but rather use a dynamic-composition model. In both cases, legacy practice realistically keeps us from doing all the things that seem most reasonable. A major industry body has clearly been using "lav" for Latgalian (albeit this appears to have started only in the past 6 years). I'm not aware of any indication of use, let alone reasonably widespread use, of either "de" or "gsw" for Walliserdeutsch, so if Walliserdeutsch is deemed a separate language, I wouldn't saddle de or gsw with the hassles of a macrolanguage.