Criteria for languages?

Peter Constable petercon at microsoft.com
Sat Dec 5 01:41:29 CET 2009


Yes, but bubble up a level: let's say we agree on some objective criteria. Then what? (We don't make the decisions with respect to ISO 639.) Are we simply trying to come to a point of satisfaction that at least we have a handle on what we think makes sense? Do we think this will help us make decisions as to what should be changed in the LTR after a set of ISO 639 changes are published? Are we planning to use this in a coordinated review of requests for changes in ISO 639? Are we planning to propose criteria for the JAC to apply in their process?


Peter

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: Friday, December 04, 2009 2:20 PM
To: Peter Constable
Cc: Randy Presuhn; ietf-languages at iana.org
Subject: Re: Criteria for languages?

The objective here is to see whether there are any reasonably objective criteria behind the application of macrolanguages (not the concept, but the application), or whether it is a haphazard process, where the application is not well grounded. The answer can make a difference as to what we do in ietf-languages with that information.

My first take was that it was haphazard, because in and of themselves Walliserdeutsch and Latgalian are parallel.

But if the relevant difference, according to what I think you are saying, is that

  1.  Walliserdeutsch doesn't have a "major industry body" that has tagged a substantial amount of data incorrectly as Swiss German.
  2.  Latgalian has a "major industry body" that has tagged a substantial amount of data incorrectly as Latvian
Then that is at least a workable, reasonably objective criterion. And from your statement, I assume that that is the criterion that ISO is using in this case.

Mark

On Fri, Dec 4, 2009 at 13:44, Peter Constable <petercon at microsoft.com> wrote:
Mark: what is the objective here?

Peter

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: Friday, December 04, 2009 9:25 AM
To: Peter Constable
Cc: Randy Presuhn; ietf-languages at iana.org
Subject: Re: Criteria for languages?

We are starting to get somewhere. It would help me if you would look over the strawman criteria that I put out, just to see where we are agreeing or not. Below, I substituted what you appear to have as a criterion (and also fixed the omission that Randy noted). With these changes, is this what you are thinking of?

====

A. If

  1.  X is being encoded,
  2.  NEW: A major industry body has been tagging X as Y (rightly or wrongly),

     *   OLD: A reasonable person, based on information in the registry, could have tagged X-content as Y in the past,

  3.  there is good evidence that a substantial amount of data has been so tagged,
  4.  and X and the standard/predominant version of Y are not mutually comprehensible (at least to the degree that, say, Scots English and Mississippi English are),

Then Y should be made into a macrolanguage, and a new Z should be encoded to represent the standard form of Y.
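For anyone who wants the decision rule in A stated mechanically, here is a minimal Python sketch. It treats the four conditions as plain booleans; the `Proposal` class and its field names are hypothetical, and judging each condition (what counts as a "major industry body" or a "substantial amount" of data) is left to people, not code. The example values for Latgalian and Walliserdeutsch follow the characterization given elsewhere in this thread, with mutual incomprehensibility assumed rather than established.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """Hypothetical inputs to strawman criterion A; field names are illustrative only."""
    x: str                           # variety being encoded, e.g. "Latgalian"
    y: str                           # existing language X has been tagged as
    x_being_encoded: bool            # condition 1
    industry_body_tags_x_as_y: bool  # condition 2 (NEW wording)
    substantial_data_tagged: bool    # condition 3
    mutually_comprehensible: bool    # condition 4: X vs. standard/predominant Y


def should_make_macrolanguage(p: Proposal) -> bool:
    """True only when all four conditions of criterion A hold."""
    return (p.x_being_encoded
            and p.industry_body_tags_x_as_y
            and p.substantial_data_tagged
            and not p.mutually_comprehensible)


# Assumed values, based on how the two cases are described in this thread.
latgalian = Proposal("Latgalian", "Latvian", x_being_encoded=True,
                     industry_body_tags_x_as_y=True, substantial_data_tagged=True,
                     mutually_comprehensible=False)
walliserdeutsch = Proposal("Walliserdeutsch", "Swiss German", x_being_encoded=True,
                           industry_body_tags_x_as_y=False, substantial_data_tagged=False,
                           mutually_comprehensible=False)

print(should_make_macrolanguage(latgalian))        # True
print(should_make_macrolanguage(walliserdeutsch))  # False
```

Under these assumptions only Latvian would become a macrolanguage, which matches the asymmetry Peter describes between "lav" and "de"/"gsw".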

B. For matching, Y should match Y, X and Z. (X should match X, and Z should match Z).

C. For lookup, Y should fetch content marked with Z. (X should fetch X, and Z should fetch Z).
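Rules B and C can be read as two small functions over a registry fragment. The sketch below is a minimal illustration, not the behavior of any existing matcher: `MACRO_MEMBERS`, `STANDARD_FORM`, and the placeholder tags "Y", "X", "Z" are hypothetical, and having a lookup for Y try Y itself before falling back to Z is an assumption of this sketch (C only says that Y should fetch content marked with Z).

```python
from __future__ import annotations

# Hypothetical registry fragment: macrolanguage Y encompasses X and Z,
# where Z is the newly encoded standard form of Y (per criterion A).
MACRO_MEMBERS = {"Y": {"X", "Z"}}
STANDARD_FORM = {"Y": "Z"}


def matches(requested: str, content_tag: str) -> bool:
    """Rule B: a macrolanguage matches itself and its members;
    every other tag matches only itself."""
    if requested == content_tag:
        return True
    return content_tag in MACRO_MEMBERS.get(requested, set())


def lookup_candidates(requested: str) -> list[str]:
    """Rule C: tags to try, in order, when fetching a single item.
    Trying the requested tag first is an assumption of this sketch."""
    candidates = [requested]
    standard = STANDARD_FORM.get(requested)
    if standard is not None:
        candidates.append(standard)  # Y falls back to its standard form Z
    return candidates


def lookup(requested: str, available: dict[str, str]) -> str | None:
    """Return the content for the first candidate tag that is available, else None."""
    for tag in lookup_candidates(requested):
        if tag in available:
            return available[tag]
    return None


print(matches("Y", "X"))                   # True
print(matches("X", "Z"))                   # False
print(lookup("Y", {"Z": "standard text"}))  # standard text
print(lookup("X", {"Z": "standard text"}))  # None
```

The asymmetry is the point of the proposal: a request for Y is broad when matching (it covers X and Z) but narrow when looking up a single item (it resolves to the standard form Z), while X and Z only ever match or fetch themselves.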

Mark
On Fri, Dec 4, 2009 at 08:41, Peter Constable <petercon at microsoft.com> wrote:
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Mark Davis

> A strict approach would be that if Latgalian is indeed a different
> language from (mutually incomprehensible with) Latvian, then it
> was incorrect to tag any Latgalian with "lav", and we just encode
> a new language and move on. Same for Walliserdeutsch.
That sounds entirely reasonable. It also sounded reasonable that Unicode should not encode any precomposed characters but rather use a dynamic-composition model. In both cases, legacy practice realistically keeps us from doing all the things that seem most reasonable. A major industry body has clearly been using "lav" for Latgalian (albeit this appears to have started only in the past 6 years); I'm not aware of indicators of any, let alone reasonably widespread, use of either "de" or "gsw" for Walliserdeutsch, and so if Walliserdeutsch is deemed a separate language, then I wouldn't saddle de or gsw with the hassles of a macrolanguage.



Peter

