[Ltru] RE: (iso639.2708) RE: ISO 639-2 decision: "mis"

Peter Constable petercon at microsoft.com
Tue Jun 19 02:52:31 CEST 2007


Suggesting that 'root' avoids problems is, IMO, rather a bit of false economy. Sure, if a tag means 'some language', then content so tagged never becomes *incorrectly* tagged when an ID for the given language is added. But let's consider whether it is *usefully* tagged: changing things so that it could continue to be *correctly* tagged wouldn't make it more *usefully* tagged, and arguably makes it less so.

To suggest that users needn't worry about their content tagged 'root' after a new language is coded is IMO bad advice. They certainly *should* worry if they want their data to be useful: they'll want to re-tag the relevant content with the newly-coded ID, else they end up with data that won't compare. At least if they know that the addition will narrow the extension and potentially invalidate tagging on some of their content, they're more likely to pay attention; what you suggest can give the impression that they don't have any particular worry, which is not the case.

A tag with the 'root' semantic can always mean anything - which means it's nearly void of meaning and is about as useful as not having tagged it at all in the first place. (The 'root' semantic would be equivalent to 'not zxx'.) That's not *usefully* tagged, but it's vacuously always going to be validly tagged - big deal. At least with the 'uncoded' semantic they can use change history for the code table and the record date to derive a short list of what languages "mis" content might be in - a pain, but that's actually more useful than 'some language'.
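A minimal sketch of that derivation, assuming a hypothetical change-history table that maps each code to the date it was added. The function name, the placeholder codes "xxa" and "xxb", and every date below are illustrative assumptions, not real registry data:

```python
from datetime import date

# Hypothetical change history: code -> date the code was added to the table.
# Placeholder values for illustration only; not taken from any real registry.
CODE_ADDED = {
    "xxa": date(2005, 3, 1),
    "bsk": date(2007, 7, 1),
    "xxb": date(2008, 1, 15),
}


def candidates_for_mis(record_date, code_added=CODE_ADDED):
    """Return the codes that did not yet exist when a record was tagged "mis".

    Under the 'uncoded' semantic, a record tagged "mis" on record_date can
    only be in a language that had no code of its own at that time, so any
    code added later is a candidate for re-tagging.
    """
    return sorted(code for code, added in code_added.items() if added > record_date)


# A record tagged "mis" in mid-2006 could be in any language coded after that.
print(candidates_for_mis(date(2006, 6, 1)))
```

With the 'root' semantic no such narrowing is possible, because the tag never excluded any language in the first place.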


Peter

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: Monday, June 18, 2007 1:49 PM
To: Peter Constable
Cc: LTRU Working Group; ietf-languages at iana.org; iso639-2 at loc.gov; isojac at loc.gov; iso639 at dkuug.dk
Subject: Re: [Ltru] RE: (iso639.2708) RE: ISO 639-2 decision: "mis"

I really didn't want to start a flame about this; I'm sorry if what I said was considered incendiary.

This whole issue is not really connected with the change from part 2 to part 3, at all. Take your example: it is a problem with your definition of "mis" in BCP 47 whether "bsk" were added because of 639-3 OR just because it were added to ISO 639-2! It is an issue whenever new codes could be added that would invalidate previous usage of "mis".

And the sad thing is that this instability in ISO codes is completely avoidable. There is a perfectly good way to have the same functionality *without* being unstable.

 *   Have a code I'll call here "root" (to avoid any misunderstanding about the meaning of "mis".)
 *   Have it be valid to tag any language content with "root".
 *   State that one SHOULD tag as narrowly as possible, thus avoid "root" if there is a more specific language code.
This completely takes the place of the need you see for "mis", *without being unstable*. If I have some Burushaski content, where a code doesn't exist, I tag it with "root". That is valid now, and remains valid forever, even once "bsk" is added -- whether "bsk" were added because of 639-3 or just because it were added to ISO 639-2.
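The validity argument above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the two code sets are hypothetical, and "bsk" stands for the Burushaski code used elsewhere in this thread.

```python
# Hypothetical snapshots of the code table before and after a code is added.
CODED_BEFORE = {"eng", "fra"}
CODED_AFTER = CODED_BEFORE | {"bsk"}


def valid_as_mis(language, coded):
    """'uncoded' semantic: "mis" covers only languages with no code of their own."""
    return language not in coded


def valid_as_root(language, coded):
    """'root' semantic: the tag covers any language, whatever the table contains."""
    return True


for label, coded in (("before", CODED_BEFORE), ("after", CODED_AFTER)):
    print(label, "mis:", valid_as_mis("bsk", coded), "root:", valid_as_root("bsk", coded))
```

This checks validity only. Whether content tagged "root" still compares with content tagged under the newly added code is a separate question that the sketch does not address.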

Mark
On 6/18/07, Peter Constable <petercon at microsoft.com> wrote:

As far as the JAC is concerned, the intentional semantic of "mis" is what it has always been. As for the extension, when 639-2 was the only alpha-3 code, there was only one context to evaluate the extension that would be derived by that intention; 639-2 did not document the extension, though at least one application of 639-2 - MARC - did. With the introduction of 639-3 and the pending introduction of 639-5 as additions to the alpha-3 space, it becomes clear that the extension must be determined within a context: the cases where you'd want to use "mis" differ if you're using 639-3 rather than 639-2. But for an application of a given part of 639, the change of reference name has had no effect on the extension for that context: the languages encompassed by "mis" in a 639-2 application, for instance, are the same as they were before.



When it comes to BCP 47, the change of reference name for "mis" is basically irrelevant because there is a much bigger issue: in RFC4646bis, BCP 47 will change from being an application of 639-1 and -2 to being an application of 639-1, -2 and -3. That change of context is what creates the issue wrt interoperability of "mis" in applications of BCP 47: Under RFC 4646, Burushaski content would be tagged "mis"; under RFC 4646bis, one would expect new Burushaski content to be tagged "bsk". There's no basis for matching: that's an interop problem. And note that it has nothing to do with any instability of "mis" supposedly introduced with the name change: with or without that change, Burushaski content would be tagged differently before and after.



And note that this issue exists whether one considers "old mis" to have the semantic that Keld is stuck on, 'all languages', or the semantic that the JAC has always intended: either way, it is the addition of 639-3 to BCP 47 that creates an issue for uses of "mis" under BCP 47, not the name change.



And even without the addition of 639-3, "mis" would have interop issues: assuming the semantic the JAC has always assumed, the extension in the context of 639-2 could narrow - inherently by the nature of the semantic - any time a new entry was added; but assuming the 'all languages' semantic, one could end up with comparable content tagged in non-comparable ways, "mis" and something else.



Therefore, I suggest that beating up ISO as not being in tune with the needs of the IT community is both fruitless and baseless, and is ignoring the fact that IETF has problems all of its own making. If IETF really wanted to avoid any stability or interop problems related to "mis", it should never have permitted its use in language tags, starting back in RFC 1766, because "mis" has always had stability / interop issues. But that horse is long out of the barn: "mis" *can* be used in language tags under RFCs from 1766 to 4646. The LTRU WG within IETF needs to decide what to do about that in RFC 4646bis. That's a job for IETF; we don't need to continue bothering JAC members with IETF issues.





Peter



From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: Monday, June 18, 2007 9:23 AM
To: Peter Constable
Cc: Kent Karlsson; Milicent K Wewerka; John Cowan; iso639 at dkuug.dk; ietf-languages at iana.org; iso639-2 at loc.gov; isojac at loc.gov; HHj at standard.no; LTRU Working Group
Subject: Re: (iso639.2708) RE: ISO 639-2 decision: "mis"



Unfortunately, ISO codes have somewhat of an impedance mismatch with the needs of the IT community; in particular, stability. Thus BCP 47 has to stabilize those codes; one of the main reasons for the existence of RFC 4646. What that means is that if ISO tries to narrow the meaning of *any* code, whether it is a "clarification" or not, we have really only two choices:

1. Keep the broader semantic, which encompasses the new ISO narrow one, or
2. Deprecate the code (in one way or another).

Unlike many other codes, "mis" is one that we can do without, so #2 was a reasonable choice.

What I was trying to do was come up with language that we could agree on even though we have very different views on the utility and meaning of 'mis'. It sounds like we are ok on the suggested language on the other thread, so I'm hoping that we can put "mis" to bed.

Mark

On 6/16/07, Peter Constable <petercon at microsoft.com> wrote:

From: Kent Karlsson [mailto:kent.karlsson14 at comhem.se]

> With the "old mis" one could correctly apply 'mis' as a language
> code for any language

That has *never* been the intent of ISO 639. It is an external interpretation, admittedly possible because ISO 639 was not fully explicit up to now. But from the perspective of the JAC, the "new mis" is exactly the same "mis" as the "old mis".


Peter



--
Mark
