[Ltru] RE: (iso639.2708) RE: ISO 639-2 decision: "mis"

Mark Davis mark.davis at icu-project.org
Tue Jun 19 21:16:25 CEST 2007


"root" would be just as useful as "mis", if you stipulate that people
should tag with as much information as they have. The practical effect would
be little different except that "root" would remain valid.

But the fundamental problem with "mis" is that it is unstable, and
ill-defined. If I get "mis" in a year from now I have no idea whether the
original item was tagged when "mis" first came out (1998?), or under RFC 4646, or
under RFC 4646bis; I just don't know and can't know. If you had defined "mis" to be
"whatever didn't have a code in 1998", that would at least be stable, and
allow valid tags to remain valid.

And frankly, if you are going to the trouble to tag content that you're
going to later revisit, you'd be far better off to use a unique private use
code for each missing item. That way you don't have to re-analyse each piece
of content having "mis" on it.

Anyway, it sounds like the last language proposed for "mis" is ok with
everyone, so this discussion is really moot. Sorry to have raised your
hackles with my original message.

Mark

On 6/18/07, Peter Constable <petercon at microsoft.com> wrote:
>
>  Suggesting that 'root' avoids problems is, IMO, rather a bit of false
> economy. Sure, if a tag means 'some language', then content so tagged never
> becomes **incorrectly** tagged when an ID for the given language is added.
> But let's consider whether it is **usefully** tagged: changing things so
> that it could continue to be **correctly** tagged wouldn't make it more
> **usefully** tagged, and arguably makes it less so.
>
> To suggest that users needn't worry about their content tagged 'root'
> after a new language is coded is IMO bad advice. They certainly **should**
> worry if they want their data to be useful: they'll want to re-tag the
> relevant content with the newly-coded ID, else they end up with data that
> won't compare. At least if they know that the addition will narrow the
> extension and potentially invalidate tagging on some of their content,
> they're more likely to pay attention; what you suggest can give the
> impression that they don't have any particular worry, which is not the case.
>
> A tag with the 'root' semantic can always mean anything – which means it's
> nearly void of meaning and is about as useful as not having tagged it at all
> in the first place. (The 'root' semantic would be equivalent to 'not zxx'.)
> That's not **usefully** tagged, but it's vacuously always going to be
> validly tagged – big deal. At least with the 'uncoded' semantic they can use
> change history for the code table and the record date to derive a short list
> of what languages "mis" content might be in – a pain, but that's actually
> more useful than 'some language'.
>
> Peter
>
> *From:* mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com]
> *On Behalf Of* Mark Davis
> *Sent:* Monday, June 18, 2007 1:49 PM
> *To:* Peter Constable
> *Cc:* LTRU Working Group; ietf-languages at iana.org; iso639-2 at loc.gov;
> isojac at loc.gov; iso639 at dkuug.dk
> *Subject:* Re: [Ltru] RE: (iso639.2708) RE: ISO 639-2 decision: "mis"
>
> I really didn't want to start a flame about this; I'm sorry if what I said
> could be considered incendiary.
>
> This whole issue is not really connected with the change from part 2 to
> part 3 at all. Take your example: your definition of "mis" in BCP 47 is a
> problem whether "bsk" is added because of 639-3 OR simply because it is
> added to ISO 639-2! It is an issue whenever new codes could be added
> that would invalidate previous usage of "mis".
>
> And the sad thing is that this instability in ISO codes is completely
> avoidable. There is a perfectly good way to have the same functionality
> *without* being unstable.
>
>    - Have a code I'll call here "root" (to avoid any misunderstanding
>    about the meaning of "mis".)
>    - Have it be valid to tag any language content with "root".
>    - State that one SHOULD tag as narrowly as possible, and thus avoid
>    "root" if there is a more specific language code.
>
> This completely takes the place of the need you see for "mis", *without
> being unstable*. If I have some Burushaski content for which no code
> exists, I tag it with "root". That is valid now, and remains valid forever,
> even once "bsk" is added, whether "bsk" is added because of 639-3 or
> simply because it is added to ISO 639-2.
>
> Mark
>
> On 6/18/07, *Peter Constable* <petercon at microsoft.com> wrote:
>
> As far as the JAC is concerned, the intensional semantic of "mis" is what
> it has always been. As for the extension, when 639-2 was the only alpha-3
> code set, there was only one context in which to evaluate the extension
> derived from that intension; 639-2 did not document the extension, though at
> least one application of 639-2 – MARC – did. With the introduction of 639-3
> and the pending introduction of 639-5 as additions to the alpha-3 space, it
> becomes clear that the extension must be determined within a context: the
> cases where you'd want to use "mis" differ if you're using 639-3 rather than
> 639-2. But for an application of a given part of 639, the change of
> reference name has had no effect on the extension for that context: the
> languages encompassed by "mis" in a 639-2 application, for instance, are the
> same as they were before.
>
> When it comes to BCP 47, the change of reference name for "mis" is
> basically irrelevant because there is a much bigger issue: in RFC 4646bis,
> BCP 47 will change from being an application of 639-1 and -2 to being an
> application of 639-1, -2 and -3. That change of context is what creates the
> issue wrt interoperability of "mis" in applications of BCP 47: Under RFC
> 4646, Burushaski content would be tagged "mis"; under RFC 4646bis, one would
> expect new Burushaski content to be tagged "bsk". There's no basis for
> matching: that's an interop problem. And note that it has nothing to do with
> stability of "mis" supposedly introduced with the name change: with or
> without that change, Burushaski content would be tagged differently before
> and after.
>
> And note that this issue exists whether one considers "old mis" to have
> the semantic that Keld is stuck on, 'all languages', or the semantic that
> the JAC has always intended: either way, it is the addition of 639-3 to BCP
> 47 that creates an issue for uses of "mis" under BCP 47, not the name
> change.
>
> And even without the addition of 639-3, "mis" would have interop issues:
> assuming the semantic the JAC has always assumed, the extension in the
> context of 639-2 could narrow – inherently by the nature of the semantic –
> any time a new entry was added; but assuming the 'all languages' semantic,
> one could end up with comparable content tagged in non-comparable ways,
> "mis" and something else.
>
> Therefore, I suggest that beating up ISO as not being in tune with the
> needs of the IT community is both fruitless and baseless, and is ignoring
> the fact that IETF has problems all of its own making. If IETF really wanted
> to avoid any stability or interop problems related to "mis", it should never
> have permitted its use in language tags, starting back in RFC 1766, because
> "mis" has always had stability / interop issues. But that horse is long out
> of the barn: "mis" **can** be used in language tags under RFCs from 1766
> to 4646. The LTRU WG within IETF needs to decide what to do about that in
> RFC 4646bis. That's a job for IETF; we don't need to continue bothering JAC
> members with IETF issues.
>
> Peter
>
> *From:* mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com]
> *On Behalf Of* Mark Davis
> *Sent:* Monday, June 18, 2007 9:23 AM
> *To:* Peter Constable
> *Cc:* Kent Karlsson; Milicent K Wewerka; John Cowan; iso639 at dkuug.dk;
> ietf-languages at iana.org; iso639-2 at loc.gov; isojac at loc.gov; HHj at standard.no;
> LTRU Working Group
> *Subject:* Re: (iso639.2708) RE: ISO 639-2 decision: "mis"
>
> Unfortunately, ISO codes have somewhat of an impedance mismatch with the
> needs of the IT community; in particular, stability. Thus BCP 47 has to
> stabilize those codes; one of the main reasons for the existence of RFC
> 4646. What that means is that if ISO tries to narrow the meaning of *any*
> code, whether it is a "clarification" or not, we have really only two
> choices:
>
> 1. Keep the broader semantic, which encompasses the new ISO narrow one, or
> 2. Deprecate the code (in one way or another).
>
> Unlike many other codes, "mis" is one that we can do without, so #2 was a
> reasonable choice.
>
> What I was trying to do was come up with language that we could agree on even
> though we have very different views on the utility and meaning of 'mis'. It
> sounds like we are ok on the suggested language on the other thread, so I'm
> hoping that we can put "mis" to bed.
>
> Mark
>
> On 6/16/07, *Peter Constable* <petercon at microsoft.com> wrote:
>
> From: Kent Karlsson [mailto:kent.karlsson14 at comhem.se]
>
> > With the "old mis" one could correctly apply 'mis' as a language
> > code for any language
>
> That has *never* been the intent of ISO 639. It is an external
> interpretation, admittedly possible because ISO 639 was not fully explicit
> up to now. But from the perspective of the JAC, the "new mis" is exactly the
> same "mis" as the "old mis".
>
> Peter
>
> --
> Mark
>
>
> _______________________________________________
> Ltru mailing list
> Ltru at ietf.org
> https://www1.ietf.org/mailman/listinfo/ltru
>
> --
> Mark
>



-- 
Mark


More information about the Ietf-languages mailing list