[Ltru] Re: "mis" update review request

Peter Constable petercon at microsoft.com
Sat Apr 14 03:22:03 CEST 2007

(Is "kind" meant to be a word in English, a word in some other language, or a reference to something else? I assume the first.)

I don't think we can say it is *non-conformant* to tag "hello" as mis any more than we can say it is non-conformant to tag "hello" as fr. It's just bad tagging. That's comparable to a spelling checker correcting "helo" as "holp" - it may not be useful, but that doesn't make it non-conformant to Unicode. Non-conformance to BCP 47 would be to see fr and apply an English spelling checker.

But that's a different question from how (or if) people should use mis. IMO we should say that implementers of BCP 47 SHOULD NOT use mis except if they have an immediate need to apply a language subtag and have determined that there is no available language subtag encompassing the language of the given content. E.g. I know it's Martian and there's no ISO 639 ID for Martian. (Hypothetical example assumes Martians exist and that this is not encompassed within art.) Of course, in many cases people/processes may not know enough about every language to be able to rule out all the available possibilities, but in that case the appropriate thing to use would be und.
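The rule proposed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name and both parameters are hypothetical and do not come from BCP 47 or any library, and the caller is assumed to have already done the work of looking for a matching subtag.

```python
from typing import Optional


def choose_language_subtag(matching_subtag: Optional[str],
                           ruled_out_all_subtags: bool) -> str:
    """Pick a language subtag following the proposed guidance (hypothetical helper).

    matching_subtag: a subtag known to encompass the content's language, or None.
    ruled_out_all_subtags: True only if the tagger has determined that no
        available subtag encompasses the language.
    """
    if matching_subtag is not None:
        # A suitable subtag exists, so neither mis nor und is appropriate.
        return matching_subtag
    if ruled_out_all_subtags:
        # The language is known and nothing available covers it: mis.
        return "mis"
    # Not enough knowledge to rule out the available subtags: und.
    return "und"
```

With this sketch, the Martian case is `choose_language_subtag(None, True)`, while content whose language simply could not be determined is `choose_language_subtag(None, False)`.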


From: mark.edward.davis at gmail.com On Behalf Of Mark Davis
Sent: Friday, April 13, 2007 5:20 PM
To: Peter Constable
Cc: LTRU Working Group; ietf-languages at alvestrand.no
Subject: Re: [Ltru] Re: "mis" update review request

That, I think, we are all in agreement on. And that follows what we do in BCP 47, which is that we say that people *should* tag as specifically as possible. So if I know that content is "en-US", I *should* say "en-US" and not just "en". But I *can* also use "en". It might not be the best choice, but it is a legitimate (although not optimal) usage. However, it is a perfectly reasonable choice if I don't know whether it is "en-US" or "en-CA", or it could be both.

So what about "mis"? Once again, I *should* tag more specifically, if I have the information. No argument at all there. The question is whether it is non-conformant to BCP 47 to tag "kind" as "mis". For that, we need to establish whether there are sufficient grounds in the text and data of ISO 639-2 as of the time that "mis" was taken into BCP 47 to conclusively determine that "mis" is disjoint from other language codes. I don't see a conclusive case from what you and John have said so far, unless I'm missing something.

I would not at all be averse to saying that you shouldn't use "und" or "mis" or "mul" or any collections if you have any more specific information about the content. And I think it is clear that we need much more guidance in BCP 47 as to intended usage.

On 4/13/07, Peter Constable <petercon at microsoft.com> wrote:
I think ISO 639-2 is clear that the most specific category should be used. That principle is implicit in the "(Other)" collections. I also think that principle in combination with collections creates maintenance problems. (Which is why I suggested that all the "(Other)" entries should just be "languages" entries.)

(Btw, I think I suggested some time ago it might not be a bad thing to deprecate use of collection IDs in IETF language tags.)


-----Original Message-----
From: John Cowan [mailto:cowan at ccil.org]
Sent: Friday, April 13, 2007 2:07 PM
To: Mark Davis
Cc: LTRU Working Group; ietf-languages at alvestrand.no
Subject: [Ltru] Re: "mis" update review request

Mark Davis scripsit:

> You might like this to be true, but I don't see any substantiation of
> it in the standard. If you could point me to that, I'd appreciate it.

It seems rather self-evident to me that 'ber' is a subset of 'afa',
and so on; but no, the standard doesn't say so.  It does, however, say:

        A collective language code is not intended to be used when an
        individual language code or another more specific collective
        language code is available.

http://www.loc.gov/standards/iso639-2/normtext.html section 4.1.1

I take that to mean that "afa" is unsuitable for a Berber language, and
"gem" is unsuitable for English.  A fortiori, "mis" is unsuitable for
a language for which a better code is available.
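
A minimal sketch of this "most specific code wins" reading follows. The
containment table is an assumption made purely for illustration: as noted
above, the standard itself does not state which codes are subsets of which.
The names BROADER, depth and most_specific are likewise hypothetical.

```python
from typing import Dict, List

# Assumed, illustrative containment data (NOT stated in ISO 639-2):
# each code maps to the next broader collective code that covers it.
BROADER: Dict[str, str] = {"kab": "ber", "ber": "afa"}


def depth(code: str) -> int:
    """Count how many broader collections sit above a code in BROADER."""
    steps = 0
    while code in BROADER:
        code = BROADER[code]
        steps += 1
    return steps


def most_specific(candidates: List[str]) -> str:
    """Among codes that all apply to the content, return the deepest one."""
    return max(candidates, key=depth)
```

Under these assumptions, "mis" has nothing above it in the table, so it
loses to any applicable code that does, for example
most_specific(["mis", "afa", "ber"]).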

Questionless, this contradicts the desire for stability, but I don't
see what's to be done about it.  I tried at one point to get all
language collection codes deprecated, but it was pointed out that
there are good reasons for having them, as when insufficient evidence
is available.

John Cowan   cowan at ccil.org   http://ccil.org/~cowan
I must confess that I have very little notion of what [s. 4 of the British
Trade Marks Act, 1938] is intended to convey, and particularly the sentence
of 253 words, as I make them, which constitutes sub-section 1.  I doubt if
the entire statute book could be successfully searched for a sentence of
equal length which is of more fuliginous obscurity. --MacKinnon LJ, 1940

Ltru mailing list
Ltru at ietf.org
