"mis" update review request

Sat Apr 14 00:49:22 CEST 2007

From: John Cowan [mailto:cowan at ccil.org]

>> Keep in mind a couple of things: First, this list is defined by MARC,
>> not ISO 639. Secondly, mis was defined in the context of entries
>> included in ISO 639-2; ISO 639-5 will likely introduce new collections,
>> and clearly that has potential impact on how mis might be used.
>
> Indeed.  However, the present concern is with RFC 4646, which allows
> only ISO 639-2 (and 639-1) code elements.

Sure; but we may also want to think about what may happen in the future.

> Nevertheless, my argument stands: it would be inappropriate to use
> "mis" to tag Low Saxon or Tarifit just because your tagging process cannot
> recognize them as such.  In that case, "und" is the appropriate tag.

I think I'd agree. There is still a question of how to allow for processes of differing quality. Do we require that mis be used only after a correct analysis, or do we allow it after analyses that may be erroneous? E.g., suppose a process cannot recognize some edge-case usage of English and instead concludes that the content is Romulan, a language that nothing in the LSTR supports: is it then acceptable to declare mis? (The process may not have made the correct analysis, but one might argue that, modulo its abilities, that's the appropriate result.)

Certainly, if a process is unable to reach any conclusion (it didn't complete an analysis, or its analysis gave ambiguous results), that's when und would be appropriate.
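
To make the distinction concrete, here is a minimal Python sketch of one possible reading of the rule discussed above. It is only an illustration under stated assumptions, not anything RFC 4646 prescribes: the function name choose_tag, the KNOWN_SUBTAGS table, and the idea that an analysis returns a list of candidate language names are all hypothetical.

```python
from typing import Sequence

# Hypothetical stand-in for the subtag registry; a real process would
# consult the full list rather than this three-entry sample.
KNOWN_SUBTAGS = {"en": "English", "fr": "French", "de": "German"}


def choose_tag(candidates: Sequence[str]) -> str:
    """Pick a tag from the language names an analysis concluded.

    Assumption: an empty list means the analysis did not complete, and
    more than one entry means the result was ambiguous.
    """
    # No conclusion, or an ambiguous one: the language is undetermined.
    if len(candidates) != 1:
        return "und"

    name = candidates[0]
    # A definite conclusion that the registry supports: use its subtag.
    for subtag, known_name in KNOWN_SUBTAGS.items():
        if known_name == name:
            return subtag

    # A definite conclusion (right or wrong) that the registry does not
    # support: this is the case where "mis" is being proposed.
    return "mis"
```

Under this reading, choose_tag(["Romulan"]) yields mis even when the content is really English, which is exactly the open question: the tag reflects what the process concluded, not what is true of the content.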

Peter