<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v = 

"urn:schemas-microsoft-com:vml" xmlns:o = 

"urn:schemas-microsoft-com:office:office" xmlns:w = 

"urn:schemas-microsoft-com:office:word" xmlns:x = 

"urn:schemas-microsoft-com:office:excel" xmlns:p = 

"urn:schemas-microsoft-com:office:powerpoint" xmlns:a = 

"urn:schemas-microsoft-com:office:access" xmlns:dt = 

"uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s = 

"uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs = 

"urn:schemas-microsoft-com:rowset" xmlns:z = "#RowsetSchema" xmlns:b = 

"urn:schemas-microsoft-com:office:publisher" xmlns:ss = 

"urn:schemas-microsoft-com:office:spreadsheet" xmlns:c = 

"urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:oa = 

"urn:schemas-microsoft-com:office:activation" xmlns:html = 

"http://www.w3.org/TR/REC-html40" xmlns:q = 

"http://schemas.xmlsoap.org/soap/envelope/" XMLNS:D = "DAV:" xmlns:x2 = 

"http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ois = 

"http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir = 

"http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds = 

"http://www.w3.org/2000/09/xmldsig#" xmlns:dsp = 

"http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc = 

"http://schemas.microsoft.com/data/udc" xmlns:xsd = 

"http://www.w3.org/2001/XMLSchema" xmlns:sps = 

"http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi = 

"http://www.w3.org/2001/XMLSchema-instance" xmlns:udcxf = 

"http://schemas.microsoft.com/data/udc/xmlfile" xmlns:wf = 

"http://schemas.microsoft.com/sharepoint/soap/workflow/" xmlns:mver = 

"http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m = 

"http://schemas.microsoft.com/office/2004/12/omml" xmlns:ex12t = 

"http://schemas.microsoft.com/exchange/services/2006/types"><HEAD>

<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">

<META content="MSHTML 6.00.6000.16414" name=GENERATOR>

<STYLE>@font-face {

        font-family: Cambria Math;

}

@font-face {

        font-family: Calibri;

}

@font-face {

        font-family: Tahoma;

}

@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.0in 1.0in 1.0in; }

P.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"

}

LI.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"

}

DIV.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"

}

A:link {

        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.MsoHyperlink {

        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99

}

A:visited {

        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.MsoHyperlinkFollowed {

        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.gmailquote {

        mso-style-name: gmail_quote

}

SPAN.EmailStyle18 {

        COLOR: #1f497d; FONT-FAMILY: "Calibri","sans-serif"; mso-style-type: personal

}

SPAN.EmailStyle20 {

        COLOR: #1f497d; FONT-FAMILY: "Calibri","sans-serif"; mso-style-type: personal-reply

}

.MsoChpDefault {

        FONT-SIZE: 10pt; mso-style-type: export-only

}

DIV.Section1 {

        page: Section1

}

OL {

        MARGIN-BOTTOM: 0in

}

UL {

        MARGIN-BOTTOM: 0in

}

</STYLE>

</HEAD>

<BODY lang=EN-US vLink=purple link=blue>

<DIV><SPAN class=926454607-18042007><FONT face="Arial Unicode MS" color=#0000ff 

size=2>1a: It is just as much a categorical change for 'tai', 'gem', 

etc.&nbsp;But it is a change that I support, since without such a change, almost 

all of the collection codes would have an empty set of applicable languages in 

the context of 639-3. However, if&nbsp;'mis' is deprecated and replaced by a 

code for 'any language' (rather than handling 'mis' like all of the other 

"other" codes) in the process of doing this, I'm not going to complain. To be 

nit-picking, all of the collection codes should then also be replaced (but I'm 

not going to complain if they are not so replaced when removing the "other" part 

of the semantics). Of course, it would be helpful if the standard defining the 

collection codes (639-4?) also gave an explicit hierarchy of the codes (like 

"'tai' covers ..., ..., ...,&nbsp;and any Tai language not given a code"). Maybe 

that is the case already (I haven't seen a draft).</FONT></SPAN></DIV>

<DIV><SPAN class=926454607-18042007><FONT face="Arial Unicode MS" color=#0000ff 

size=2></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=926454607-18042007><FONT face="Arial Unicode MS" color=#0000ff 

size=2>1b:&nbsp;A document (or text fragment) tagged as e.g. 'tai' is supposed 

to be in ONE 'tai' language, not in several 'tai' languages. So in each 

individual application, a collection code refers to one language, not several languages, even though the set 

of languages covered by a collection code usually has more than one element (or 

no elements... see 1a).</FONT></SPAN></DIV>

<DIV><SPAN class=926454607-18042007><FONT face="Arial Unicode MS" color=#0000ff 

size=2></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=926454607-18042007><FONT face="Arial Unicode MS" color=#0000ff 

size=2>4: So why is there a code for 'zxx' if it is out of scope? Furthermore, 

'zxx' is supposed to mean "no linguistic content", not "out of 

scope".</FONT></SPAN></DIV>

<DIV><SPAN class=926454607-18042007><FONT face="Arial Unicode MS" color=#0000ff 

size=2></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=926454607-18042007>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

<FONT face="Arial Unicode MS" color=#0000ff size=2>/kent k</FONT></SPAN></DIV>

<DIV><SPAN class=926454607-18042007><FONT face="Arial Unicode MS" color=#0000ff 

size=2></FONT></SPAN>&nbsp;</DIV><BR>

<BLOCKQUOTE dir=ltr 

style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">

  <DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

  <HR tabIndex=-1>

  <FONT face=Tahoma size=2><B>From:</B> Peter Constable 

  [mailto:petercon@microsoft.com] <BR><B>Sent:</B> Wednesday, April 18, 2007 

  2:33 AM<BR><B>To:</B> ietf-languages@iana.org; 

  ltru@lists.ietf.org<BR><B>Subject:</B> RE: [Ltru] Re: "mis" update review 

  request<BR></FONT><BR></DIV>

  <DIV></DIV>

  <DIV class=Section1>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">On 

  1: I disagree: taking “other” out of mis is a categorical change – it creates 

  a completely different concept, because the heart of the concept of mis is 

  “other”.<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">On 

  1b (“language” vs. “languages”): I disagree: while the content tagged is in a 

  single language, the concept that the ID represents is a collection of 

  languages. The ID represents that concept, not the content; we associate the 

  ID with the content to indicate an association of the concept with the 

  content.<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">On 

  4: Again, I disagree. This is like saying, “It’s out of scope, mostly but not 

  completely.” Either it’s in scope or it’s out of scope.<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Peter<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <DIV>

  <DIV 

  style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; PADDING-LEFT: 0in; PADDING-BOTTOM: 0in; BORDER-LEFT: medium none; PADDING-TOP: 3pt; BORDER-BOTTOM: medium none">

  <P class=MsoNormal><B><SPAN 

  style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'">From:</SPAN></B><SPAN 

  style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'"> Kent Karlsson 

  [mailto:kent.karlsson14@comhem.se] <BR><B>Sent:</B> Tuesday, April 17, 2007 

  12:10 PM<BR><B>To:</B> Peter Constable; ietf-languages@iana.org; 

  ltru@lists.ietf.org<BR><B>Subject:</B> RE: [Ltru] Re: "mis" update review 

  request<o:p></o:p></SPAN></P></DIV></DIV>

  <P class=MsoNormal><o:p>&nbsp;</o:p></P>

  <DIV>

  <P class=MsoNormal><SPAN 

  style="COLOR: blue; FONT-FAMILY: 'Arial','sans-serif'">on 

  1:</SPAN><o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;<o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal><SPAN 

  style="COLOR: blue; FONT-FAMILY: 'Arial','sans-serif'">I don't see why 'mis' 

  would have to be an exception when doing a semantic change of removing 

  (implicit or explicit) "other" for various language codes. Doing so is just as

  much a semantic change for 'tai' (or any other "other" collection), and of 

  exactly the same kind, so if it is not ok for 'mis' it would not be ok for 

  'tai' either. (If you prefer another acronym, say 'any' instead of 'mis', that 

  is another ball-game.)</SPAN><o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;<o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal><SPAN 

  style="COLOR: blue; FONT-FAMILY: 'Arial','sans-serif'">Furthermore, since 

  'mul' is the only code intended for multiple languages (when it is not 

  practical to list which languages, per fragment of the document preferably), 

  all of the "languages" codes <STRONG><SPAN 

  style="FONT-FAMILY: 'Arial','sans-serif'">should instead refer to 

  "language"&nbsp;in singular</SPAN></STRONG>. This would not be a semantic 

  change, just referring to each of the items that may be tagged, not a set of 

  items [book shelf...] so tagged.</SPAN><o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;<o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal><SPAN 

  style="COLOR: blue; FONT-FAMILY: 'Arial','sans-serif'">on 

  4:</SPAN><o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;<o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal><SPAN 

  style="COLOR: blue; FONT-FAMILY: 'Arial','sans-serif'">Programming languages 

  of various sorts are out of scope (like 'zxx', but unlike 'art'), but I may 

  agree that they are out of scope in a different way than 'zxx'. Perhaps 

  "formal language" ('for'), with no further subdivision (they are still out of 

  scope).</SPAN><o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;<o:p></o:p></P></DIV>

  <DIV>

  <P class=MsoNormal><SPAN 

  style="FONT-FAMILY: 'Arial','sans-serif'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

  <SPAN style="COLOR: blue">/kent k</SPAN></SPAN><o:p></o:p></P></DIV>

  <BLOCKQUOTE 

  style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: medium none; PADDING-LEFT: 4pt; PADDING-BOTTOM: 0in; MARGIN: 5pt 0in 5pt 3.75pt; BORDER-LEFT: blue 1.5pt solid; PADDING-TOP: 0in; BORDER-BOTTOM: medium none">

    <P class=MsoNormal><o:p>&nbsp;</o:p></P>

    <DIV class=MsoNormal style="TEXT-ALIGN: center" align=center>

    <HR align=center width="100%" SIZE=2>

    </DIV>

    <P class=MsoNormal style="MARGIN-BOTTOM: 12pt"><B><SPAN 

    style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'">From:</SPAN></B><SPAN 

    style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'"> Peter Constable 

    [mailto:petercon@microsoft.com] <BR><B>Sent:</B> Tuesday, April 17, 2007 

    2:19 AM<BR><B>To:</B> ietf-languages@iana.org; 

    ltru@lists.ietf.org<BR><B>Subject:</B> RE: [Ltru] Re: "mis" update review 

    request</SPAN><o:p></o:p></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

    1: Yes, be careful: (a) the majority of existing legacy usage of mis is 

    bound to be in MARC, and (b) any existing usage would assume the context of 

    ISO 639-2 (i.e. mis in existing usage is the exception list for ISO 

    639-2).<o:p></o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

    2: The mis collection is inherently unstable – unavoidably so. Prior to 

    2005-08-16, an implementation of ISO 639-2 would have tagged Ainu content as 

    mis; after that date, an implementation of ISO 639-2 would have tagged Ainu 

    content as ain; existing content tagged before that date would not get 

    retrieved by a request for ain, and it would be conformant to suppose that

    requests for mis would not return Ainu content. The mis collection is ugly, 

    pure and simple. So, I don’t see what the point is of getting worried over 

    whether we’re making mis unstable: it’s been that way for some 

    time.<o:p></o:p></SPAN></P>
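    <P class=MsoNormal>The Ainu case above can be sketched in a few lines of Python. This is only an illustration: the catalogue, the helper names tag_for and retrieve, and the sample records are hypothetical, and treating the cut-over date itself as already using ain is an assumption.</P>

<PRE>
```python
from datetime import date

# Date given in the message above for when Ainu content stopped being tagged 'mis'.
AIN_ADDED = date(2005, 8, 16)


def tag_for(language, tagged_on):
    """Return the tag an ISO 639-2 implementation would have used (hypothetical helper)."""
    if language == "Ainu":
        # Before 'ain' was available, Ainu fell into the 'mis' collection.
        return "ain" if tagged_on >= AIN_ADDED else "mis"
    raise ValueError("this sketch only models Ainu")


# Two made-up Ainu records catalogued on either side of the change.
catalogue = [
    {"title": "old record", "tag": tag_for("Ainu", date(2004, 1, 10))},
    {"title": "new record", "tag": tag_for("Ainu", date(2006, 3, 2))},
]


def retrieve(tag):
    """Exact-match retrieval by tag, as a simple catalogue search might do."""
    return [item["title"] for item in catalogue if item["tag"] == tag]


print(retrieve("ain"))  # ['new record']
print(retrieve("mis"))  # ['old record']
```
</PRE>

    <P class=MsoNormal>The older record stays tagged mis unless someone retags it, so a request for ain does not find it.</P>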

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">(Note: 

    mis is badly defined from a stability perspective, though I don’t think 

    there’s much question of how it’s defined.)<o:p></o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

    3(b): “</SPAN>There are times when detection can only determine that it 

    looks like there is some linguistic content -- it is not just binary data -- 

    but current detection can't really determine what it might be. That is, a 

    code that means "according to our best available detection methods this 

    doesn't look like it is zxx".<SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">” 

    If you want to use mis for that, I would argue that that is significantly 

    changing the semantics of mis. (Even though mis is unstable, it is unstable 

    on a qualitative level; this is a categorical change.) I definitely oppose 

    that. If you want an ID for “undetermined human language”, then that should 

    be proposed. We should not usurp an existing ID for that 

    purpose.<o:p></o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

    4: I don’t see how your example differs from this: “Nous avons une phrase en 

    français (but this is in English)”. The fact that the parenthetical text is 

    in English doesn’t change the fact that the other text is in French. 

    Similarly, in your example, the fact that there is a comment in English does 

    not change the fact that the rest of the text is not in a human language. Do 

    we create tags for “French with embedded bits of 

    English”?<o:p></o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Peter<o:p></o:p></SPAN></P>

    <P class=MsoNormal><SPAN 

    style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

    <DIV 

    style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; PADDING-LEFT: 0in; PADDING-BOTTOM: 0in; BORDER-LEFT: medium none; PADDING-TOP: 3pt; BORDER-BOTTOM: medium none">

    <P class=MsoNormal><B><SPAN 

    style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'">From:</SPAN></B><SPAN 

    style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'"> 

    mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] <B>On 

    Behalf Of </B>Mark Davis<BR><B>Sent:</B> Monday, April 16, 2007 3:49 

    PM<BR><B>To:</B> Peter Constable<BR><B>Cc:</B> ietf-languages@iana.org; 

    ltru@lists.ietf.org<BR><B>Subject:</B> Re: [Ltru] Re: "mis" update review 

    request<o:p></o:p></SPAN></P></DIV>

    <P class=MsoNormal><o:p>&nbsp;</o:p></P>

    <P class=MsoNormal>1. I think we have to be very careful here. The meaning 

    of a standard like ISO 639-2 is established not by <I>what we wish it would 

    have said, </I>nor by <I>what we would find out if we were able to read 

    Peter's mind.</I> It is established by the wording in the standard, and how 

    reasonable people could interpret it. The fact that "mis" was incorporated 

    in order to account for MARC codes is interesting, but is not in the text of 

    the standard. We can't expect users of BCP 47 to all be able to read Peter's 

    mind before tagging. <BR><BR>2. When we are looking at stability, that is 

    very important: our goal is that once content is correctly tagged, people 

    can depend on the fact that we will not change the meaning of a tag out from 

    under them. So clarifications that we add in future versions of 4646 or the 

    registry are fine, as long as they do not narrow the range of reasonable 

    interpretations. We can broaden them. So in the case of "mis", a proposed 

    narrowing to include just the MARC codes is clearly disallowed, since it was 

    nowhere stated in ISO 639-2 at the time that "mis" was added to the language 

    registry (the BCP 47 semantics are established at the time we add the code). 

    One of the key principles of BCP 47 is to isolate us where

    necessary from instabilities in the source standards. <BR><BR>(The one 

    exception we might be able to make is where something is so badly defined 

    that most reasonable people couldn't come up with any consistent definition 

    for it.)<BR><BR>3. Now, I think there are steps that can be taken to make 

    the above moot. I think Peter's suggestion for ISO 639-X of broadening all 

    of the Collections to remove the (Other) is exactly the right strategy, and 

    if this can be done before 4646bis is issued, all the better. So having 

    <o:p></o:p></P>

    <UL type=disc>

      <LI class=MsoNormal 

      style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1">aus&nbsp;&nbsp;&nbsp; 

      Australian languages means any of the languages on <A 

      href="http://www.ethnologue.com/show_family.asp?subid=90498">http://www.ethnologue.com/show_family.asp?subid=90498</A> 

      <o:p></o:p>

      <LI class=MsoNormal 

      style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1">bat&nbsp;&nbsp;&nbsp; 

      Baltic (Other) =&gt; Baltic languages, means any of the languages on <A 

      href="http://www.ethnologue.com/show_family.asp?subid=90207">http://www.ethnologue.com/show_family.asp?subid=90207</A> 

      <o:p></o:p>

      <LI class=MsoNormal 

      style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1">mis&nbsp;&nbsp;&nbsp; 

      Miscellaneous languages, essentially the root for <A 

      href="http://www.ethnologue.com/family_index.asp">http://www.ethnologue.com/family_index.asp</A> 

      <o:p></o:p></LI></UL>

    <P class=MsoNormal style="MARGIN-BOTTOM: 12pt">and so on. This is useful on 

    a number of levels; it resolves a number of problems in the interpretation 

    of language codes, and makes the source standards themselves more stable. 

    (In the ideal case, we would have codes for each of the possible "decision 

    points" in the language tree. That is, if we look at any language code such 

    as <A 

    href="http://www.ethnologue.com/show_lang_family.asp?code=eng">http://www.ethnologue.com/show_lang_family.asp?code=eng</A> 

    we'd have codes for each of the parent groupings, not just some of them, 

    like "Australian languages".) <BR><BR>3. Randy raised the issue as to 

    whether "mis" in the broad sense is useful (as something that has linguistic 

    content, but I don't know what it is). It very much follows the model in #3. 

    There are times when detection can only determine that it looks like there 

    is some linguistic content -- it is not just binary data -- but current 

    detection can't really determine what it might be. That is, a code that 

    means "according to our best available detection methods this doesn't look 

    like it is zxx". <BR><BR>4. I'm leery of using zxx for programming 

    languages, instead of just binary. There is clearly some linguistic content 

    in "if (content == null) { /* remove the item in the lookup table */ ...}". 

    Maybe we need another code for this, something different than either 'art' 

    or 'zxx'. <BR><BR>Mark<o:p></o:p></P>
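    <P class=MsoNormal>The effect of dropping "(Other)" from a collection such as bat can be sketched as follows. This is a rough illustration only: the function covered_by_bat, its flag, and the two-entry membership set are hypothetical, and a real implementation would need the full list of languages in the collection.</P>

<PRE>
```python
# Baltic languages that have their own codes (Latvian, Lithuanian).
BALTIC_WITH_OWN_CODE = {"lv", "lt"}


def covered_by_bat(tag, other_semantics):
    """Return True if content tagged `tag` falls under the collection 'bat'.

    other_semantics=True models "Baltic (Other)": only content tagged 'bat'
    itself, i.e. Baltic languages that have no code of their own.
    other_semantics=False models "Baltic languages": every Baltic language.
    """
    if tag == "bat":
        return True
    if other_semantics:
        return False
    return tag in BALTIC_WITH_OWN_CODE


print(covered_by_bat("lv", other_semantics=True))   # False
print(covered_by_bat("lv", other_semantics=False))  # True
print(covered_by_bat("en", other_semantics=False))  # False
```
</PRE>

    <P class=MsoNormal>Under the broadened reading the set only grows, which is why it counts as a broadening rather than a narrowing of the earlier meaning.</P>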

    <DIV>

    <P class=MsoNormal><SPAN class=gmailquote>On 4/14/07, <B>Peter Constable</B> 

    &lt;<A href="mailto:petercon@microsoft.com">petercon@microsoft.com</A>&gt; 

    wrote:</SPAN><o:p></o:p></P>

    <P class=MsoNormal>From: Randy Presuhn [mailto:<A 

    href="mailto:randy_presuhn@mindspring.com">randy_presuhn@mindspring.com</A>]<BR><BR><BR>&gt; 

    I find it very hard to believe that a reasonable analysis<BR>&gt; (whether 

    done by human or machine) would classify a text as <BR>&gt; being "mis" 

    without being able to recognize which of the<BR>&gt; languages in that 

    grouping the text belonged to.&nbsp;&nbsp;I can<BR>&gt; believe someone 

    could look at text and say "it's a Slavic<BR>&gt; language, but I'm not sure 

    which one."&nbsp;&nbsp;Do we really think <BR>&gt; someone or something 

    would look at some text and say "it's<BR>&gt; Ainu, Andamanese, or Etruscan, 

    but I can't tell which, so<BR>&gt; I'll tag it 'mis'"?<BR><BR>If someone 

    were so tempted, I would argue that would be an inappropriate use of mis. Since 

    they do not know what it is, their declaration is that the language identity 

    is not determined, and the appropriate tag for that is und. Appropriate use 

    of mis does not require that one know the language of the content; it does, 

    however, require that one know it is *not* a language covered by any of the 

    available tags. 

    <BR><BR><BR><BR>Peter<o:p></o:p></P></DIV>
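    <P class=MsoNormal>The distinction drawn above between und and mis can be written as a small decision function. This is a minimal sketch under stated assumptions: choose_tag, its parameters, and the three-entry KNOWN_CODES table are hypothetical, and "Examplish" is a made-up language name standing in for a language with no available tag.</P>

<PRE>
```python
# Tiny illustrative subset of language-to-tag assignments (not the real registry).
KNOWN_CODES = {"French": "fr", "English": "en", "Ainu": "ain"}


def choose_tag(identified_language=None, known_uncoded=False):
    """Pick a tag following the rule described in the message above.

    identified_language: the language name if the tagger identified it, else None.
    known_uncoded: True only if the tagger knows the content is in a language
        that none of the available tags covers, even without knowing which one.
    """
    if identified_language in KNOWN_CODES:
        return KNOWN_CODES[identified_language]
    if identified_language is not None or known_uncoded:
        # Known NOT to be covered by any available tag.
        return "mis"
    # Language identity has not been determined at all.
    return "und"


print(choose_tag("Ainu"))              # ain
print(choose_tag("Examplish"))         # mis
print(choose_tag(known_uncoded=True))  # mis
print(choose_tag())                    # und
```
</PRE>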

    <P class=MsoNormal><BR><BR clear=all><BR>-- <BR>Mark 

  <o:p></o:p></P></BLOCKQUOTE></DIV></BLOCKQUOTE></BODY></HTML>