<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v = 

"urn:schemas-microsoft-com:vml" xmlns:o = 

"urn:schemas-microsoft-com:office:office" xmlns:w = 

"urn:schemas-microsoft-com:office:word" xmlns:x = 

"urn:schemas-microsoft-com:office:excel" xmlns:p = 

"urn:schemas-microsoft-com:office:powerpoint" xmlns:a = 

"urn:schemas-microsoft-com:office:access" xmlns:dt = 

"uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s = 

"uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs = 

"urn:schemas-microsoft-com:rowset" xmlns:z = "#RowsetSchema" xmlns:b = 

"urn:schemas-microsoft-com:office:publisher" xmlns:ss = 

"urn:schemas-microsoft-com:office:spreadsheet" xmlns:c = 

"urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:oa = 

"urn:schemas-microsoft-com:office:activation" xmlns:html = 

"http://www.w3.org/TR/REC-html40" xmlns:q = 

"http://schemas.xmlsoap.org/soap/envelope/" XMLNS:D = "DAV:" xmlns:x2 = 

"http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ois = 

"http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir = 

"http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds = 

"http://www.w3.org/2000/09/xmldsig#" xmlns:dsp = 

"http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc = 

"http://schemas.microsoft.com/data/udc" xmlns:xsd = 

"http://www.w3.org/2001/XMLSchema" xmlns:sps = 

"http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi = 

"http://www.w3.org/2001/XMLSchema-instance" xmlns:udcxf = 

"http://schemas.microsoft.com/data/udc/xmlfile" xmlns:wf = 

"http://schemas.microsoft.com/sharepoint/soap/workflow/" xmlns:mver = 

"http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m = 

"http://schemas.microsoft.com/office/2004/12/omml" xmlns:ex12t = 

"http://schemas.microsoft.com/exchange/services/2006/types"><HEAD>

<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">

<META content="MSHTML 6.00.6000.16414" name=GENERATOR>

<STYLE>

@page Section1 {size: 8.5in 11.0in; margin: 1.0in 1.0in 1.0in 1.0in; }

P.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"

}

LI.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"

}

DIV.MsoNormal {

        FONT-SIZE: 12pt; MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"

}

A:link {

        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.MsoHyperlink {

        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99

}

A:visited {

        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.MsoHyperlinkFollowed {

        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99

}

SPAN.gmailquote {

        mso-style-name: gmail_quote

}

SPAN.EmailStyle18 {

        COLOR: #1f497d; FONT-FAMILY: "Calibri","sans-serif"; mso-style-type: personal-reply

}

.MsoChpDefault {

        mso-style-type: export-only

}

DIV.Section1 {

        page: Section1

}

OL {

        MARGIN-BOTTOM: 0in

}

UL {

        MARGIN-BOTTOM: 0in

}

</STYLE>

</HEAD>

<BODY lang=EN-US vLink=purple link=blue>

<DIV><SPAN class=103260511-17042007><FONT face=Arial color=#0000ff>on 

1:</FONT></SPAN></DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial 

color=#0000ff></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial color=#0000ff>I don't see 

why 'mis' would have to be an exception when making the semantic change of removing 

the (implicit or explicit) "other" from various language codes. Doing so is just as 

much a semantic change for 'tai' (or any other "other" collection), and of 

exactly the same kind, so if it is not ok for 'mis' it would not be ok for 'tai' 

either. (If you prefer another code, say 'any' instead of 'mis', that is 

another ball game.)</FONT></SPAN></DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial 

color=#0000ff></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial color=#0000ff>Furthermore, 

since 'mul' is the only code intended for multiple languages (when it is not 

practical to list which languages, preferably per fragment of the document), all 

of the "languages" codes <STRONG>should instead refer to "language"&nbsp;in the 

singular</STRONG>. This would not be a semantic change; the name would simply refer to each 

of the items that may be tagged, not to a set of items [book shelf...] so 

tagged.</FONT></SPAN></DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial 

color=#0000ff></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial color=#0000ff>on 

4:</FONT></SPAN></DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial 

color=#0000ff></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial color=#0000ff>Programming 

languages of various sorts are out of scope (like 'zxx', but unlike 'art'), but 

I may agree that they are out of scope in a different way than 'zxx'. Perhaps a 

code for "formal language" ('for') could be added, with no further subdivision 

(they would still be out of scope).</FONT></SPAN></DIV>

<DIV><SPAN class=103260511-17042007><FONT face=Arial 

color=#0000ff></FONT></SPAN>&nbsp;</DIV>

<DIV><SPAN class=103260511-17042007><FONT 

face=Arial>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT color=#0000ff>/kent 

k</FONT></FONT></SPAN></DIV><BR>

<BLOCKQUOTE dir=ltr 

style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #0000ff 2px solid; MARGIN-RIGHT: 0px">

  <DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>

  <HR tabIndex=-1>

  <FONT face=Tahoma size=2><B>From:</B> Peter Constable 

  [mailto:petercon@microsoft.com] <BR><B>Sent:</B> Tuesday, April 17, 2007 2:19 

  AM<BR><B>To:</B> ietf-languages@iana.org; 

  ltru@lists.ietf.org<BR><B>Subject:</B> RE: [Ltru] Re: "mis" update review 

  request<BR></FONT><BR></DIV>

  <DIV></DIV>

  <DIV class=Section1>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

  1: Yes, be careful: (a) the majority of existing legacy usage of mis is bound 

  to be in MARC, and (b) any existing usage would assume the context of ISO 

  639-2 (i.e. mis in existing usage is the exception list for ISO 

  639-2).<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

  2: The mis collection is inherently unstable – unavoidably so. Prior to 

  2005-08-16, an implementation of ISO 639-2 would have tagged Ainu content as 

  mis; after that date, an implementation of ISO 639-2 would have tagged Ainu 

  content as ain; existing content tagged before that date would not get 

  retrieved by a request for ain, and it would be conformant to suppose that 

  requests for mis would not return Ainu content. The mis collection is ugly, 

  pure and simple. So, I don’t see what the point is of getting worried over 

  whether we’re making mis unstable: it’s been that way for some 

  time.<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">(Note: 

  mis is badly defined from a stability perspective, though I don’t think 

  there’s much question of how it’s defined.)<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

  3(b): “</SPAN>There are times when detection can only determine that it looks 

  like there is some linguistic content -- it is not just binary data -- but 

  current detection can't really determine what it might be. That is, a code 

  that means "according to our best available detection methods this doesn't 

  look like it is zxx".<SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">” 

  If you want to use mis for that, I would argue that that is significantly 

  changing the semantics of mis. (Even though mis is unstable, it is unstable on 

  a qualitative level; this is a categorical change.) I definitely oppose that. 

  If you want an ID for “undetermined human language”, then that should be 

  proposed. We should not usurp an existing ID for that 

  purpose.<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Re 

  4: I don’t see how your example differs from this: “Nous avons une phrase en 

  français (but this is in English)”. The fact that the parenthetical text is in 

  English doesn’t change the fact that the other text is in French. Similarly, 

  in your example, the fact that there is a comment in English does not change 

  the fact that the rest of the text is not in a human language. Do we create 

  tags for “French with embedded bits of English”?<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'">Peter<o:p></o:p></SPAN></P>

  <P class=MsoNormal><SPAN 

  style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"><o:p>&nbsp;</o:p></SPAN></P>

  <DIV 

  style="BORDER-RIGHT: medium none; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; PADDING-LEFT: 0in; PADDING-BOTTOM: 0in; BORDER-LEFT: medium none; PADDING-TOP: 3pt; BORDER-BOTTOM: medium none">

  <P class=MsoNormal><B><SPAN 

  style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'">From:</SPAN></B><SPAN 

  style="FONT-SIZE: 10pt; FONT-FAMILY: 'Tahoma','sans-serif'"> 

  mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] <B>On Behalf 

  Of </B>Mark Davis<BR><B>Sent:</B> Monday, April 16, 2007 3:49 PM<BR><B>To:</B> 

  Peter Constable<BR><B>Cc:</B> ietf-languages@iana.org; 

  ltru@lists.ietf.org<BR><B>Subject:</B> Re: [Ltru] Re: "mis" update review 

  request<o:p></o:p></SPAN></P></DIV>

  <P class=MsoNormal><o:p>&nbsp;</o:p></P>

  <P class=MsoNormal>1. I think we have to be very careful here. The meaning of 

  a standard like ISO 639-2 is established not by <I>what we wish it would have 

  said, </I>nor by <I>what we would find out if we were able to read Peter's 

  mind.</I> It is established by the wording in the standard, and how reasonable 

  people could interpret it. The fact that "mis" was incorporated in order to 

  account for MARC codes is interesting, but is not in the text of the standard. 

  We can't expect users of BCP 47 to all be able to read Peter's mind before 

  tagging. <BR><BR>2. When we are looking at stability, that is very important: 

  our goal is that once content is correctly tagged, people can depend on the 

  fact that we will not change the meaning of a tag out from under them. So 

  clarifications that we add in future versions of 4646 or the registry are 

  fine, as long as they do not narrow the range of reasonable interpretations. 

  We can broaden them. So in the case of "mis", a proposed narrowing to include 

  just the MARC codes is clearly disallowed, since it was nowhere stated in ISO 

  639-2 at the time that "mis" was added to the language registry (the BCP 47 

  semantics are established at the time we add the code). One of the key 

  principles of BCP 47 is to isolate us where necessary from instabilities in 

  the source standards. <BR><BR>(The one exception we might be able to make is 

  where something is so badly defined that most reasonable people couldn't come 

  up with any consistent definition for it.)<BR><BR>3. Now, I think there are 

  steps that can be taken to make the above moot. I think Peter's suggestion for 

  ISO 639-X of broadening all of the Collections to remove the (Other) is 

  exactly the right strategy, and if this can be done before 4646bis is issued, 

  all the better. So having <o:p></o:p></P>

  <UL type=disc>

    <LI class=MsoNormal 

    style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1">aus&nbsp;&nbsp;&nbsp; 

    Australian languages means any of the languages on <A 

    href="http://www.ethnologue.com/show_family.asp?subid=90498">http://www.ethnologue.com/show_family.asp?subid=90498</A><o:p></o:p> 


    <LI class=MsoNormal 

    style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1">bat&nbsp;&nbsp;&nbsp; 

    Baltic (Other) =&gt; Baltic languages, means any of the languages on <A 

    href="http://www.ethnologue.com/show_family.asp?subid=90207">http://www.ethnologue.com/show_family.asp?subid=90207</A><o:p></o:p> 


    <LI class=MsoNormal 

    style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; mso-list: l0 level1 lfo1">mis&nbsp;&nbsp;&nbsp; 

    Miscellaneous languages, essentially the root for <A 

    href="http://www.ethnologue.com/family_index.asp">http://www.ethnologue.com/family_index.asp</A><o:p></o:p> 

    </LI></UL>

  <P class=MsoNormal style="MARGIN-BOTTOM: 12pt">and so on. This is useful on a 

  number of levels; it resolves a number of problems in the interpretation of 

  language codes, and makes the source standards themselves more stable. (In the 

  ideal case, we would have codes for each of the possible "decision points" in 

  the language tree. That is, if we look at any language code such as <A 

  href="http://www.ethnologue.com/show_lang_family.asp?code=eng">http://www.ethnologue.com/show_lang_family.asp?code=eng</A> 

  we'd have codes for each of the parent groupings, not just some of them, like 

  "Australian languages".) <BR><BR>3. Randy raised the issue as to whether "mis" 

  in the broad sense is useful (as something that has linguistic content, but I 

  don't know what it is). It very much follows the model in #3. There are times 

  when detection can only determine that it looks like there is some linguistic 

  content -- it is not just binary data -- but current detection can't really 

  determine what it might be. That is, a code that means "according to our best 

  available detection methods this doesn't look like it is zxx". <BR><BR>4. I'm 

  leery of using zxx for programming languages, instead of just binary. There is 

  clearly some linguistic content in "if (content == null) { /* remove the item 

  in the lookup table */ ...}". Maybe we need another code for this, something 

  different than either 'art' or 'zxx'. <BR><BR>Mark<o:p></o:p></P>

  <DIV>

  <P class=MsoNormal><SPAN class=gmailquote>On 4/14/07, <B>Peter Constable</B> 

  &lt;<A href="mailto:petercon@microsoft.com">petercon@microsoft.com</A>&gt; 

  wrote:</SPAN><o:p></o:p></P>

  <P class=MsoNormal>From: Randy Presuhn [mailto:<A 

  href="mailto:randy_presuhn@mindspring.com">randy_presuhn@mindspring.com</A>]<BR><BR><BR>&gt; 

  I find it very hard to believe that a reasonable analysis<BR>&gt; (whether 

  done by human or machine) would classify a text as <BR>&gt; being "mis" without 

  being able to recognize which of the<BR>&gt; languages in that grouping the 

  text belonged to.&nbsp;&nbsp;I can<BR>&gt; believe someone could look at text 

  and say "it's a Slavic<BR>&gt; language, but I'm not sure which 

  one."&nbsp;&nbsp;Do we really think <BR>&gt; someone or something would look 

  at some text and say "it's<BR>&gt; Ainu, Andamanese, or Etruscan, but I can't 

  tell which, so<BR>&gt; I'll tag it 'mis'"?<BR><BR>If someone were so tempted, 

  I would argue that would be an inappropriate use of mis. Since they do not know 

  what it is, their declaration is that the language identity is not determined, 

  and the appropriate tag for that is und. Appropriate use of mis does not 

  require that one know the language of the content; it does, however, require 

  that one know it is *not* a language covered by any of the available tags. 

  <BR><BR><BR><BR>Peter<o:p></o:p></P></DIV>

  <P class=MsoNormal><BR><BR clear=all><BR>-- <BR>Mark 

<o:p></o:p></P></DIV></BLOCKQUOTE></BODY></HTML>