<br><br><div><span class="gmail_quote">On 11/20/06, <b class="gmail_sendername">Kenneth Whistler</b> &lt;<a href="mailto:kenw@sybase.com">kenw@sybase.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Mark,<br><br>&gt;<br>&gt; 15924 does not encode just scripts, it also has variants and aliases, such<br>&gt; as:<br>&gt;<br>&gt;&nbsp;&nbsp;&nbsp;&nbsp;- Cyrs, Latf, Latg, Hans, Hant, Syre, Syrj, Syrn<br>&gt;&nbsp;&nbsp;&nbsp;&nbsp;- Hrkt, Jpan<br>&gt;<br>&gt; The inclusion of IPA as a variant script of Latin is little different from

<br>&gt; the distinction between Hans and Hant; both are primarily differences in<br>&gt; selection of characters from UCS. The difference between English written in<br>&gt; IPA vs regular Latin characters is certainly on the order of the difference

<br>&gt; between Chinese written in Hans vs Hant, if not more so.<br><br>True, but a specious analogy nonetheless.<br><br>Basically what is going on here is that script codes, because<br>they are available and tied to the language code apparatus, are

<br>being extended to apply to any &quot;significant variation in writing system&quot;<br>that pops up to the level of &quot;we care about the difference for<br>our implementations.&quot;<br><br>Now maybe that is exactly what needs to be done, but in my opinion

<br>the right way to handle this is to first *formally* extend the<br>scope of 15924, so that it no longer is a standard for the<br>registration of script codes, but for script codes *and*<br>selected orthography codes of interest *and* selected variants

<br>of writing systems of interest. At that point the JAC wouldn't<br>have to sit and argue on principle whether some particular<br>oddball request fits or not, and implementers would be freer<br>to ask for stuff that matches distinctions they would like to

<br>make.<br><br>As it is, it is bad enough that we have a &quot;script&quot; registration<br>standard that tries to match up against the &quot;scripts&quot; encoded<br>in Unicode, and has a mostly unexplained hairball of stuff

<br>which can't be matched up, but now requests to register stuff like<br>IPA for a script code keep pushing things further that way.</blockquote><div><br>I don't really see the necessity for a charter change; in particular, I don't see anything in 

<a href="http://www.unicode.org/iso15924/standard/index.html">http://www.unicode.org/iso15924/standard/index.html</a> that would say that Hant is a valid script variant, and IPA is not. Maybe I'm not looking in the right area, so any help would be appreciated.

<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">&gt; It would be of<br>&gt; great benefit to users of IPA to be able to tag data with a variant script

<br>&gt; code, and little pragmatic reason not to allow that, especially in view of<br>&gt; the fact that the standard has already been stretched to include variants<br>&gt; and aliases.<br><br>Dunno what aliases have to do with it, other than to puff up the

<br>argument.</blockquote><div><br>What they have to do with it is that 15924 is already not &quot;pure&quot;: Jpan is not the name of &quot;a&quot; script.<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

And IPA is not a variant script. It is not comparable to Latf and Latg.<br>It is a circumscribed, technical use of Latn. </blockquote><div>&nbsp;<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

&quot;cat&quot; is English.<br>&quot;[cat]&quot; is IPA. Tell me the script difference, except in function.</blockquote><div><br><span>mutatis mutandis: &quot;一二三&quot; is Hant, and </span><span>&quot;一二三&quot; is Hans. </span>

Tell me the script difference, except in function.</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">So as in the case of Hant versus Hans, registering IPA with a

<br>script code would be another ad hoc extension of 15924 in an<br>orthogonal but basically unexplained direction.</blockquote><div><br>You seem to see this as a slippery slope; take that one drink, and there is an inevitable path to lying in the gutter with a bottle of Ripple in a brown paper bag. I see it as following the precedent already set with Hans/Hant, and providing a reasonable, pragmatic solution for language tags. I don't see anyone wanting to use script codes for smaller orthographic distinctions, such as between &quot;theatre&quot; and &quot;theater&quot;; language tags can already encompass such differences.

</div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">The pragmatic reason not to allow that is to prevent 15924 being

<br>used to further muddy all the dimensions of distinctions in<br>writing systems.<br><br>But the pragmatic reason to *allow* it would be to let Google<br>and Microsoft do what they want to do for searches anyway, and<br>

to hell with expecting 15924 to make any sense outside its<br>use as a standard for labeling &quot;written stuff we want to distinguish&quot;.</blockquote><div><br>The political season is over, and I see no need to get into ad hominem attacks. Google has, to my knowledge, no particular stake in this issue -- I don't know that MS does either. This is really just a technical issue of how to use script values in the language tag mechanism most effectively. Language tags are a big customer for ISO 15924, and it would seem reasonable to at least consider the issue from all sides.

<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">&gt; &gt;&gt; This is not my view only. It was the view of the RA.<br>&gt;<br>

&gt; Regarding the above statement, I also want to add that as far as I can tell,<br>&gt; the 15924 JAC did not consider this topic in any depth, nor does any of the<br>&gt; discussion here seem to be forwarded to the JAC for their consideration; I

<br>&gt; believe that the members are unaware of the issues raised regarding language<br>&gt; tags. As far as I could see from email, the sum total of the discussion was<br>&gt; three statements, two by the same person:<br>

&gt;<br>&gt; A: &quot;As far as I can see, IPA is just a set of Latin characters.&quot;<br>&gt; A: &quot;The IPA is a set of Latin letters, and can be represented by Latn. It is<br>&gt; an orthography of Latin, not a script of its own.&quot;

<br>&gt; C: &quot;I concur with this conclusion.&quot;<br>&gt; [names removed to protect the innocent]<br><br>Yeah, yeah, cute, Mark. Note also that Michael and I, at least,<br>were trained in IPA (and other phonetic orthographies) and

<br>made significant professional use of them. So it isn't as<br>if we are babes in the woods here presented with something<br>we've never heard of before, and are making off-the-cuff, uninformed<br>remarks about.<br><br>

If you feel that a registration for IPA belongs in 15924, then<br>make the case why 15924 should start registering orthographic<br>conventions for the use of a script, instead of just knocking<br>the JAC for &quot;not consider[ing] this topic in any depth,&quot; please.

</blockquote><div><br>I think I might not have been clear. There may be good reasons why IPA doesn't qualify and yet Hans &amp; Hant do, but the JAC did not make the rationale clear. While the difference may be blindingly obvious to you, it would be helpful to hear what it actually is, beyond an &quot;I concur&quot;.

<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Also, I suggest you consider the distinction between the function<br>of IPA as a bibliographic code and as a &quot;language&quot; code. There

<br>are very, very few books, articles, or anything else that consist<br>exclusively or primarily of IPA used just to represent text. Most<br>of the ones that do exist are experimental failures, basically.<br>It would be very rare that you would need a bibliographic code

<br>for a book *in* IPA, as opposed to a book *about* IPA or including<br>use *of* IPA. On the other hand, it is utterly normal for<br>IPA to be used extensively embedded in the middle of otherwise<br>normal Latin text (or, to be sure, as citations used in the

<br>middle of Cyrillic or Japanese or Chinese or whatever other<br>text). If you embed a bunch of IPA in the middle of otherwise<br>unremarkable Latin text, you really aren't talking about a<br>bibliographic code at all, but tagging runs of text as being

<br>in a special function orthography. If that's what you need to<br>make text searches work right for interpreting such runs of<br>specialized text, then make the case for it.</blockquote><div><br>I don't think anyone is expecting books in IPA -- it would be, as you say, tagged fragments.

<br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">But the fact is that once you get beyond standard writing systems<br>with standardized spellings and start hitting the text corpuses

<br>of specialized languages in specialized orthographies which are<br>increasingly likely to get openly posted on the web, you<br>are going to need a code for *each* orthography in use, per language,<br>to make any sense of the content of those corpuses.

<br><br>Say I were to start posting Chumash language materials on the<br>web in Unicode. (There are a significant number of linguists,<br>Chumash descendants, anthropologists, and just plain Chumash<br>aficionados among the general white population in Santa Barbara

<br>and Ventura counties who would like that, by the way.) To<br>search that material, and just sticking to the Barbareno<br>version of Chumash, you would need at least:<br><br>Chumash-Barbareno in IPA<br>Chumash-Barbareno in JPHarrington orthography (a massive corpus)

<br>Chumash-Barbareno in Americanist orthography<br>Chumash-Barbareno in Applegate practical orthography (used by some<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; anthropologists and a lot of material)<br>Chumash-Barbareno in Whistler practical orthography

<br>Chumash-Barbareno in Chumash nation orthography<br><br>Because texts are spelled systematically differently in each of<br>those systems and use somewhat different repertoires of characters.<br><br>So make your case why IPA is special. (For Chumash, it would,

<br>for example, be of very little real value, because very little<br>of the Chumash data is represented directly in IPA.) Where do<br>you draw the line in registering these things?<br><br>Or do you think registering IPA just solves some problem that

<br>won't come around again for the next technical orthography<br>that comes down the pike?</blockquote><div><br>No, I think everyone is aware that there are multiple systems of phonetic representation. So IPA would likely be the first of several. But it is clearly a relatively important one, being in pretty widespread use in dictionaries and other sources. As with encoding characters, or script variants, at some point you have to make judgments as to whether a system is in wide enough usage to be worth encoding; systems that are in ad hoc, limited use wouldn't qualify.

<br><br>But a side note: a list of 8 different potential systems is not exactly scary, given that we are at the point of adding some 7,000 new language tags. <br></div><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

&gt;<br>&gt; Moreover, I want to point out that the RA and the JAC are two different<br>&gt; entities, and that this view does not represent the view of the RA (which<br>&gt; has not taken a position on the issue).<br><br>

Yep. I agree with that.<br><br>--Ken<br><br>&gt;<br>&gt; Mark<br><br><br></blockquote></div><br>