[iso15924-jac] Re: Phonetic orthographies

Tue Nov 21 11:57:50 CET 2006

Kenneth wrote:

> But the fact is that once you get beyond standard writing 
> systems with standardized spellings and start hitting the 
> text corpuses of specialized languages in specialized 
> orthographies which are increasingly likely to get openly 
> posted on the web, you are going to need a code for *each* 
> orthography in use, per language, to make any sense of the 
> content of those corpuses.

Absolutely true. This is what the written mode within ISO 639-6 attempts to
provide.

Best regards

Debbie

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no 
> [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of 
> Kenneth Whistler
> Sent: 21 November 2006 01:51
> To: mark.davis at icu-project.org
> Cc: ietf-languages at iana.org; iso15924-jac at unicode.org; kenw at sybase.com
> Subject: Re: [iso15924-jac] Re: Phonetic orthographies
> 
> Mark,
> 
> > 
> > 15924 does not encode just scripts, it also has variants and aliases,
> > such as:
> > 
> >    - Cyrs, Latf, Latg, Hans, Hant, Syre, Syrj, Syrn
> >    - Hrkt, Jpan
> > 
> > The inclusion of IPA as a variant script of Latin is little different
> > from the distinction between Hans and Hant; both are primarily
> > differences in selection of characters from UCS. The difference
> > between English written in IPA vs regular Latin characters is
> > certainly on the order of the difference between Chinese written in
> > Hans vs Hant, if not more so.
> 
> True, but a specious analogy nonetheless.
> 
> Basically what is going on here is that script codes, because
> they are available and tied to the language code apparatus,
> are being extended to apply to any "significant variation in
> writing system" that pops up to the level of "we care about the
> difference for our implementations."
> 
> Now maybe that is exactly what needs to be done, but in my 
> opinion the right way to handle this is to first *formally* 
> extend the scope of 15924, so that it no longer is a standard 
> for the registration of script codes, but for script codes 
> *and* selected orthography codes of interest *and* selected 
> variants of writing systems of interest. At that point the 
> JAC wouldn't have to sit and argue on principle whether some 
> particular oddball request fits or not, and implementers 
> would be freer to ask for stuff that matches distinctions 
> they would like to make.
> 
> As it is, it is bad enough that we have a "script" 
> registration standard that tries to match up against the 
> "scripts" encoded in Unicode, and has a mostly unexplained 
> hairball of stuff which can't be matched up, but now requests 
> to register stuff like IPA for a script code keep pushing 
> things further that way.
> 
> > It would be of
> > great benefit to users of IPA to be able to tag data with a variant
> > script code, and little pragmatic reason not to allow that, especially
> > in view of the fact that the standard has already been stretched to
> > include variants and aliases.
> 
> Dunno what aliases have to do with it, other than to puff up 
> the argument.
> 
> And IPA is not a variant script. It is not comparable to Latf 
> and Latg.
> It is a circumscribed, technical use of Latn. "cat" is English.
> "[cat]" is IPA. Tell me the script difference, except in function.
> 
> So as in the case of Hant versus Hans, registering IPA with a 
> script code would be another ad hoc extension of 15924 in an 
> orthogonal but basically unexplained direction.
> 
> The pragmatic reason not to allow that is to prevent 15924 
> being used to further muddy all the dimensions of 
> distinctions in writing systems.
> 
> But the pragmatic reason to *allow* it would be to let Google 
> and Microsoft do what they want to do for searches anyway, 
> and to hell with expecting 15924 to make any sense outside 
> its use as a standard for labeling "written stuff we want to 
> distinguish".
> 
> 
> > >> This is not my view only. It was the view of the RA.
> > 
> > Regarding the above statement, I also want to add that as far as I can
> > tell, the 15924 JAC did not consider this topic in any depth, nor does
> > any of the discussion here seem to be forwarded to the JAC for their
> > consideration; I believe that the members are unaware of the issues
> > raised regarding language tags. As far as I could see from email, the
> > sum total of the discussion was three statements, two by the same person:
> > 
> > A: "As far as I can see, IPA is just a set of Latin characters."
> > A: "The IPA is a set of Latin letters, and can be represented by Latn.
> > It is an orthography of Latin, not a script of its own."
> > C: "I concur with this conclusion."
> > [names removed to protect the innocent]
> 
> Yeah, yeah, cute, Mark. Note also that Michael and I, at 
> least, were trained in IPA (and other phonetic orthographies) 
> and made significant professional use of them. So it isn't as 
> if we are babes in the woods here presented with something 
> we've never heard of before, and are making off-the-cuff, 
> uninformed remarks about.
> 
> If you feel that a registration for IPA belongs in 15924, 
> then make the case why 15924 should start registering 
> orthographic conventions for the use of a script, instead of 
> just knocking the JAC for "not consider[ing] this topic in 
> any depth," please.
> 
> Also, I suggest you consider the distinction between the 
> function of IPA as a bibliographic code and as a "language" 
> code. There are very, very few books, articles, or anything 
> else that consist exclusively or primarily of IPA used just 
> to represent text. Most of the ones that do exist are 
> experimental failures, basically.
> It would be very rare that you would need a bibliographic 
> code for a book *in* IPA, as opposed to a book *about* IPA or 
> including use *of* IPA. On the other hand, it is utterly 
> normal for IPA to be used extensively embedded in the middle 
> of otherwise normal Latin text (or, to be sure, as citations 
> used in the middle of Cyrillic or Japanese or Chinese or 
> whatever other text). If you embed a bunch of IPA in the 
> middle of otherwise unremarkable Latin text, you really 
> aren't talking about a bibliographic code at all, but tagging 
> runs of text as being in a special function orthography. If 
> that's what you need to make text searches work right for 
> interpreting such runs of specialized text, then make the case for it.
> 
> But the fact is that once you get beyond standard writing 
> systems with standardized spellings and start hitting the 
> text corpuses of specialized languages in specialized 
> orthographies which are increasingly likely to get openly 
> posted on the web, you are going to need a code for *each* 
> orthography in use, per language, to make any sense of the 
> content of those corpuses.
> 
> Say I were to start posting Chumash language materials on the 
> web in Unicode. (There are a significant number of linguists, 
> Chumash descendants, anthropologists, and just plain Chumash 
> aficionados among the general white population in Santa 
> Barbara and Ventura counties who would like that, by the 
> way.) To search that material, and just sticking to the 
> Barbareno version of Chumash, you would need at least:
> 
> Chumash-Barbareno in IPA
> Chumash-Barbareno in JPHarrington orthography (a massive corpus)
> Chumash-Barbareno in Americanist orthography
> Chumash-Barbareno in Applegate practical orthography (used by some
>                        anthropologists and a lot of material)
> Chumash-Barbareno in Whistler practical orthography
> Chumash-Barbareno in Chumash nation orthography
> 
> Because texts are spelled systematically differently in each 
> of those systems and use somewhat different repertoires of characters.
> 
> So make your case why IPA is special. (For Chumash, it would, 
> for example, be of very little real value, because very 
> little of the Chumash data is represented directly in IPA.) 
> Where do you draw the line in registering these things?
> 
> Or do you think registering IPA just solves some problem that 
> won't come around again for the next technical orthography 
> that comes down the pike?
> 
> > 
> > Moreover, I want to point out that the RA and the JAC are two different
> > entities, and that this view does not represent the view of the RA
> > (which has not taken a position on the issue).
> 
> Yep. I agree with that.
> 
> --Ken
> 
> > 
> > Mark