[iso15924-jac] Re: Phonetic orthographies

Wed Nov 22 02:29:50 CET 2006

On 11/20/06, Kenneth Whistler <kenw at sybase.com> wrote:
>
> Mark,
>
> >
> > 15924 does not encode just scripts, it also has variants and aliases, such
> > as:
> >
> >    - Cyrs, Latf, Latg, Hans, Hant, Syre, Syrj, Syrn
> >    - Hrkt, Jpan
> >
> > The inclusion of IPA as a variant script of Latin is little different from
> > the distinction between Hans and Hant; both are primarily differences in
> > selection of characters from UCS. The difference between English written in
> > IPA vs regular Latin characters is certainly on the order of the difference
> > between Chinese written in Hans vs Hant, if not more so.
>
> True, but a specious analogy nonetheless.
>
> Basically what is going on here is that script codes, because
> they are available and tied to the language code apparatus, are
> being extended to apply to any "significant variation in writing system"
> that pops up to the level of "we care about the difference for
> our implementations."
>
> Now maybe that is exactly what needs to be done, but in my opinion
> the right way to handle this is to first *formally* extend the
> scope of 15924, so that it no longer is a standard for the
> registration of script codes, but for script codes *and*
> selected orthography codes of interest *and* selected variants
> of writing systems of interest. At that point the JAC wouldn't
> have to sit and argue on principle whether some particular
> oddball request fits or not, and implementers would be freer
> to ask for stuff that matches distinctions they would like to
> make.
>
> As it is, it is bad enough that we have a "script" registration
> standard that tries to match up against the "scripts" encoded
> in Unicode, and has a mostly unexplained hairball of stuff
> which can't be matched up, but now requests to register stuff like
> IPA for a script code keep pushing things further that way.

I don't really see the necessity for a charter change; in particular, I
don't see anything in
http://www.unicode.org/iso15924/standard/index.html that would say that
Hant is a valid script variant, and IPA is not. Maybe
I'm not looking in the right area, so any help would be appreciated.

> > It would be of great benefit to users of IPA to be able to tag data with a
> > variant script code, and little pragmatic reason not to allow that,
> > especially in view of the fact that the standard has already been
> > stretched to include variants and aliases.
>
> Dunno what aliases have to do with it, other than to puff up the
> argument.

What they have to do with it is that 15924 is already not "pure": Jpan is
not the name of "a" script.

> And IPA is not a variant script. It is not comparable to Latf and Latg.
> It is a circumscribed, technical use of Latn.
>
> "cat" is English.
> "[cat]" is IPA. Tell me the script difference, except in function.

mutatis mutandis: "一二三" is Hant, and "一二三" is Hans. Tell me the script
difference, except in function.

> So as in the case of Hant versus Hans, registering IPA with a
> script code would be another ad hoc extension of 15924 in an
> orthogonal but basically unexplained direction.

You seem to see this as a slippery slope; take that one drink, and there is
an inevitable path to lying in the gutter with a bottle of Ripple in a brown
paper bag. I see it as following the precedent already set with Hans/Hant,
and providing a reasonable, pragmatic solution for language tags. I don't
see anyone wanting to use script codes for smaller orthographic
distinctions, such as between "theatre" and "theater"; language tags can
already encompass such differences.

> The pragmatic reason not to allow that is to prevent 15924 being
> used to further muddy all the dimensions of distinctions in
> writing systems.
>
> But the pragmatic reason to *allow* it would be to let Google
> and Microsoft do what they want to do for searches anyway, and
> to hell with expecting 15924 to make any sense outside its
> use as a standard for labeling "written stuff we want to distinguish".

The political season is over, and I see no need to get into ad hominem
attacks. Google has, to my knowledge, no particular stake in this issue -- I
don't know that MS does either. This is really just a technical issue of how
to use script values in the language tag mechanism most effectively.
Language tags are a big customer for ISO 15924, and it would seem reasonable
to at least consider the issue from all sides.

> >> This is not my view only. It was the view of the RA.
> >
> > Regarding the above statement, I also want to add that as far as I can tell,
> > the 15924 JAC did not consider this topic in any depth, nor does any of the
> > discussion here seem to be forwarded to the JAC for their consideration; I
> > believe that the members are unaware of the issues raised regarding language
> > tags. As far as I could see from email, the sum total of the discussion was
> > three statements, two by the same person:
> >
> > A: "As far as I can see, IPA is just a set of Latin characters."
> > A: "The IPA is a set of Latin letters, and can be represented by Latn. It is
> > an orthography of Latin, not a script of its own."
> > C: "I concur with this conclusion."
> > [names removed to protect the innocent]
>
> Yeah, yeah, cute, Mark. Note also that Michael and I, at least,
> were trained in IPA (and other phonetic orthographies) and
> made significant professional use of them. So it isn't as
> if we are babes in the woods here presented with something
> we've never heard of before, and are making off-the-cuff, uninformed
> remarks about.
>
> If you feel that a registration for IPA belongs in 15924, then
> make the case why 15924 should start registering orthographic
> conventions for the use of a script, instead of just knocking
> the JAC for "not consider[ing] this topic in any depth," please.

I think I might not have been clear. There may be good reasons why
IPA doesn't qualify and yet Hans & Hant do, but the JAC did not make
the rationale clear. While the difference may be blindingly obvious to you,
it would be helpful to hear what it actually is, more than an "I concur".

> Also, I suggest you consider the distinction between the function
> of IPA as a bibliographic code and as a "language" code. There
> are very, very few books, articles, or anything else that consist
> exclusively or primarily of IPA used just to represent text. Most
> of the ones that do exist are experimental failures, basically.
> It would be very rare that you would need a bibliographic code
> for a book *in* IPA, as opposed to a book *about* IPA or including
> use *of* IPA. On the other hand, it is utterly normal for
> IPA to be used extensively embedded in the middle of otherwise
> normal Latin text (or, to be sure, as citations used in the
> middle of Cyrillic or Japanese or Chinese or whatever other
> text). If you embed a bunch of IPA in the middle of otherwise
> unremarkable Latin text, you really aren't talking about a
> bibliographic code at all, but tagging runs of text as being
> in a special function orthography. If that's what you need to
> make text searches work right for interpreting such runs of
> specialized text, then make the case for it.

I don't think anyone is expecting books in IPA -- it would be, as you say,
tagged fragments.

> But the fact is that once you get beyond standard writing systems
> with standardized spellings and start hitting the text corpuses
> of specialized languages in specialized orthographies which are
> increasingly likely to get openly posted on the web, you
> are going to need a code for *each* orthography in use, per language,
> to make any sense of the content of those corpuses.
>
> Say I were to start posting Chumash language materials on the
> web in Unicode. (There are a significant number of linguists,
> Chumash descendants, anthropologists, and just plain Chumash
> aficionados among the general white population in Santa Barbara
> and Ventura counties who would like that, by the way.) To
> search that material, and just sticking to the Barbareno
> version of Chumash, you would need at least:
>
> Chumash-Barbareno in IPA
> Chumash-Barbareno in JPHarrington orthography (a massive corpus)
> Chumash-Barbareno in Americanist orthography
> Chumash-Barbareno in Applegate practical orthography (used by some
>                        anthropologists and a lot of material)
> Chumash-Barbareno in Whistler practical orthography
> Chumash-Barbareno in Chumash nation orthography
>
> Because texts are spelled systematically differently in each of
> those systems and use somewhat different repertoires of characters.
>
> So make your case why IPA is special. (For Chumash, it would,
> for example, be of very little real value, because very little
> of the Chumash data is represented directly in IPA.) Where do
> you draw the line in registering these things?
>
> Or do you think registering IPA just solves some problem that
> won't come around again for the next technical orthography
> that comes down the pike?

No, I think everyone is aware that there are multiple systems of phonetic
representations. So IPA would likely be the first of several. But it is
clearly a relatively important one, being in pretty widespread use in
dictionaries and other sources. As with encoding characters, or script
variants, at some point you have to make judgments as to whether a system is
in wide enough usage to be worth encoding; systems that are in ad hoc,
limited use wouldn't qualify.

But a side note: a list of 8 different potential systems is not exactly
scary, given that we are at the point of adding some 7,000 new language
tags.

>
> > Moreover, I want to point out that the RA and the JAC are two different
> > entities, and that this view does not represent the view of the RA (which
> > has not taken a position on the issue).
>
> Yep. I agree with that.
>
> --Ken
>
> >
> > Mark
>
>
>