[iso15924-jac] Re: Phonetic orthographies

Kenneth Whistler kenw at sybase.com
Tue Nov 21 02:50:32 CET 2006


> 15924 does not encode just scripts, it also has variants and aliases, such
> as:
>    - Cyrs, Latf, Latg, Hans, Hant, Syre, Syrj, Syrn
>    - Hrkt, Jpan
> The inclusion of IPA as a variant script of Latin is little different from
> the distinction between Hans and Hant; both are primarily differences in
> selection of characters from UCS. The difference between English written in
> IPA vs regular Latin characters is certainly on the order of the difference
> between Chinese written in Hans vs Hant, if not more so.

True, but a specious analogy nonetheless.

Basically what is going on here is that script codes, because
they are available and tied to the language code apparatus, are
being extended to apply to any "significant variation in writing system"
that pops up to the level of "we care about the difference for
our implementations."

Now maybe that is exactly what needs to be done, but in my opinion
the right way to handle this is to first *formally* extend the
scope of 15924, so that it no longer is a standard for the
registration of script codes, but for script codes *and*
selected orthography codes of interest *and* selected variants
of writing systems of interest. At that point the JAC wouldn't
have to sit and argue on principle whether some particular
oddball request fits or not, and implementers would be freer
to ask for stuff that matches distinctions they would like to

As it is, it is bad enough that we have a "script" registration
standard that tries to match up against the "scripts" encoded
in Unicode, and has a mostly unexplained hairball of stuff
which can't be matched up, but now requests to register stuff like
IPA for a script code keep pushing things further that way.

> It would be of
> great benefit to users of IPA to be able to tag data with a variant script
> code, and little pragmatic reason not to allow that, especially in view of
> the fact that the standard has already been stretched to include variants
> and aliases.

Dunno what aliases have to do with it, other than to puff up the

And IPA is not a variant script. It is not comparable to Latf and Latg.
It is a circumscribed, technical use of Latn. "cat" is English.
"[cat]" is IPA. Tell me the script difference, except in function.

So as in the case of Hant versus Hans, registering IPA with a 
script code would be another ad hoc extension of 15924 in an 
orthogonal but basically unexplained direction.

The pragmatic reason not to allow that is to prevent 15924 being
used to further muddy all the dimensions of distinctions in
writing systems.

But the pragmatic reason to *allow* it would be to let Google
and Microsoft do what they want to do for searches anyway, and
to hell with expecting 15924 to make any sense outside its
use as a standard for labeling "written stuff we want to distinguish".

> >> This is not my view only. It was the view of the RA.
> Regarding the above statement, I also want to add that as far as I can tell,
> the 15924 JAC did not consider this topic in any depth, nor does any of the
> discussion here seem to be forwarded to the JAC for their consideration; I
> believe that the members are unaware of the issues raised regarding language
> tags. As far as I could see from email, the sum total of the discussion was
> three statements, two by the same person:
> A: "As far as I can see, IPA is just a set of Latin characters."
> A: "The IPA is a set of Latin letters, and can be represented by Latn. It is
> an orthography of Latin, not a script of its own."
> C: "I concur with this conclusion."
> [names removed to protect the innocent]

Yeah, yeah, cute, Mark. Note also that Michael and I, at least,
were trained in IPA (and other phonetic orthographies) and
made significant professional use of them. So it isn't as
if we are babes in the woods here presented with something
we've never heard of before, and are making off-the-cuff, uninformed
remarks about.

If you feel that a registration for IPA belongs in 15924, then
make the case why 15924 should start registering orthographic
conventions for the use of a script, instead of just knocking
the JAC for "not consider[ing] this topic in any depth," please.

Also, I suggest you consider the distinction between the function
of IPA as a bibliographic code and as a "language" code. There
are very, very few books, articles, or anything else that consist
exclusively or primarily of IPA used just to represent text. Most
of the ones that do exist are experimental failures, basically.
It would be very rare that you would need a bibliographic code
for a book *in* IPA, as opposed to a book *about* IPA or including
use *of* IPA. On the other hand, it is utterly normal for
IPA to be used extensively embedded in the middle of otherwise
normal Latin text (or, to be sure, as citations used in the
middle of Cyrillic or Japanese or Chinese or whatever other
text). If you embed a bunch of IPA in the middle of otherwise
unremarkable Latin text, you really aren't talking about a
bibliographic code at all, but tagging runs of text as being
in a special function orthography. If that's what you need to
make text searches work right for interpreting such runs of
specialized text, then make the case for it.

But the fact is that once you get beyond standard writing systems
with standardized spellings and start hitting the text corpuses
of specialized languages in specialized orthographies which are
increasingly likely to get openly posted on the web, you
are going to need a code for *each* orthography in use, per language,
to make any sense of the content of those corpuses.

Say I were to start posting Chumash language materials on the
web in Unicode. (There are a significant number of linguists,
Chumash descendants, anthropologists, and just plain Chumash
aficionados among the general white population in Santa Barbara
and Ventura counties who would like that, by the way.) To
search that material, and just sticking to the Barbareno
version of Chumash, you would need at least:

Chumash-Barbareno in IPA
Chumash-Barbareno in JPHarrington orthography (a massive corpus)
Chumash-Barbareno in Americanist orthography
Chumash-Barbareno in Applegate practical orthography (used by some
                       anthropologists and a lot of material)
Chumash-Barbareno in Whistler practical orthography
Chumash-Barbareno in Chumash nation orthography

Because texts are spelled systematically differently in each of
those systems and use somewhat different repertoires of characters.

So make your case why IPA is special. (For Chumash, it would,
for example, be of very little real value, because very little
of the Chumash data is represented directly in IPA.) Where do
you draw the line in registering these things?

Or do you think registering IPA just solves some problem that
won't come around again for the next technical orthography
that comes down the pike?

> Moreover, I want to point out that the RA and the JAC are two different
> entities, and that this view does not represent the view of the RA (which
> has not taken a position on the issue).

Yep. I agree with that.


> Mark
