[iso15924-jac] Re: Phonetic orthographies
Debbie Garside
debbie at ictmarketing.co.uk
Tue Nov 21 11:57:50 CET 2006
Kenneth wrote:
> But the fact is that once you get beyond standard writing
> systems with standardized spellings and start hitting the
> text corpuses of specialized languages in specialized
> orthographies which are increasingly likely to get openly
> posted on the web, you are going to need a code for *each*
> orthography in use, per language, to make any sense of the
> content of those corpuses.
Absolutely true. This is what the written mode within ISO 639-6 attempts to
provide.
Best regards
Debbie
> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of
> Kenneth Whistler
> Sent: 21 November 2006 01:51
> To: mark.davis at icu-project.org
> Cc: ietf-languages at iana.org; iso15924-jac at unicode.org; kenw at sybase.com
> Subject: Re: [iso15924-jac] Re: Phonetic orthographies
>
> Mark,
>
> >
> > 15924 does not encode just scripts, it also has variants and aliases,
> > such as:
> >
> > - Cyrs, Latf, Latg, Hans, Hant, Syre, Syrj, Syrn
> > - Hrkt, Jpan
> >
> > The inclusion of IPA as a variant script of Latin is little different
> > from the distinction between Hans and Hant; both are primarily
> > differences in selection of characters from UCS. The difference
> > between English written in IPA vs regular Latin characters is
> > certainly on the order of the difference between Chinese written in
> > Hans vs Hant, if not more so.
>
> True, but a specious analogy nonetheless.
>
> Basically what is going on here is that script codes, because
> they are available and tied to the language code apparatus,
> are being extended to apply to any "significant variation in
> writing system" that pops up to the level of "we care about the
> difference for our implementations."
>
> Now maybe that is exactly what needs to be done, but in my
> opinion the right way to handle this is to first *formally*
> extend the scope of 15924, so that it no longer is a standard
> for the registration of script codes, but for script codes
> *and* selected orthography codes of interest *and* selected
> variants of writing systems of interest. At that point the
> JAC wouldn't have to sit and argue on principle whether some
> particular oddball request fits or not, and implementers
> would be freer to ask for stuff that matches distinctions
> they would like to make.
>
> As it is, it is bad enough that we have a "script"
> registration standard that tries to match up against the
> "scripts" encoded in Unicode, and has a mostly unexplained
> hairball of stuff which can't be matched up, but now requests
> to register stuff like IPA for a script code keep pushing
> things further that way.
>
> > It would be of
> > great benefit to users of IPA to be able to tag data with a variant
> > script code, and little pragmatic reason not to allow that, especially
> > in view of the fact that the standard has already been stretched to
> > include variants and aliases.
>
> Dunno what aliases have to do with it, other than to puff up
> the argument.
>
> And IPA is not a variant script. It is not comparable to Latf
> and Latg. It is a circumscribed, technical use of Latn. "cat" is
> English. "[cat]" is IPA. Tell me the script difference, except in
> function.
>
> So as in the case of Hant versus Hans, registering IPA with a
> script code would be another ad hoc extension of 15924 in an
> orthogonal but basically unexplained direction.
>
> The pragmatic reason not to allow that is to prevent 15924
> being used to further muddy all the dimensions of
> distinctions in writing systems.
>
> But the pragmatic reason to *allow* it would be to let Google
> and Microsoft do what they want to do for searches anyway,
> and to hell with expecting 15924 to make any sense outside
> its use as a standard for labeling "written stuff we want to
> distinguish".
>
>
> > >> This is not my view only. It was the view of the RA.
> >
> > Regarding the above statement, I also want to add that as far as I can
> > tell, the 15924 JAC did not consider this topic in any depth, nor does
> > any of the discussion here seem to be forwarded to the JAC for their
> > consideration; I believe that the members are unaware of the issues
> > raised regarding language tags. As far as I could see from email, the
> > sum total of the discussion was three statements, two by the same person:
> >
> > A: "As far as I can see, IPA is just a set of Latin characters."
> > A: "The IPA is a set of Latin letters, and can be represented by Latn.
> > It is an orthography of Latin, not a script of its own."
> > C: "I concur with this conclusion."
> > [names removed to protect the innocent]
>
> Yeah, yeah, cute, Mark. Note also that Michael and I, at
> least, were trained in IPA (and other phonetic orthographies)
> and made significant professional use of them. So it isn't as
> if we are babes in the woods here presented with something
> we've never heard of before, and are making off-the-cuff,
> uninformed remarks about.
>
> If you feel that a registration for IPA belongs in 15924,
> then make the case why 15924 should start registering
> orthographic conventions for the use of a script, instead of
> just knocking the JAC for "not consider[ing] this topic in
> any depth," please.
>
> Also, I suggest you consider the distinction between the
> function of IPA as a bibliographic code and as a "language"
> code. There are very, very few books, articles, or anything
> else that consist exclusively or primarily of IPA used just
> to represent text. Most of the ones that do exist are
> experimental failures, basically.
> It would be very rare that you would need a bibliographic
> code for a book *in* IPA, as opposed to a book *about* IPA or
> including use *of* IPA. On the other hand, it is utterly
> normal for IPA to be used extensively embedded in the middle
> of otherwise normal Latin text (or, to be sure, as citations
> used in the middle of Cyrillic or Japanese or Chinese or
> whatever other text). If you embed a bunch of IPA in the
> middle of otherwise unremarkable Latin text, you really
> aren't talking about a bibliographic code at all, but tagging
> runs of text as being in a special function orthography. If
> that's what you need to make text searches work right for
> interpreting such runs of specialized text, then make the case for it.
>
> But the fact is that once you get beyond standard writing
> systems with standardized spellings and start hitting the
> text corpuses of specialized languages in specialized
> orthographies which are increasingly likely to get openly
> posted on the web, you are going to need a code for *each*
> orthography in use, per language, to make any sense of the
> content of those corpuses.
>
> Say I were to start posting Chumash language materials on the
> web in Unicode. (There are a significant number of linguists,
> Chumash descendants, anthropologists, and just plain Chumash
> aficionados among the general white population in Santa
> Barbara and Ventura counties who would like that, by the
> way.) To search that material, and just sticking to the
> Barbareno version of Chumash, you would need at least:
>
> Chumash-Barbareno in IPA
> Chumash-Barbareno in JPHarrington orthography (a massive corpus)
> Chumash-Barbareno in Americanist orthography
> Chumash-Barbareno in Applegate practical orthography (used by some
> anthropologists and a lot of material)
> Chumash-Barbareno in Whistler practical orthography
> Chumash-Barbareno in Chumash nation orthography
>
> Because texts are spelled systematically differently in each
> of those systems and use somewhat different repertoires of characters.
>
> So make your case why IPA is special. (For Chumash, it would,
> for example, be of very little real value, because very
> little of the Chumash data is represented directly in IPA.)
> Where do you draw the line in registering these things?
>
> Or do you think registering IPA just solves some problem that
> won't come around again for the next technical orthography
> that comes down the pike?
>
> >
> > Moreover, I want to point out that the RA and the JAC are two different
> > entities, and that this view does not represent the view of the RA
> > (which has not taken a position on the issue).
>
> Yep. I agree with that.
>
> --Ken
>
> >
> > Mark
>