[Ltru] Re: [iso15924-jac] Re: Phonetic orthographies

Martin Hosken martin_hosken at sil.org
Wed Nov 22 04:23:30 CET 2006

Dear All,

>     Say I were to start posting Chumash language materials on the
>     web in Unicode. (There are a significant number of linguists,
>     Chumash descendants, anthropologists, and just plain Chumash
>     aficionados among the general white population in Santa Barbara
>     and Ventura counties who would like that, by the way.) To
>     search that material, and just sticking to the Barbareno
>     version of Chumash, you would need at least:
>     Chumash-Barbareno in IPA

IPA is a script/transcription system that is used across multiple
languages, even more so than Roman script. That is, a reader of IPA can
do a pretty good job of sounding out text written in it, even in a
language they do not know.

>     Chumash-Barbareno in JPHarrington orthography (a massive corpus)

I know nothing about this orthography. But whether it is phonetic or
phonemic, I would suggest that it is limited to this and perhaps a few
other languages. Therefore allocating a per-language extension for this
orthography seems sensible.

>     Chumash-Barbareno in Americanist orthography

Americanist is like the IPA to a limited extent, in that it is used to
write multiple languages and can be read by users who have no knowledge
of the language.

>     Chumash-Barbareno in Applegate practical orthography (used by some
>                            anthropologists and a lot of material)

This is a language-specific orthography and should be registered as a
script extension of that one language.

>     Chumash-Barbareno in Whistler practical orthography
>     Chumash-Barbareno in Chumash nation orthography

Likewise for these two.

I posit, therefore, that there are two kinds of transcription system
that have to be dealt with: those that are designed to handle a single
language or small group of languages, and those that are designed to
represent the sounds of any language. Transcription systems designed
for a single language can be tagged adequately within the existing
system by allocating a language extension. Those designed to be
universal (or near-universal) transcription systems (I know only of IPA
and Americanist; are there others?) are not best served by a model that
only allows for per-language extensions.

These universal systems are scripts/script variants, call them what you
will. I don't really mind where one draws the line, but I do want to
know how to tag stuff that has been written using them. Please help. As
SIL starts to get its act together in this area, we would really value
your input on how you want us to tag our data. The danger I fear is that
the need to tag will precede the knowledge of how to tag, and people
will just make things up, leaving us with a retagging nightmare.

