Phonetic orthographies

Michael Everson everson at
Sat Nov 11 13:02:39 CET 2006

At 13:17 -0800 2006-11-10, Peter Constable wrote:

>>>Perhaps the ISO 15924 RA would like to suggest 
>>>an alternative solution to its user community 
>>>in view of the request for a solution?
>>It's not the RA's job to do that, really.
>It *is* the RA's job to register tags that users 
>want to use, and to service the user needs for 
>which ISO 15924 was created.

Which needs were not, I should think, fixing 
problems in parsing software having to do with 
hierarchies which were not in existence when the 
standard was being drafted. The user needs for 
which ISO 15924 was created were to identify the 
names of scripts with four-letter codes. I 
recognize that you have a problem. You want to be 
able to tag a run of text in some way so that, 
for instance, voice software could read it out. 
That's a nice application. I do not, however, 
believe that this is a matter of "script code". 
Because IPA uses the Latin script, it is a matter 
of orthography, and therefore the solution 
should reside in the realm of "language tag" as 
other orthographic distinctions do.

>If the RA doesn't feel a particular user need 
>should be met using the standard when users are 
>suggesting that it should, then IMO the RA 
>should be prepared to suggest where an 
>alternative solution might lie.

The RA doesn't agree that bogus script codes 
should be entered into the registry, and "Latp" 
or "Ipaa" are both bogus. The script is Latin. 
Its shape is Roman, not Fraktur or Gaelic or 
anything else unrecognizable. I'm sorry if you 
don't like it. I'm only one member of the RA.

I have suggested a set of orthographic tags which 
as far as I can see could suffice. Oxford 
spelling may be described as "en-GB-oed". IPA 
transcription might just as easily be described 
as "en-GB-fonipa".
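
To illustrate how software might make use of such orthographic tags, here is a minimal sketch in Python. It assumes only that a language tag is a hyphen-separated list of subtags compared case-insensitively; the function names has_variant and is_ipa are hypothetical, and "oed" and "fonipa" are simply the subtags suggested above.

```python
def subtags(tag):
    """Split a language tag into lowercase subtags."""
    return tag.lower().split("-")


def has_variant(tag, variant):
    """Return True if `variant` appears after the primary language subtag."""
    # The first subtag is the primary language, so it is skipped here.
    return variant.lower() in subtags(tag)[1:]


def is_ipa(tag):
    """Return True if the tag marks the text as IPA transcription."""
    return has_variant(tag, "fonipa")


# Example: pick out the runs of text that are tagged as IPA.
runs = [("en-GB-oed", "ordinary text"), ("en-GB-fonipa", "transcribed text")]
ipa_runs = [text for tag, text in runs if is_ipa(tag)]
```

A filter of this kind keys on the orthography subtag alone, so the script of the text can remain "Latn" throughout.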

>Just as the ISO 639 JAC needs to be prepared to do.
>  > However, I (for my part) did suggest that the following might be used:
>Yes, but users are saying these alone are not 
>considered sufficient for the needs, and you 
>have not provided a solution to that extent.

So you're saying that nothing but a bogus script 
code will satisfy you. I don't see how this is 
the fault of the RA. Some users of the UCS think 
that only 5000 precomposed Tibetan syllables will 
satisfy them. But that's not the fault of WG2 or 
the UTC.

>  > ISO 15924 is based on form.
>Well, let's consider this. Is Fraser a subset of Latin or a separate script?

It is a separate script.

>In terms of form, it is very clearly a subset of 
>Latin, yet I believe I've heard you say it must 
>be considered a separate script because of its 
>unicameral behaviour.

Its origins are Latin, but its behaviour and its 
shapes are not. Font variation (deviation from 
Helvetica/Arial letterforms), for instance, is 
either unheard of or extremely rare. 
"Lower-cased" Fraser is *illegible* to Lisus. 
Upper-cased IPA is not conventional (and some 
letters do not have upper-case versions encoded) 
but that does not mean that it is illegible, and 
indeed as we have seen such a development is 
natural as in African orthographies. Please also 
note the use of capitalization in IPA text on 
pages 51-52 of the 1949 IPA Handbook. Doubtless 
there are other texts extant which make use of 
this convention.

>Phonetic transcriptions -- certainly those I'm 
>familiar with -- are absolutely unicameral.

	crdiloetis kari da mza k'amatobden tu romeli iqo upro dzlieri.
	Crdiloetis kari da mza k'amatobden tu romeli iqo upro dzlieri.

All three are Latin. But one is also something 
other than Latin? I'm sorry. I don't accept that.

>(E.g. in Americanist, "a" and "A" represent 
>distinct sounds.) So, by that line of reasoning, 
>you ought equally to consider phonetic 
>transcriptions separate scripts.

I think I know what a script is. I do not believe 
that IPA is a separate script from Latin. It is 
an orthography using Latin letters.

>I think we'd all agree that that's not where we 
>want to go. But I suggest to you it ought to be 
>enough to say that phonetic transcriptions based 
>on Latin have some distinctive behaviour that 
>warrants considering them a script variant.

No, because again it is a matter of orthography. 
Some African orthographies began as IPA 
transcriptions. As you are well aware, capital 
forms of the "new" letters are quickly devised by 
people as soon as they standardize the 
transcription into orthography. Runs of text 
which do not contain personal names or begin 
sentences are therefore IDENTICAL with IPA 
transcription. How does this merit a separate 
script code?

>>That still does not mean that IPA, or UPA, or 
>>Landsmålsalfabetet, or Webster's spelling, are 
>>scripts other than Latin. Nor does it mean that 
>>they belong to some collective variant of Latin
>I think you are too swayed by an academic, 
>graphology perspective and have lost [sight] of 
>the fact that ISO 15924 exists NOT as a form of 
>academic documentation but rather to serve 
>practical IT purposes. (I find this very 
>reminiscent of the es-americas issue: you 
>opposed it because it didn't fit your 
>understanding of dialectology when you were 
>missing the very real practical IT need.)

You may, but I (personally) devised ISO 15924 in 
the first place, and edited it from beginning to 
end, so I *might* be expected to know what it is 
for. It is a standard for the identification of 
the names of scripts. "Latin Phonetic" isn't the 
name of a script. "Phonetic transcription" isn't 
the name of a script. "IPA" doesn't trump the 
hundreds of other phonetic transcriptions out 
there and deserve its own script code while all 
of the others do not. (The Spanish example is not 

This is not only my opinion. The RA rejected a 
proposal already for "Ipaa": "The IPA is a set of 
Latin letters, and can be represented by Latn. It 
is an orthography of Latin, not a script of its 

>Again, you've got users saying that they have a 
>need -- including in lexicography and 
>linguistics -- to code Latin-based phonetic 
>transcriptions as a script variant.

I recognize that there is a need to identify runs 
of text as IPA orthography. I do not accept that 
the distinction is one of *script*; it is indeed 
a distinction of *orthography*.

>The intent of the standard is to code just such 
>things, and to provide usage guidance. Please 
>encode "Latp", or please provide guidance as to 
>how the practical need can be better met.

Your *wanting* them to be a script variant does 
not *make* them a script variant. You have not 
convinced me. You seem to want "Latp" to be some 
sort of macro-script that would encompass Webster 
and UPA and IPA and the rest in one family. But 
"Latn" does this already, and Webster and UPA and 
IPA are just orthographies.

>>What script is this in?
>>	crdiloetis kari da mza k'amatobden tu romeli iqo upro dzlieri.
>>It's Latin, isn't it?
>Yes; and note the complete inappropriateness of
>	Crdiloetis kari da mza k'amatobden tu romeli iqo upro dzlieri.

This is inappropriate in what way? It is natural. 
It happens in African and North American 
languages quite regularly, and you and I both 
have often proposed to encode missing capital 
letters to support such development.

>The capitalization has just turned this content 
>into some completely different "orthography" 
>with no known usage. Clearly this is Latin, but 
>with exceptional rules -- i.e. a distinct 
>variant of Latin.

So "crdiloetis kari da mza" is ambiguous as to 
being Latp or Latn but "Crdiloetis kari da mza" 
is not? This... forgive me... is preposterous, in 
my view.

>>I comprehend what you are describing. I don't 
>>think that ISO standards should be, hm, abused 
>>in this way.
>This is not an abuse but a very reasonable and 
>practical IT application. It can only be seen as 
>an abuse if you insist on thinking of the intent 
>of the standard as being to provide academic 
>documentation of scripts, or if you find a much 
>better way to engineer solutions to the IT 
>needs. Again, the RA has not done the latter, so 
>I must assume the RA is doing the former, which 
>is deviating from the intent of the standard.

The RA has said that the distinction is one of 
orthography and not one of script. I have 
endeavoured to address the requirement by 
proposing orthography tags. I am quite confident 
that if you

>>*Latp is no different than, say an ISO 639 tag 
>>*enc, taken to be a variety of "eng" 'English' 
>>designed for use by speakers of varieties of 
>>"Commonwealth English" (en-GB, en-IE, en-ZA, 
>>en-AU, en-NZ) which may share many features and 
>>be difficult for speakers of other varieties of 
>>English to understand. It would make your 
>>filter much easier, but it would be the wrong 
>>thing to do.
>I think a much closer analogy would be an ISO 
>639 ID zh that encompasses yue, cmn, etc. And 
>ISO 639 does encode zh.

I do not think you have understood what I wrote, but perhaps it is moot.

For my part, I do try to do my job with due 
diligence, and I have proposed a set of 
appropriate orthography subtags. Please 
investigate the possibilities of making software 
which is able to make use of such tags in order 
to identify phonetic orthographies.
Michael Everson *
