Mapping and Variants

Kenneth Whistler kenw at sybase.com
Mon Mar 9 23:09:25 CET 2009


Addressing what may be a side-issue in this thread...

Vint said:

> rather than focusing solely on the example that John used, I think it  
> is probably more useful to think about the evident side-effects of  
> incorporating IPA characters as PVALID under IDNA rules. I am not  
> arguing here that they should be excluded but only that if they are  
> included, we must think how best to deal with the kinds of confusion  
> that Erik and others have described.

I think this is a non sequitur. The confusion that Erik and others
have described has primarily to do with the overlap of uppercase
Latin, Greek, and Cyrillic letterforms, and very little, if anything,
to do with IPA. IPA additions, with very few exceptions, have little
to do with issues of confusion of letters, for a couple of reasons:
1. They are typically lowercase forms, and don't usually get involved
with confusion between uppercase letters of distinct scripts, and 
2. The IPA additions which aren't already letters of basic Latin (or Greek) are 
almost always modified letter shapes that are *not* confused or confusable 
with existing Latin, Greek, or Cyrillic letters.
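As a rough illustration of both points, here is a small Python sketch using the standard `unicodedata` module. The five sample characters are hand-picked and do not survey the whole IPA repertoire. Unicode classifies each of them as a lowercase letter (general category `Ll`), and the character names describe them as modified Latin shapes (turned, hooked, open) or as distinct letters such as schwa and esh:

```python
import unicodedata

# Hand-picked sample of IPA-derived letters (illustrative only, not exhaustive):
# U+0250, U+0253, U+0254, U+0259, U+0283.
SAMPLE = "\u0250\u0253\u0254\u0259\u0283"


def describe(chars: str) -> list[tuple[str, str, str]]:
    """Return (code point, general category, Unicode name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in chars
    ]


for code_point, category, name in describe(SAMPLE):
    # Category "Ll" means "Letter, lowercase".
    print(code_point, category, name)
```

The first line printed is `U+0250 Ll LATIN SMALL LETTER TURNED A`. The other four follow the same pattern: B WITH HOOK, OPEN O, SCHWA, and ESH.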

> However, it does seem useful to make sure that inclusion of a  
> potentially confusing block of Unicode characters is explicitly  
> considered.

IPA is not such a block.

> In the case of IPA, despite the ample and clear potential for  
> confusion, 

I do not understand that claim at all.

> it is my understanding that Mark Davis has pointed out that  
> some (many?) of these characters in the International Phonetic  
> Alphabet are used in written African (others?) languages.

Yes. African, American, Oceanic, Asian... hundreds of orthographies
are involved.

> If it were  
> the case that these glyphs were used ONLY for phonetic  
> representations, 

It is not.

> I would argue against their inclusion in the PVALID  
> set of IDNA characters. But if it is correct that they are or are  
> expected to be used in written languages,

It is.

> one can understand an  
> argument for their inclusion. What is painful, is the combinatoric  
> effect these characters produce if one is to try to counter their  
> abuse through treatment as variants (ie bundling, or other restrictive  
> registration policies).

Again, I do not understand this claim at all. Perhaps you do not
realize that IPA as encoded in Unicode is *already* unified against
basic Latin characters. An IPA representation such as [datimu], which
doesn't require any additional IPA-specific letters, would be represented
in Unicode with the same 6 characters as ASCII "datimu".
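The following minimal Python sketch illustrates that unification using only the standard `unicodedata` module. The helper name `ipa_specific_letters` and the second sample string are hypothetical, chosen for illustration. Only letters with no Basic Latin counterpart appear as distinct, non-ASCII code points; a transcription like [datimu] contains none:

```python
import unicodedata


def ipa_specific_letters(word: str) -> list[tuple[str, str]]:
    """List (code point, Unicode name) for each non-ASCII character in word.

    Since IPA is unified with Basic Latin in Unicode, a transcription built
    only from ordinary Latin letters yields an empty list.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in word
        if ord(ch) > 0x7F
    ]


print(ipa_specific_letters("datimu"))  # []

# Illustrative string with two IPA-specific letters (not a real word):
print(ipa_specific_letters("\u025bb\u0254"))
# [('U+025B', 'LATIN SMALL LETTER OPEN E'), ('U+0254', 'LATIN SMALL LETTER OPEN O')]
```

The ordinary "b" in the second string is not reported, because it is the same U+0062 used in ASCII text.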

> Perhaps that is a price we have to pay for  
> attempting to be open to including written languages not yet a part of  
> the Unicode system?

We've got plenty of problems in IDNAbis, but inclusion of IPA
isn't one of them.

--Ken



