Mapping and Variants

Tue Mar 10 02:48:47 CET 2009

Ken, I have been given examples from Mark Davis that seem to lead to a
different conclusion. I think he illustrated some of them in a
subsequent email.

If script mixing is permitted, then IPA characters, which have distinct
code points, can indeed look like common lowercase ASCII characters,
for example.
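
To make that concrete, here is a minimal Python sketch using only the standard unicodedata module. The two sample pairs and the helper name describe() are illustrative assumptions of mine, not taken from Mark's examples; the sketch only shows that letters of similar shape can carry different code points.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Illustrative pairs (an assumption, not from the thread): an ASCII letter
# and an IPA letter of similar shape.
pairs = [("a", "\u0251"), ("g", "\u0261")]

for ascii_ch, ipa_ch in pairs:
    for row in describe(ascii_ch + ipa_ch):
        print(*row)
    # Similar shape, but the code points differ, so two labels that differ
    # only in these characters are different labels.
    print("same code point:", ascii_ch == ipa_ch)
```

How alike such pairs actually look depends on the font in use, so the sketch says nothing about how confusable they are in practice.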

Some of the problems I am worried about would be mitigated if script
mixing within a label were prohibited except under well-defined
exception conditions.

vint

Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com

On Mar 9, 2009, at 6:09 PM, Kenneth Whistler wrote:

> Addressing what may be a side-issue in this thread...
>
> Vint said:
>
>> rather than focusing solely on the example that John used, I think it
>> is probably more useful to think about the evident side-effects of
>> incorporating IPA characters as PVALID under IDNA rules. I am not
>> arguing here that they should be excluded but only that if they are
>> included, we must think how best to deal with the kinds of confusion
>> that Erik and others have described.
>
> I think this is a non sequitur. The confusion that Erik and others
> have described has primarily to do with the overlap of uppercase
> Latin, Greek, and Cyrillic letterforms, and very little, if anything,
> to do with IPA. IPA additions, with very few exceptions, have little
> to do with issues of confusion of letters, for a couple of reasons:
> 1. They are typically lowercase forms, and don't usually get involved
> with confusion between uppercase letters of distinct scripts, and
> 2. The IPA additions which aren't already letters of basic Latin (or
> Greek) are almost always modified letter shapes that are *not*
> confused or confusable with existing Latin, Greek, or Cyrillic letters.
>
>> However, it does seem useful to make sure that inclusion of a
>> potentially confusing block of Unicode characters is explicitly
>> considered.
>
> IPA is not such a block.
>
>> In the case of IPA, despite the ample and clear potential for
>> confusion,
>
> I do not understand that claim at all.
>
>> it is my understanding that Mark Davis has pointed out that
>> some (many?) of these characters in the International Phonetic
>> Alphabet are used in written African (others?) languages.
>
> Yes. African, American, Oceanic, Asian... hundreds of orthographies
> are involved.
>
>> If it were
>> the case that these glyphs were used ONLY for phonetic
>> representations,
>
> It is not.
>
>> I would argue against their inclusion in the PVALID
>> set of IDNA characters. But if it is correct that they are or are
>> expected to be used in written languages,
>
> It is.
>
>> one can understand an
>> argument for their inclusion. What is painful is the combinatoric
>> effect these characters produce if one is to try to counter their
>> abuse through treatment as variants (i.e., bundling or other
>> restrictive registration policies).
>
> Again, I do not understand this claim at all. Perhaps you do not
> realize that IPA as encoded in Unicode is *already* unified against
> basic Latin characters. An IPA representation [datimu] that doesn't
> require use of additional, IPA-specific letters, would be represented
> in Unicode with the same 6 characters as ASCII "datimu".
>
>> Perhaps that is a price we have to pay for
>> attempting to be open to including written languages not yet a part
>> of the Unicode system?
>
> We've got plenty of problems in IDNAbis, but inclusion of IPA
> isn't one of them.
>
> --Ken
>
>