IAB Statement on Identifiers and Unicode 7.0.0

Mark Davis ☕️ mark at macchiato.com
Wed Jan 28 15:13:32 CET 2015


On that principle, I think we are all in accord.

Unicode, however, doesn't consider any of the following pairs to *be* the
same character. Moreover, changing what we *do* consider the same character
(via NFC) would run into severe compatibility problems.

U+08A1 ARABIC LETTER BEH WITH HAMZA ABOVE
U+0628 ARABIC LETTER BEH, U+0654 ARABIC HAMZA ABOVE

U+006F LATIN SMALL LETTER O
U+043E CYRILLIC SMALL LETTER O

U+00F8 ( ø ) LATIN SMALL LETTER O WITH STROKE
U+006F, U+0337 ( o̷ ) LATIN SMALL LETTER O, COMBINING SHORT SOLIDUS OVERLAY
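
For anyone who wants to check the claim about NFC for themselves, here is a minimal sketch using Python's standard unicodedata module. The helper name nfc_equal and the PAIRS list are just illustrative, and the result depends on the Unicode version shipped with your Python build. The last assertion uses an extra pair (U+00E9 vs. U+0065, U+0301) that is not one of the three above; it is added only as a contrast, since NFC does unify it.

```python
import unicodedata

# The three pairs from the message, written as escapes so that the
# code points are explicit and survive copy and paste.
PAIRS = [
    ("\u08A1", "\u0628\u0654"),  # BEH WITH HAMZA ABOVE vs. BEH, HAMZA ABOVE
    ("\u006F", "\u043E"),        # Latin small o vs. Cyrillic small o
    ("\u00F8", "\u006F\u0337"),  # o with stroke vs. o, short solidus overlay
]


def nfc_equal(a: str, b: str) -> bool:
    """Return True if the two strings are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def code_points(s: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for left, right in PAIRS:
    print(f"{code_points(left)} | {code_points(right)} | same under NFC: "
          f"{nfc_equal(left, right)}")

# Contrast case (not from the message): a pair that NFC does unify.
assert nfc_equal("\u00E9", "\u0065\u0301")
```

None of the three pairs compares equal under NFC, which is the sense in which Unicode does not treat them as the same character.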

And Patrik, the IAB letter recommending that U+08A1 ARABIC LETTER BEH
WITH HAMZA ABOVE not be used in identifiers is tantamount to recommending
that U+00F8 ( ø ) LATIN SMALL LETTER O WITH STROKE not occur in
identifiers. Fine for you Swedes, but surely you must have some Danish and
Norwegian friends ;-)



Mark <https://google.com/+MarkDavis>

*— Il meglio è l’inimico del bene —*

On Wed, Jan 28, 2015 at 2:58 PM, Patrik Fältström <paf at frobbit.se> wrote:

> My view is similar to yours Vint.
>
> I think we must ensure that after stability validation (with the help of
> repeated application of lower case and normalization functions) we have one
> and only one representation of each "character". Either as the composed
> code point, or as a base code point with one or more combining code
> points. And this without use of any context (like language) whatsoever.
>
>    Patrik
>
> On 28 jan 2015, at 10:39, Vint Cerf <vint at google.com> wrote:
>
> I had something different in mind. What was key to IDNA2008 was the
> uniqueness of the UNICODE/PUNYCODE representations. Essentially, after
> normalization, one expects that the two strings are unambiguously
> equivalent. Mapping from normalized Unicode to Punycode and back should
> produce the same (character for character) string. The problem that the
> Hamza discussion illustrates, as I understand it, is that there is no
> normalization that produces this result if one string uses the precomposed
> character and another uses the combining character sequence - no
> normalization produces an unambiguous result.
>
> v
>
>
> On Wed, Jan 28, 2015 at 3:43 AM, Mark Davis ☕️ <mark at macchiato.com>
> wrote:
>
>>
>> On Wed, Jan 28, 2015 at 9:20 AM, Vint Cerf <vint at google.com> wrote:
>>
>>> I am reading your message as saying "ambiguity is ok if there are few
>>> instances of it" while some of us would like the handling of identifiers
>>> encoded with Unicode to be unambiguous.
>>>
>>
>> The sense of "unambiguous" that matters to users is that when they read a
>> sequence of glyphs, their interpretation of the underlying character
>> sequence is correct (in normal environments, with common fonts).
>>
>> That level of "unambiguous" was impossible, even before Unicode.
>>
>> Take 8859-5, with both o and Russian o, or ASCII with "google.corn" vs
>> "goog1e.com". [Both the 1 and lowercase L are an issue, but also in many
>> fonts in common use, users will read the (r + n) in the former as an m.]
>>
>> To extend Andrew's death analogy, there is no way that we can all live
>> forever. However, there are clearly medical processes and social policies
>> that can improve and extend the years that we all have. But to be
>> productive, the focus needs to be on the big ticket items, and thus needs
>> to be prioritized by real data.
>>
>> Mark <https://google.com/+MarkDavis>
>>
>> *— Il meglio è l’inimico del bene —*
>>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update