IAB Statement on Identifiers and Unicode 7.0.0

Patrik Fältström paf at frobbit.se
Wed Jan 28 14:58:15 CET 2015


My view is similar to yours Vint.

I think we must ensure that after stability validation (with the help of repeated application of lower-casing and normalization functions) we have one and only one representation of each "character": either the composed code point, or a base code point with one or more combining code points. And this without use of any context (like language) whatsoever.
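
To make the stability idea concrete, here is a minimal Python sketch. The helper name `stabilize` and the round limit are my own assumptions for illustration and are not taken from IDNA2008 or any specification. It lower-cases and applies NFC repeatedly until the string stops changing, then shows one pair that converges (e + combining acute vs. precomposed é) and the pair from the Hamza discussion that does not (U+08A1 vs. U+0628 U+0654). Results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def stabilize(label: str, max_rounds: int = 10) -> str:
    """Hypothetical helper: lower-case and NFC-normalize until a fixed point."""
    for _ in range(max_rounds):
        mapped = unicodedata.normalize("NFC", label.lower())
        if mapped == label:
            return label  # stable: another round would change nothing
        label = mapped
    raise ValueError("label did not stabilize")


# Base letter + combining accent and the precomposed letter converge
# to a single representation.
assert stabilize("e\u0301") == stabilize("\u00e9") == "\u00e9"

# U+08A1 (ARABIC LETTER BEH WITH HAMZA ABOVE) has no canonical
# decomposition, so NFC leaves both spellings as they are.
precomposed = "\u08a1"
sequence = "\u0628\u0654"  # BEH + HAMZA ABOVE
assert stabilize(precomposed) != stabilize(sequence)

# A stable label survives the Punycode round trip character for character
# (using Python's built-in "punycode" codec, without the "xn--" prefix).
label = stabilize("B\u00fccher")
assert label.encode("punycode").decode("punycode") == label
```

The last check passes for both Arabic spellings as well, which is the point: the round trip is exact for each, but the two visually identical labels remain distinct.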

   Patrik

> On 28 jan 2015, at 10:39, Vint Cerf <vint at google.com> wrote:
> 
> I had something different in mind. What was key to IDNA2008 was the uniqueness of the Unicode/Punycode representations. Essentially, after normalization, one expects that the two strings are unambiguously equivalent. Mapping from normalized Unicode to Punycode and back should produce the same (character for character) string. The problem that the Hamza discussion illustrates, as I understand it, is that there is no normalization that produces this result if one string uses the precomposed character and another uses the combining character sequence; no normalization produces an unambiguous result.
> 
> v
> 
> 
> On Wed, Jan 28, 2015 at 3:43 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:
> 
> On Wed, Jan 28, 2015 at 9:20 AM, Vint Cerf <vint at google.com> wrote:
> I am reading your message as saying "ambiguity is ok if there are few instances of it" while some of us would like the handling of identifiers encoded with Unicode to be unambiguous.
> 
> The sense of "unambiguous" that matters to users is that when they read a sequence of glyphs, their interpretation of the underlying character sequence is correct (in normal environments, with common fonts).
> 
> That level of "unambiguous" was impossible, even before Unicode.
> 
> Take 8859-5, with both o and Russian o, or ASCII with "google.corn" vs "goog1e.com". [Both the 1 and lowercase L are an issue, but also in many fonts (in common use) users will read the (r + n) in the former as an m.]
> 
> To extend Andrew's death analogy, there is no way that we can all live forever. However, there are clearly medical processes and social policies that can improve and extend the years that we all have. But to be productive, the focus needs to be on the big ticket items, and thus needs to be prioritized by real data.
> 
> Mark <https://google.com/+MarkDavis>
> 
> — Il meglio è l’inimico del bene —
> 
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
