toNFKC(toCaseFold(toNFKC(cp))) != cp and toNFKC failures

Simon Josefsson simon at josefsson.org
Fri May 27 17:40:24 CEST 2011


Mark Davis ☕ <mark at macchiato.com> writes:

> toNFKC is defined over all code points, including U+D800.

At least two common NFKC implementations, libunistring and ICU, appear
to reject U+D800.  For ICU, the link below illustrates this.
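The UNDEFINED convention proposed in the quoted message below can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: Python's `unicodedata.normalize` stands in for toNFKC, `str.casefold` (full case folding) is assumed to match toCaseFold, and the explicit surrogate check is an assumption that mimics the ICU/libunistring rejection rather than anything Python itself does. `UNDEFINED`, `to_nfkc`, `to_case_fold` and `is_unstable` are hypothetical names.

```python
import unicodedata
from typing import Optional

# Hypothetical sentinel for a failed normalization. None compares unequal
# to every str, so "UNDEFINED is never equal to any code point" holds.
UNDEFINED: Optional[str] = None


def to_nfkc(s: Optional[str]) -> Optional[str]:
    """NFKC that propagates UNDEFINED and (assumption) fails on surrogates,
    modelled on the ICU/libunistring behaviour reported in this thread."""
    if s is UNDEFINED:
        return UNDEFINED
    if any(0xD800 <= ord(c) <= 0xDFFF for c in s):
        return UNDEFINED  # treat surrogate code points as a normalization failure
    return unicodedata.normalize("NFKC", s)


def to_case_fold(s: Optional[str]) -> Optional[str]:
    """Full case folding via str.casefold; toCaseFold(UNDEFINED) = UNDEFINED."""
    if s is UNDEFINED:
        return UNDEFINED
    return s.casefold()


def is_unstable(cp: int) -> bool:
    """RFC 5892 category B: toNFKC(toCaseFold(toNFKC(cp))) != cp."""
    result = to_nfkc(to_case_fold(to_nfkc(chr(cp))))
    # UNDEFINED never equals chr(cp), so a failed normalization makes cp unstable.
    return result != chr(cp)
```

With this convention `is_unstable(0xD800)` is true, `is_unstable(ord("a"))` is false, and `is_unstable(ord("A"))` and `is_unstable(0xFB01)` (the "fi" ligature) are true because case folding and NFKC respectively change them.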

> The condition is moot anyway, since D800 is excluded by other clauses.

Yes, the final 'Else DISALLOWED' clause, according to my code.
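Why the Unstable result is moot for U+D800 can be illustrated with a deliberately simplified tail of the RFC 5892 derivation. This is a hypothetical sketch, not the full algorithm: only the Unstable test, the LetterDigits test (General_Category Ll, Lu, Lo, Nd, Lm, Mn or Mc) and the final 'Else DISALLOWED' are modelled, and all other rules are omitted. `simplified_tail` and its `unstable` parameter are illustrative names.

```python
import unicodedata

# RFC 5892 category A (LetterDigits): these General_Category values.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def simplified_tail(cp: int, unstable: bool) -> str:
    """Hypothetical, heavily simplified tail of the RFC 5892 derivation.

    Only 'Unstable -> DISALLOWED', 'LetterDigits -> PVALID' and the final
    'Else DISALLOWED' are modelled; Exceptions, Unassigned, LDH, JoinControl,
    IgnorableProperties and the other rules are omitted, so this is not a
    full IDNA2008 property calculator.
    """
    if unstable:
        return "DISALLOWED"
    if unicodedata.category(chr(cp)) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"  # the final 'Else DISALLOWED' clause
```

U+D800 has General_Category Cs, so in this sketch both `simplified_tail(0xD800, True)` and `simplified_tail(0xD800, False)` return "DISALLOWED": whichever way the Unstable test is evaluated, the code point ends up DISALLOWED.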

/Simon

>
> Mark
>
> *The best is the enemy of the good.*
>
>
> On Fri, May 27, 2011 at 02:43, Simon Josefsson <simon at josefsson.org> wrote:
>
>> I'm looking at RFC 5892 section 2.2 which says:
>>
>>   2.2.  Unstable (B)
>>
>>   B: toNFKC(toCaseFold(toNFKC(cp))) != cp
>>
>>   This category is used to group the characters that are not stable
>>   under Normalization Form K (NFKC) and case folding.  In general,
>>   these code points are not suitable for use for IDN.
>>
>>   The toCaseFold() operation is defined in Section 3.13 of The Unicode
>>   Standard [Unicode].
>>
>>   The toNFKC() operation returns the code point in normalization form
>>   KC.  For more information, see Section 5 of Unicode Standard Annex
>>   #15 [TR15].
>>
>>   It should be noted that NFKC is used, although Normalization Form C
>>   (NFC) is used in the "IDNA Protocol" document [RFC5891].
>>
>> The toNFKC operation fails for some code points that aren't characters.
>> For example U+D800 is not a character, and normalization will fail:
>>
>> http://demo.icu-project.org/icu-bin/nbrowser?t=&s=D800&uv=0
>>
>> How should the "Unstable" property be evaluated when toNFKC fails?
>>
>> Am I correct in using toNFKC(cp) = UNDEFINED for this situation, and
>> in specifying that toCaseFold(UNDEFINED) = UNDEFINED and toNFKC(UNDEFINED) =
>> UNDEFINED, and also that UNDEFINED is never equal to any code point?
>>
>> /Simon
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
