toNFKC(toCaseFold(toNFKC(cp))) != cp and toNFKC failures
Mark Davis ☕
mark at macchiato.com
Fri May 27 15:56:32 CEST 2011
toNFKC is defined over all code points, including U+D800. The condition is
moot anyway, since U+D800 is excluded by other clauses.
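
As a rough illustration of the point, here is a minimal Python sketch of the
Unstable (B) test from RFC 5892 section 2.2. It is not a reference
implementation: the helper names to_nfkc and is_unstable are made up for this
example, str.casefold() stands in for toCaseFold, and the results depend on
the Unicode data shipped with the Python interpreter, which may differ from
the Unicode version the RFC was written against.

```python
import unicodedata


def to_nfkc(s: str) -> str:
    # Hypothetical helper: NFKC normalization via the standard library.
    return unicodedata.normalize("NFKC", s)


def is_unstable(cp: int) -> bool:
    # B: toNFKC(toCaseFold(toNFKC(cp))) != cp
    # str.casefold() is used here as an approximation of toCaseFold.
    original = chr(cp)
    folded = to_nfkc(to_nfkc(original).casefold())
    return folded != original


if __name__ == "__main__":
    # U+D800 is a surrogate code point; normalization still returns a
    # value for it here, so the comparison is well defined.
    for cp in (0x0061, 0x0041, 0xFB01, 0xD800):
        print(f"U+{cp:04X} unstable: {is_unstable(cp)}")
```

In this sketch no UNDEFINED value is needed: normalization maps U+D800 to
itself, so the comparison simply yields "not unstable", and the code point is
left for the other clauses to exclude.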
Mark
*— Il meglio è l’inimico del bene —*
On Fri, May 27, 2011 at 02:43, Simon Josefsson <simon at josefsson.org> wrote:
> I'm looking at RFC 5892 section 2.2 which says:
>
> 2.2. Unstable (B)
>
> B: toNFKC(toCaseFold(toNFKC(cp))) != cp
>
> This category is used to group the characters that are not stable
> under Normalization Form K (NFKC) and case folding. In general,
> these code points are not suitable for use for IDN.
>
> The toCaseFold() operation is defined in Section 3.13 of The Unicode
> Standard [Unicode].
>
> The toNFKC() operation returns the code point in normalization form
> KC. For more information, see Section 5 of Unicode Standard Annex
> #15 [TR15].
>
> It should be noted that NFKC is used, although Normalization Form C
> (NFC) is used in the "IDNA Protocol" document [RFC5891].
>
> The toNFKC operation fails for some code points that aren't characters.
> For example U+D800 is not a character, and normalization will fail:
>
> http://demo.icu-project.org/icu-bin/nbrowser?t=&s=D800&uv=0
>
> How should the "Unstable" property be evaluated when toNFKC fails?
>
> Am I correct in using toNFKC(cp) = UNDEFINED for this situation, and
> specify that toCaseFold(UNDEFINED) = UNDEFINED and toNFKC(UNDEFINED) =
> UNDEFINED and then also that UNDEFINED is never equal to any code point?
>
> /Simon