toNFKC(toCaseFold(toNFKC(cp))) != cp and toNFKC failures
simon at josefsson.org
Fri May 27 11:43:46 CEST 2011
I'm looking at RFC 5892 section 2.2 which says:
   2.2.  Unstable (B)

   B: toNFKC(toCaseFold(toNFKC(cp))) != cp

   This category is used to group the characters that are not stable
   under Normalization Form K (NFKC) and case folding.  In general,
   these code points are not suitable for use for IDN.

   The toCaseFold() operation is defined in Section 3.13 of The Unicode
   Standard.

   The toNFKC() operation returns the code point in normalization form
   KC.  For more information, see Section 5 of Unicode Standard Annex
   #15.

   It should be noted that NFKC is used, although Normalization Form C
   (NFC) is used in the "IDNA Protocol" document [RFC5891].

The toNFKC operation fails for some code points that aren't characters.
For example, U+D800 is a surrogate code point, not a character, and
normalization will fail:
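
As a rough illustration of why a normalizer built on well-formed UTF-8 input has nothing to work with here, the Python sketch below shows that U+D800 has general category Cs (surrogate) and cannot be encoded as UTF-8 at all. This is not the tool or output from the original message; the helper name utf8_or_none is hypothetical.

```python
import unicodedata

def utf8_or_none(cp):
    """Return the UTF-8 bytes for a code point, or None if it cannot be encoded."""
    try:
        return chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        # Surrogate code points (U+D800..U+DFFF) have no UTF-8 encoding.
        return None

# U+D800 is a surrogate: general category "Cs".
print(unicodedata.category(chr(0xD800)))

# An ordinary character encodes; the surrogate does not.
print(utf8_or_none(0x00E5))
print(utf8_or_none(0xD800))
```

Any library that insists on valid UTF-8 before normalizing will reject such input before NFKC is ever applied.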
How should the "Unstable" property be evaluated when toNFKC fails?
Am I correct in setting toNFKC(cp) = UNDEFINED in this situation,
specifying that toCaseFold(UNDEFINED) = UNDEFINED and
toNFKC(UNDEFINED) = UNDEFINED, and then treating UNDEFINED as never
equal to any code point?
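
The proposed rule can be sketched in Python as follows. This is only an illustration under stated assumptions: UNDEFINED is modelled as None, the surrogate-range check stands in for whatever failure a real normalizer reports, and the names to_nfkc, to_casefold and is_unstable are hypothetical. Python's unicodedata module also uses the Unicode version bundled with the interpreter, which is not necessarily the version RFC 5892 was computed against.

```python
import unicodedata

UNDEFINED = None  # hypothetical sentinel meaning "the operation failed"

def to_nfkc(s):
    """NFKC-normalize s; UNDEFINED in, or surrogate input, gives UNDEFINED."""
    if s is UNDEFINED:
        return UNDEFINED
    # Assumption: model "normalization fails" as "input contains a surrogate".
    if any(0xD800 <= ord(ch) <= 0xDFFF for ch in s):
        return UNDEFINED
    return unicodedata.normalize("NFKC", s)

def to_casefold(s):
    """Case-fold s; UNDEFINED propagates unchanged."""
    if s is UNDEFINED:
        return UNDEFINED
    return s.casefold()

def is_unstable(cp):
    """Evaluate toNFKC(toCaseFold(toNFKC(cp))) != cp with UNDEFINED handling."""
    result = to_nfkc(to_casefold(to_nfkc(chr(cp))))
    if result is UNDEFINED:
        # UNDEFINED is never equal to any code point, so the test holds.
        return True
    return result != chr(cp)

for cp in (0x0061, 0x0041, 0x2126, 0xD800):
    print(f"U+{cp:04X} unstable: {is_unstable(cp)}")
```

Under these semantics, U+0061 is stable, U+0041 and U+2126 are unstable because they change under case folding or NFKC, and U+D800 is unstable because the comparison against UNDEFINED always yields "not equal".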