Nameprep and NFKC
John C Klensin
klensin at jck.com
Sun Oct 10 20:34:43 CEST 2010
--On Sunday, October 10, 2010 3:50 PM +0300 "Abdulrahman I.
ALGhadir" <aghadir at citc.gov.sa> wrote:
> As mentioned, the Nameprep profile uses the NFKC form
> of the string, right? (as mentioned in section 4)
Section 4 of what? Nameprep is now completely obsolete; there
is no dependency on it at all in IDNA2008.
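For context on what the question refers to: CPython still ships an IDNA2003 implementation, including a documented `nameprep()` function in `encodings.idna` (RFC 3491). The sketch below, assuming Python 3, compares NFC, NFKC and that Nameprep on a few compatibility characters; `compare` is a hypothetical helper written only for illustration. It shows the kind of compatibility folding Nameprep performed and IDNA2008 itself no longer does.

```python
import unicodedata
from encodings.idna import nameprep  # CPython's IDNA2003 (RFC 3491) Nameprep


def compare(s):
    """Hypothetical helper: return the (NFC, NFKC, Nameprep) forms of s."""
    return (
        unicodedata.normalize("NFC", s),
        unicodedata.normalize("NFKC", s),
        nameprep(s),
    )


# LATIN SMALL LIGATURE FI, FULLWIDTH LATIN CAPITAL LETTER A, SUPERSCRIPT TWO
for ch in ("\ufb01", "\uff21", "\u00b2"):
    nfc, nfkc, prepped = compare(ch)
    print(f"U+{ord(ch):04X}: NFC={nfc!r} NFKC={nfkc!r} Nameprep={prepped!r}")
```

NFC leaves all three characters untouched; NFKC folds them to plain ASCII, and Nameprep additionally case-folds the fullwidth "A" to "a".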
> I am not sure if this is a valid example to begin with.
> Imagine these two chars 'a' and '*': if they appeared
> in sequence they would yield a unique display char. If those
> two didn't have a normalized char (is it possible?) and
> someone used the domain a*, what would happen if Unicode
> released a normalized form later on?
First of all, there are rather strict Unicode rules against most
or all changes that would affect normalization. I'll let someone
more expert than I am comment on the relationship between those
rules and your example.
I think by "normalized form", you mean what Unicode calls a
"precomposed form", i.e., a single code point that represents
the combination of 'a' and '*' ("a+*" below). My understanding is
that such new code points are now discouraged entirely and
that, if they are added, NFC (see below) is not changed to
reflect the mapping you might expect. Instead, NFC is changed
to _decompose_ the new "a+*" codepoint into the "a" and "*"
combining sequence. This is one of the few advantages of using
NFD over NFC -- the behavior of NFD should always be predictable
(guessable) without one's needing to know the sequence in which
characters were added to the standard. Again, someone more
expert than I am may want to confirm or correct my understanding.
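The hypothetical "a+*" cannot be run directly, but the same mechanism can be observed with a real character. As a sketch assuming Python 3 and its `unicodedata` module (`forms` is a hypothetical helper): an ordinary sequence such as "a" + COMBINING ACUTE ACCENT is composed by NFC, whereas U+2ADC FORKING, a precomposed character added in Unicode 3.2 and listed among Unicode's composition exclusions, is decomposed even by NFC.

```python
import unicodedata


def forms(s):
    """Hypothetical helper: show the NFC and NFD forms of s as code points."""
    return {
        form: " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize(form, s))
        for form in ("NFC", "NFD")
    }


# Ordinary combining sequence: NFC composes it, NFD keeps it decomposed.
print(forms("a\u0301"))  # LATIN SMALL LETTER A + COMBINING ACUTE ACCENT

# U+2ADC FORKING has a canonical decomposition (U+2ADD U+0338) but is a
# composition exclusion, so NFC and NFD give the same decomposed sequence.
print(forms("\u2adc"))
```

For the excluded character, NFC and NFD agree, which is the predictability point made above.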
As far as IDNA is concerned, scenarios like the one you describe
are among the reasons why the Standard uses the much less
drastic NFC rather than NFKC and why it requires that input
strings be in NFC-compliant form, rather than doing its own
normalization.
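A minimal sketch of the two points in that paragraph, assuming Python 3.8 or later (for `unicodedata.is_normalized`). `check_label_nfc` is a hypothetical helper that only illustrates "reject input that is not already NFC" instead of normalizing it; it is not an IDNA2008 validator and skips the protocol's other checks (derived code point properties, contextual rules, Bidi, length).

```python
import unicodedata


def check_label_nfc(label):
    """Hypothetical helper: accept a label only if it is already in NFC.

    The input is never rewritten; a non-NFC label is rejected.
    This is NOT a complete IDNA2008 check.
    """
    if not unicodedata.is_normalized("NFC", label):
        raise ValueError(f"label {label!r} is not in NFC")
    return label


# NFC versus NFKC: NFKC also folds compatibility characters such as the "fi" ligature.
word = "\ufb01le"  # LATIN SMALL LIGATURE FI + "le"
print(repr(unicodedata.normalize("NFC", word)))   # unchanged
print(repr(unicodedata.normalize("NFKC", word)))  # 'file'

print(check_label_nfc("caf\u00e9"))  # precomposed e-acute: already NFC, accepted
try:
    check_label_nfc("cafe\u0301")    # "e" + COMBINING ACUTE ACCENT: not NFC, rejected
except ValueError as err:
    print(err)
```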
p.s. Please try not to send messages containing confidentiality
statements to IETF mailing lists. They violate IETF rules and
may either be ignored or discarded as a result.