Prohibiting mapping of PVALID characters

Kenneth Whistler kenw at sybase.com
Thu Dec 10 23:36:54 CET 2009


Martin said:

> [For everybody, I think it's important to understand that (as far as I 
> know, Ken or Mark please correct me if I'm wrong), none of the two (to 
> four) characters we are currently wrestling with is in any way involved 
> in NFC.]

Well, all four of them trivially map to themselves for NFC. The
important thing is that none of them maps to something *else* as
a result of canonical decomposition, and none of them is a target for
canonical composition *from* something else. So they should always stay
unchanged for an NFC normalization of a string.
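
A quick way to check this claim is sketched below. It assumes the four
characters under discussion are U+00DF, U+03C2, U+200C and U+200D; that
list is not stated in this message, and is_nfc_inert is a hypothetical
helper name. A character that is unchanged by NFD has no canonical
decomposition, so it also cannot be the target of a canonical
composition.

```python
import unicodedata

def is_nfc_inert(ch):
    """True if ch is unchanged by both NFD and NFC.

    Unchanged by NFD means no canonical decomposition, which also rules
    out the character being produced by canonical composition.
    """
    return (unicodedata.normalize("NFD", ch) == ch
            and unicodedata.normalize("NFC", ch) == ch)

# Assumption: these are the four characters being discussed.
CANDIDATES = ["\u00DF", "\u03C2", "\u200C", "\u200D"]

for ch in CANDIDATES:
    print(f"U+{ord(ch):04X} inert under NFC: {is_nfc_inert(ch)}")
```

The results depend on the Unicode version bundled with the Python
interpreter, which may differ from the version the tables document
targets.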
 
> a) There may be sequences of PVALID and not PVALID characters, that when 
> applying NFC, turn into sequences of PVALID only. If there are such 
> cases, I think they should be allowed, and our text shouldn't forbid 
> such a transformation.

This case *can* occur in Korean:

U+AC00 + U+11A8      --NFC--> U+AC01
PVALID   DISALLOWED           PVALID

The sequence of an LV syllable plus a conjoining T jamo, such as
above, is certainly not the preferred representation of Korean,
but it can occur in text. NFC would convert it to the preferred
fully formed LVT syllable, U+AC01 in this case.

So that is an example of a string containing DISALLOWED characters
that would result in all PVALID characters once normalized
to NFC (without any other mapping involved). This stems from
the fact that the conjoining jamo have all been set to DISALLOWED
by exceptional rule (the 2.9 OldHangulJamo (I) class) in
the tables document.
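
The Korean example above can be reproduced with Python's standard
unicodedata module. This is only an illustrative sketch; the variable
names are made up, and the T_BASE constant is the one used in the
Unicode Standard's Hangul composition arithmetic.

```python
import unicodedata

# LV syllable U+AC00 followed by the conjoining trailing jamo U+11A8.
source = "\uAC00\u11A8"
result = unicodedata.normalize("NFC", source)

print([f"U+{ord(c):04X}" for c in source])
print([f"U+{ord(c):04X}" for c in result])

# Hangul composition arithmetic: LVT = LV + (T - T_BASE).
T_BASE = 0x11A7
expected = chr(0xAC00 + (0x11A8 - T_BASE))
print(result == expected)
```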

> b) There may be a way to apply NFC to a sequence of PVALID characters 
> only and the result would contain some non-PVALID characters. If that's 
> the case, those characters might further be mapped to something; would 
> we be okay with that?

I cannot think of any examples of this. As best I can tell, the
particular set of exceptions now in tables-08.txt doesn't include
any that would result in a violation of PVALID closure for strings
under an NFC mapping. It might make sense, though, for somebody
to gen up a test to verify this.
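
One possible shape for such a test is sketched below. It is not a
complete verification: it only looks at the canonical decomposition of
each single code point, and it takes the PVALID test as a caller-supplied
predicate (is_pvalid is a hypothetical parameter; a real run would need
to build it from the tables document). The function name
find_closure_violations is likewise made up for illustration.

```python
import sys
import unicodedata

def find_closure_violations(is_pvalid):
    """Yield (decomposed, composed) pairs where an all-PVALID sequence
    normalizes under NFC to a string with a non-PVALID character.

    Only sequences that are the canonical decomposition of a single
    code point are examined.
    """
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed == ch:
            continue  # no canonical decomposition, nothing to compose
        if not all(is_pvalid(c) for c in decomposed):
            continue  # input is not PVALID-only, so not case (b)
        composed = unicodedata.normalize("NFC", decomposed)
        if not all(is_pvalid(c) for c in composed):
            yield decomposed, composed

# Toy predicate (not the real table): only "e" and U+0301 are allowed,
# so composing them into U+00E9 is reported as a violation.
toy = {"e", "\u0301"}
violations = list(find_closure_violations(lambda c: c in toy))
print(violations)
```

With a predicate built from the actual tables, an empty result would
support the claim for this restricted class of inputs only; longer
PVALID sequences would need a separate check.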

--Ken


