Question about idnabis-table-04

Kenneth Whistler kenw at sybase.com
Tue Dec 2 22:02:24 CET 2008


Guonian,

12 of the CJK ideographs in the range U+FA0E..U+FA29 are
actually part of the set of *unified* ideographs in
the standard, the main block of which is in the range
U+4E00..U+9FFF.

All of the unified ideographs are PVALID. None of them have
canonical decompositions in the standard, and thus they are
all stable under NFKC normalization.

All of the *rest* of the compatibility CJK ideographs in
the range U+F900..U+FAD9 do have canonical decompositions,
are not stable under NFKC normalization, and are thus
DISALLOWED in the table.

The names and code point range of these compatibility
ideographs are not the deciding criteria here -- it is the
normalization status that matters. The naming often leads
people astray, since it is so easy to assume that every
character named "CJK COMPATIBILITY IDEOGRAPH-XXXX" has the
same status.
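
As a rough illustration only (this is not part of the draft), the
stability check can be sketched in Python with the standard
unicodedata module. The helper name is_nfkc_stable is made up for
this example, and the results depend on the Unicode version bundled
with the interpreter rather than on Unicode 5.0 specifically.

```python
import unicodedata

def is_nfkc_stable(ch: str) -> bool:
    """Return True if NFKC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFKC", ch) == ch

stable = []    # code points that normalize to themselves
unstable = {}  # code point -> the string it normalizes to

# Walk the range U+FA0E..U+FA29 discussed above.
for cp in range(0xFA0E, 0xFA2A):
    ch = chr(cp)
    if is_nfkc_stable(ch):
        stable.append(cp)
    else:
        unstable[cp] = unicodedata.normalize("NFKC", ch)

print("Unicode data version:", unicodedata.unidata_version)
print("stable:", " ".join(f"U+{cp:04X}" for cp in stable))
for cp, target in unstable.items():
    print(f"U+{cp:04X} -> U+{ord(target):04X}")
```

The code points reported as stable should line up with the PVALID
rows in the quoted table, and the others with the DISALLOWED rows.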

See the explanation on p. 424 of The Unicode Standard,
Version 5.0. Online at:

http://www.unicode.org/versions/Unicode5.0.0/ch12.pdf

--Ken

> from draft-ietf-idnabis-tables-04.txt, I could see these characters are
> divided in two categories, PVALID and DISALLOWED.
> 
> FA0E..FA0F  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA0E..CJK COMPAT
> FA10        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA10
> FA11        ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA11
> FA12        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA12
> FA13..FA14  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA13..CJK COMPAT
> FA15..FA1E  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA15..CJK COMPAT
> FA1F        ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA1F
> FA20        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA20
> FA21        ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA21
> FA22        ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA22
> FA23..FA24  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA23..CJK COMPAT
> FA25..FA26  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA25..CJK COMPAT
> FA27..FA29  ; PVALID      # CJK COMPATIBILITY IDEOGRAPH-FA27..CJK COMPAT
> FA2A..FA2D  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA2A..CJK COMPAT
> FA30..FA6A  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA30..CJK COMPAT
> FA70..FAD9  ; DISALLOWED  # CJK COMPATIBILITY IDEOGRAPH-FA70..CJK COMPAT
> 
> What rule makes these characters U+FA10, U+FA12, ... DISALLOWED? I am not
> sure if they fall under DISALLOWED because U+FA10 == U+585A and
> U+FA12 == U+6674, ... respectively.
> 
> Maybe I missed some documents about this. Could you give me more hints ?
> Thanks !
> 
> Regards,
> 
> Guonian SUN


