KATS (Korean Agency for Technology and Standards)'s Comments on the Unicode Codepoints and IDNA Internet-Draft

Patrik Fältström patrik at frobbit.se
Mon Nov 3 14:27:38 CET 2008


On 2 nov 2008, at 12.58, Kent Karlsson wrote:

>   D: block(cp) in {Combining Diacritical Marks for Symbols,
>                    Musical Symbols, Ancient Greek Musical Notation,
>                    Private Use Area}
>
> 1) I *don't* think the Hangul blocks (plural!) belong there. There will
>   be new blocks for Hangul Jamo, IIUC for 5.2:
> 	A960-A97F; Hangul Jamo Extended-A
> 	D7B0-D7FF; Hangul Jamo Extended-B

For the next version of the document, I am NOT choosing this path.

> 2) You have included "E000..F8FF; Private Use Area" in your set above,
>   but not
> 	F0000..FFFFF; Supplementary Private Use Area-A
> 	100000..10FFFF; Supplementary Private Use Area-B
>   Why is that? Is "Private Use Area" in the set supposed to cover all
>   three blocks with that as a *sub*string of the name? I would suggest
>   not to use a substring approach here.

That is a bug in the draft. In reality, the Private Use Area (and, I am
sure, some other blocks) will be DISALLOWED, because these are assigned
codepoints that do not match any rule that results in PVALID.
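Kent's point 2, listing exact block names versus matching on a substring of the name, can be illustrated with a small Python sketch. It is an illustration only: it assumes block data in the `start..end; Name` layout of the UCD file Blocks.txt, uses just the block lines quoted in this thread, and the helpers `parse_blocks` and `in_blocks` are hypothetical, not part of the draft.

```python
# Block lines in the Blocks.txt layout, copied from the ranges quoted in this thread.
BLOCKS_TXT = """\
E000..F8FF; Private Use Area
F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B
A960..A97F; Hangul Jamo Extended-A
D7B0..D7FF; Hangul Jamo Extended-B
"""


def parse_blocks(text):
    """Return a list of (start, end, name) tuples from Blocks.txt-style lines."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        rng, name = (part.strip() for part in line.split(";", 1))
        start, end = (int(x, 16) for x in rng.split(".."))
        blocks.append((start, end, name))
    return blocks


def in_blocks(cp, names, blocks, substring=False):
    """Hypothetical rule-D test: is cp inside a block whose name matches?

    With substring=False a block matches only on its exact name; with
    substring=True any block whose name *contains* a listed name matches.
    """
    for start, end, name in blocks:
        if start <= cp <= end:
            if substring:
                return any(n in name for n in names)
            return name in names
    return False


blocks = parse_blocks(BLOCKS_TXT)
rule_d = {"Private Use Area"}
print(in_blocks(0xE000, rule_d, blocks))                   # True
print(in_blocks(0xF0000, rule_d, blocks))                  # False
print(in_blocks(0xF0000, rule_d, blocks, substring=True))  # True
```

With exact matching, the two supplementary private use blocks are covered only if they are listed by name, which is the behaviour Kent suggests.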

> 3) I think
> 	FE00..FE0F; Variation Selectors

FE00..FE19  ; DISALLOWED  # VARIATION SELECTOR-1..PRESENTATION FORM FOR

> 	E0000..E007F; Tags
> 	E0100..E01EF; Variation Selectors Supplement

>   should be in the set of IgnorableBlocks as well (though these are
>   also covered by the IgnorableProperties (C) rule, as well as
>   not being 2.1.1. LetterDigits (A)). Also

E0000       ; UNASSIGNED  # <reserved>
E0001       ; DISALLOWED  # LANGUAGE TAG
E0002..E001F; UNASSIGNED  # <reserved>..<reserved>
E0020..E007F; DISALLOWED  # TAG SPACE..CANCEL TAG
E0080..E00FF; UNASSIGNED  # <reserved>..<reserved>
E0100..E01EF; DISALLOWED  # VARIATION SELECTOR-17..VARIATION SELECTOR-25
E01F0..EFFFD; UNASSIGNED  # <reserved>..<reserved>
EFFFE..10FFFE; DISALLOWED # <noncharacter>..<noncharacter>
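Lines in the derived-property format above (a code point or `start..end` range, a property value, an optional `#` comment) are straightforward to load and query. A minimal Python sketch, assuming exactly that layout; `parse_table` and `lookup` are hypothetical names, and the sample reuses the ranges listed above with most comments dropped.

```python
import bisect

# Sample in the layout above: "cp[..cp] ; VALUE  # comment".
TABLE = """\
E0000       ; UNASSIGNED  # <reserved>
E0001       ; DISALLOWED  # LANGUAGE TAG
E0002..E001F; UNASSIGNED
E0020..E007F; DISALLOWED  # TAG SPACE..CANCEL TAG
E0080..E00FF; UNASSIGNED
E0100..E01EF; DISALLOWED
E01F0..EFFFD; UNASSIGNED
EFFFE..10FFFE; DISALLOWED
"""


def parse_table(text):
    """Return (start, end, value) tuples sorted by start code point."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        rng, value = (part.strip() for part in line.split(";", 1))
        lo, _, hi = rng.partition("..")  # single code point: hi == ""
        rows.append((int(lo, 16), int(hi or lo, 16), value))
    rows.sort()
    return rows


def lookup(rows, cp):
    """Binary-search for the last range starting at or before cp."""
    # (cp, inf) sorts after every row whose start equals cp.
    i = bisect.bisect_right(rows, (cp, float("inf"))) - 1
    if i >= 0 and rows[i][0] <= cp <= rows[i][1]:
        return rows[i][2]
    return None  # cp is not covered by the table


rows = parse_table(TABLE)
print(lookup(rows, 0xE0001))  # DISALLOWED
print(lookup(rows, 0xE0041))  # DISALLOWED
print(lookup(rows, 0xE0000))  # UNASSIGNED
```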

> 	D800..DB7F; High Surrogates
> 	DB80..DBFF; High Private Use Surrogates
> 	DC00..DFFF; Low Surrogates
>   belong in that set (even though these are excluded by not being
>   2.1.1. LetterDigits (A)).

D800..FA0D  ; DISALLOWED  # <Non Private Use High Surrogate>..CJK COMPAT
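Why private use code points and surrogates fall through to DISALLOWED can be shown with a heavily simplified derivation. This Python sketch is an assumption-laden illustration, not the draft's algorithm: it borrows the LetterDigits (A) category set and the Unstable (B) formula as later written in RFC 5892, treats Cn code points that are not noncharacters as UNASSIGNED, and omits the exception lists, IgnorableProperties (C), IgnorableBlocks (D) and every other rule (so, for example, the variation selectors, which are Mn, would wrongly come out PVALID here). Results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Simplified, hypothetical derivation (NOT the full set of draft rules).
# Kept: Unassigned, Unstable (B) and LetterDigits (A), with the category set
# and formula as later written in RFC 5892. Everything else is omitted.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_noncharacter(cp):
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_unstable(ch):
    """toNFKC(toCaseFold(toNFKC(cp))) != cp"""
    first = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", first.casefold()) != ch


def derive(cp):
    if is_noncharacter(cp):
        return "DISALLOWED"
    ch = chr(cp)
    category = unicodedata.category(ch)
    if category == "Cn":
        return "UNASSIGNED"
    # Assigned, but outside LetterDigits: no rule yields PVALID, so the
    # result falls through to DISALLOWED. This covers private use (Co)
    # and surrogates (Cs), which are therefore never normalized here.
    if category not in LETTER_DIGITS:
        return "DISALLOWED"
    if is_unstable(ch):
        return "DISALLOWED"
    return "PVALID"


for cp in (0x0061, 0x0041, 0xE000, 0xF0000, 0xD800, 0xE0001, 0xE0000):
    print(f"U+{cp:04X}: {derive(cp)}")
```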

> Alternatively, include just "Combining Diacritical Marks for Symbols"
> in IgnorableBlocks, since all the other things there are excluded
> anyway by other rules.
>
>> Secondly, we can use the Hangul_Syllable_Type as defined in
>> HangulSyllableType.txt.
>
> I think that would be preferable, since that definition need not
> be changed when new Hangul Jamo blocks are added (which is not
> entirely unlikely even after 5.2).

I am now adding a rule that makes codepoints whose Hangul_Syllable_Type
is L, V or T DISALLOWED.

Is that a correct understanding of the situation?
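A rule keyed on Hangul_Syllable_Type could look like the following Python sketch. Python's `unicodedata` module does not expose this property, so the sketch parses lines in the layout of HangulSyllableType.txt. The excerpt is illustrative and assumed (check it against the data file for the Unicode version in use); code points not listed get the file's default value, NA. `parse_ranges`, `hangul_syllable_type` and `is_disallowed_by_hst` are hypothetical names.

```python
# Assumed excerpt in the HangulSyllableType.txt layout ("range ; value").
HST_EXCERPT = """\
1100..115F ; L
1160..11A7 ; V
11A8..11FF ; T
AC00       ; LV
AC01..AC1B ; LVT
"""

JAMO_TYPES = {"L", "V", "T"}  # the values the new rule DISALLOWs


def parse_ranges(text):
    """Return (start, end, value) tuples from 'cp[..cp] ; value # ...' lines."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        rng, value = (part.strip() for part in line.split(";", 1))
        lo, _, hi = rng.partition("..")  # single code point: hi == ""
        rows.append((int(lo, 16), int(hi or lo, 16), value))
    return rows


def hangul_syllable_type(cp, rows):
    """Value for cp, or the file's default NA if cp is not listed."""
    for start, end, value in rows:
        if start <= cp <= end:
            return value
    return "NA"


def is_disallowed_by_hst(cp, rows):
    return hangul_syllable_type(cp, rows) in JAMO_TYPES


rows = parse_ranges(HST_EXCERPT)
for cp in (0x1100, 0x1161, 0x11A8, 0xAC00, 0x0041):
    print(f"U+{cp:04X}", hangul_syllable_type(cp, rows), is_disallowed_by_hst(cp, rows))
```

Because the test depends on the property value rather than on block names, new Jamo ranges such as Hangul Jamo Extended-A and -B would be picked up by updating the data file, without changing the rule.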

>> What I have not found any Unicode definition for, so they have to be
>> handled as exceptions, are the Bangjeom, 302A..302F. Help would be
>> appreciated.
>
> They have compatibility decompositions, and so are excluded already
> by the rule in 2.1.2. Unstable (B). I think that is sufficient.

Hmm... they do not match Unstable in my program, and because of that I
have to add them as exceptions.

Can you please check again?
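To reproduce the check, the Python sketch below applies Unstable, assuming it is defined as toNFKC(toCaseFold(toNFKC(cp))) != cp (the wording later used in RFC 5892), to 302A..302F and prints each code point's decomposition mapping from the interpreter's bundled Unicode data. An empty mapping means the code point has no canonical or compatibility decomposition in that Unicode version.

```python
import unicodedata


def is_unstable(ch):
    """Unstable (B), assumed here as toNFKC(toCaseFold(toNFKC(cp))) != cp."""
    first = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", first.casefold()) != ch


# U+302A..U+302F: the range discussed in this thread.
for cp in range(0x302A, 0x3030):
    ch = chr(cp)
    mapping = unicodedata.decomposition(ch)  # "" when there is none
    print(f"U+{cp:04X} {unicodedata.name(ch, '?')}: "
          f"decomposition={mapping!r} unstable={is_unstable(ch)}")

# For comparison, LATIN SMALL LIGATURE FI has a compatibility decomposition.
print("U+FB01 unstable:", is_unstable("\ufb01"))
```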

     Patrik



More information about the Idna-update mailing list