Hangul jamo issues

John C Klensin klensin at jck.com
Tue Jan 2 13:28:40 CET 2007



--On Tuesday, 02 January, 2007 13:27 +0900 Soobok Lee
<lsb at lsb.org> wrote:

> 
> This is the issue list for Hangul Jamos:
> 
> Hangul Jamo (in Range: 1100-11FF)
>   These should be available as input and allowed in labels, so
>   that labels can consist of jamo-only sequences or contain
>   archaic hangul syllables.  We already have registrations using
>   these characters in .com IDNs.

For a number of reasons, and with no disrespect to registrations
in COM, what is permitted under the JET tables and associated
rules (assuming they exist) for .KR?   Looking at your slightly
later note, I find it interesting that neither we nor, as far as
I know, ICANN have heard any requests for changes in this area
from NIDA, despite a great many comments from NIDA or the
government on other IDN-related areas.

To repeat what has been said in other areas, the fact that a
sequence is legitimate in some present or past use of the
language, or that it would be comprehensible if used in a name,
does not imply a "right" to have it included in the DNS.  We
should be careful about excluding it.   But we should also not
assume that, because it is possible and sources of conflicts
cannot easily be identified, permitting it is a good idea.

> Hangul Compatibility Jamo (in Range: 3130-318F)
>   These should be available as input and be mapped into
>   Hangul Jamo Range 1100-11FF by the IDNAbis preprocessing
>   stage in applications.

I believe we need to assume that every instance in which 
   ToUnicode(ToASCII(label)) != label
is trouble waiting to happen.  Requiring the character mapping
that causes it to occur as part of the standard should be
avoided unless there is a compelling reason (case-mapping for
consistency with ASCII label behavior is, for me, one such
compelling reason).  If the relationship is an artifact of
Unicode (or other CCS) decisions about whether or not
conventional characters should be assigned separate code points,
then I think that any mappings should lie outside the standard
and in UIs, partially to help make it clear that IDNA-canonical
forms, and only those forms, should be used in interchange and
on the wire.  
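
As a concrete instance of ToUnicode(ToASCII(label)) != label, here is a minimal sketch using CPython's `encodings.idna` module. Note that it implements IDNA2003 (RFC 3490), not the IDNAbis proposal under discussion, and that its nameprep step applies NFKC. The input, a compatibility-jamo pair of the kind discussed below, is an illustrative choice, not taken from any registry.

```python
# CPython's encodings.idna implements IDNA2003 (RFC 3490); its nameprep
# step (RFC 3491) applies NFKC, which rewrites compatibility jamo.
from encodings.idna import ToASCII, ToUnicode

label = "\u3131\u314f"   # HANGUL LETTER KIYEOK + HANGUL LETTER A (compatibility jamo)
ace = ToASCII(label)      # bytes in ACE form, beginning with b"xn--"
back = ToUnicode(ace)     # decoded str

print(back == label)              # False: the input did not survive the round trip
print(back == "\uac00")           # True: it came back as the syllable U+AC00
print(ToASCII("\uac00") == ace)   # True: two different inputs share one ACE label
```

The last line is the interchange problem in miniature: what is sent on the wire cannot tell the recipient which of the two forms the user typed.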

For a user typing an IDN or IRI into an application, there is no
difference between something that is done in a UI and something
required as part of the standard.  However, if the mapping is
done only in the UI, the unmapped form the user typed would not be
valid in an IRI transmitted across the wire or embedded in a
message to others.

>   Ordinary Korean users can type in only these Compatibility
>   Jamos and cannot directly type those in 1100-11FF (in
>   Windows).  NFKC does this mapping (and composing), but NFC
>   does not.
>     U+3164 === U+1160 : compatibility equivalence for hangul filler
>     U+3131 === U+1100 : compatibility equivalence for initial KI-EOK
>     and so on.
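
The NFC/NFKC difference described in the quoted note can be checked with Python's standard `unicodedata` module. A minimal sketch; the input string is an illustrative stand-in for what a Korean IME would typically emit, and `code_points` is a hypothetical helper.

```python
import unicodedata

# Compatibility jamo, as a Korean IME would typically produce them (illustrative input).
typed = "\u3131\u314f"  # HANGUL LETTER KIYEOK + HANGUL LETTER A

nfc = unicodedata.normalize("NFC", typed)
nfkc = unicodedata.normalize("NFKC", typed)

def code_points(s: str) -> list[str]:
    return [f"U+{ord(c):04X}" for c in s]

# NFC: compatibility jamo have no canonical decomposition, so nothing changes.
print(code_points(nfc))   # ['U+3131', 'U+314F']
# NFKC: mapped to conjoining jamo U+1100 U+1161, then composed into one syllable.
print(code_points(nfkc))  # ['U+AC00']
# The filler from the quoted list: U+3164 maps to U+1160 under NFKC only.
print(code_points(unicodedata.normalize("NFKC", "\u3164")))  # ['U+1160']
```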

In general, any time one starts talking about what users can
type, one is out at the UI level of abstraction.   In other
words, what people can type and, more important, what shows up
at the interface to an application after they type it, is the
consequence of user interface and operating system design
issues, not something that should be compelling as part of
application protocol design. 

>   Need for jamo sequences in inputs:
>    KSC5601 has only the 2350 standard hangul syllables, while its
>    Windows-specific extension (CP949) has the full set of 11172
>    hangul syllables.  Microsoft added those thousands of
>    characters to serve Korean users' needs, especially those of
>    teenagers and scholars.
>    So, in a Linux x-terminal, for example, we cannot directly type
>    these extended syllables, but can type in only compat. jamo
>    sequences.  And some code conversion tools (cp949 -> ksc5601)
>    may transform extended hangul syllables into compat. jamo
>    sequences.  If these compat. jamo sequences are mapped into
>    jamo sequences (U+11xx) by the preprocessing stage in IDNAbis,
>    NFC in IDNAbis would further combine these sequences into
>    composed hangul syllables.
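
The composition step the quoted note relies on is the algorithmic Hangul syllable composition defined in the Unicode Standard, and its 19 * 21 * 28 index space is where the 11172 syllable count comes from. A minimal Python sketch of that formula for the modern L V (T) case, checked against `unicodedata.normalize("NFC", ...)`; the function name `compose_syllable` is hypothetical.

```python
import unicodedata

# Constants of the Hangul syllable composition algorithm in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11172 precomposed syllables

def compose_syllable(lead: str, vowel: str, trail: str = "") -> str:
    """Compose a modern L V (T) conjoining-jamo sequence into one syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0  # 0 means "no final consonant"
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a modern leading consonant + vowel pair")
    if trail and not (0 < t_index < T_COUNT):
        raise ValueError("not a modern trailing consonant")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

print(S_COUNT)  # 11172
for seq in ["\u1100\u1161", "\u1100\u1161\u11a8", "\u1112\u1175\u11c2"]:
    # The formula agrees with NFC on modern sequences.
    print(compose_syllable(*seq) == unicodedata.normalize("NFC", seq))  # True
```

Archaic jamo outside these index ranges (for example U+1113 as a leading consonant) have no precomposed form, so NFC leaves such sequences as separate jamo; that is the case the first quoted item, on archaic syllables, depends on.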

Hmm.  I look at that explanation and it seems to me to be a
strong reason to ban these _in the protocol_: mapping them in
and out of IDNA/punycode form is going to yield characters
different from what the user typed in and, in some cases,
perhaps characters that can't even be rendered.   Perhaps I
don't understand.

> Hangul Half-Width Jamo (in Range: FFA0-FFDC)
>   Ordinary Korean users seldom type in these Jamos in Windows,
>   AFAIK.  So the need for these characters in label inputs is
>   questionable.  NFKC maps these characters into Hangul Jamo
>   Range 1100-11FF, but NFC does not.
>     U+FFA0 === U+3164 === U+1160 : compatibility equivalence for hangul filler
>     U+FFA1 === U+3131 === U+1100 : compatibility equivalence for initial KI-EOK
>     and so on.
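
The two-step equivalences in the quoted table come from chained decompositions in the Unicode Character Database: each half-width letter has a `<narrow>` decomposition to a compatibility jamo, which in turn has a `<compat>` decomposition to a conjoining jamo. A small sketch that follows such chains with `unicodedata.decomposition`; `decomposition_chain` is a hypothetical helper name.

```python
import unicodedata

def decomposition_chain(ch: str) -> list[str]:
    """Follow single-code-point decompositions, e.g. U+FFA1 -> U+3131 -> U+1100."""
    chain = [f"U+{ord(ch):04X}"]
    while True:
        # unicodedata.decomposition returns e.g. '<narrow> 3131', or '' if none.
        fields = [f for f in unicodedata.decomposition(ch).split() if not f.startswith("<")]
        if len(fields) != 1:
            break  # no decomposition, or one that expands to several code points
        ch = chr(int(fields[0], 16))
        chain.append(f"U+{ord(ch):04X}")
    return chain

print(decomposition_chain("\uffa0"))  # ['U+FFA0', 'U+3164', 'U+1160']
print(decomposition_chain("\uffa1"))  # ['U+FFA1', 'U+3131', 'U+1100']
# NFKC applies the whole chain at once; NFC applies none of it.
print(unicodedata.normalize("NFKC", "\uffa1") == "\u1100")  # True
print(unicodedata.normalize("NFC", "\uffa1") == "\uffa1")   # True
```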
> 
> U+3164, U+1160, U+FFA0 Hangul Filler:
>   U+3164 and U+1160 are displayed as blank space in Windows.
>   U+FFA0 Half-width Hangul Filler is displayed as a bold-faced
>   middle dot in Windows.
>   These characters need caution in display.

No, we need to prohibit them entirely, under the "no spaces and no
punctuation" principle: they are too risky unless there is a
compelling reason to include them.
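
A minimal sketch of what prohibiting the fillers could look like as a label check. `contains_hangul_filler` and its filler set are hypothetical; the set holds the three fillers named above, plus U+115F HANGUL CHOSEONG FILLER, which is an assumption added here because it is the initial-position counterpart of U+1160.

```python
# Hypothetical filler set: the three fillers named above, plus (an assumption
# added here) U+115F HANGUL CHOSEONG FILLER, the initial-position filler.
HANGUL_FILLERS = frozenset({"\u115f", "\u1160", "\u3164", "\uffa0"})

def contains_hangul_filler(label: str) -> bool:
    """Hypothetical registry-side check: True if the label holds any Hangul filler."""
    return any(ch in HANGUL_FILLERS for ch in label)

print(contains_hangul_filler("\uac00"))        # False: an ordinary syllable
print(contains_hangul_filler("\uac00\u3164"))  # True: trailing filler renders as blank
print(contains_hangul_filler("a\uffa0b"))      # True: half-width filler
```

The check runs on the label exactly as given, so it does not depend on whether an application applied NFKC first (which would fold U+3164 and U+FFA0 into U+1160).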

> Both initial consonant U+1100 and its final consonant
>   correspondent U+11A8 are displayed with exactly the same
>   glyph and margin in Windows.  And so forth for other consonants.
>   These characters need caution in registration and display.
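
One hypothetical way to act on this caution is to flag labels in which conjoining jamo survive NFC composition, since that is the situation in which an initial U+1100 and a final U+11A8 are rendered as separate, identical-looking glyphs. `leftover_conjoining_jamo` is an illustrative name, and the policy itself is an assumption, not something proposed in this thread.

```python
import unicodedata

def leftover_conjoining_jamo(label: str) -> list[str]:
    """Hypothetical check: conjoining jamo (U+1100..U+11FF) that remain after NFC."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", label)
            if 0x1100 <= ord(ch) <= 0x11FF]

print(leftover_conjoining_jamo("\u1100\u1161\u11a8"))  # []: composes to U+AC01
print(leftover_conjoining_jamo("\u1100\u11a8"))        # ['U+1100', 'U+11A8']
```

Jamo-only and archaic-syllable labels, the first quoted item, would all be flagged by such a check, which is arguably where review is most needed.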

regards,
   john


