Hangul jamo issues

Wed Jan 3 04:20:09 CET 2007

On Tue, Jan 02, 2007 at 02:07:22PM -0500, John C Klensin wrote:
> > 
> > For example, NIDA did not include any CJK code points in that
> > Korean table. No need for CJK characters in the Korean language
> > at all? No way! Their inclusion was just postponed to avoid the
> > TC/SC (Traditional/Simplified Chinese) unification issue. I was
> > present at the NIDA WG for IDN.
> 
> First of all and to emphasize the point I was trying to make,
> "no need for CJK in Korean IDNs" would not imply, even if it
> were true and permanent, "no need for CJK in writing Korean
> language".  The questions are separate and, without predicting
> what NIDA will decide to do, if substantially every name that
> exists in Hangul also exists in Chinese-derived characters,

Not all Hangul _business_ names have a CJK form, whereas in Korea
every personal name does.

> a strong argument could be made for confusion-avoidance by
> prohibiting CJK registrations.  If one does not prohibit them,
> then one might well want a variant model that bound the CJK
> string for a given name together with the Hangul one, and I'd
> imagine that might be hard to implement.

No one can monopolize a Hangul name and the corresponding CJK name
together.

If both (sooboklee in Hangul).kr and soobok-lee.kr are available,
should we then prohibit (sooboklee in Hangul) under .kr? No.
The same argument applies to (sooboklee in CJK).kr and
(sooboklee in Hangul).kr. Many people in Korea share the name
"soobok-lee".

Demand for CJK names is much lower than demand for Hangul names,
since Hangul is more familiar and easier to type than CJK. So we
need not worry about fierce competition for CJK forms.

The Japanese have both CJK and Hiragana forms of names, but they
prefer the shorter CJK form in the DNS. So they don't need to
prohibit Hiragana forms out of worry about fierce competition for
defensive registrations. The same is true for .kr.

(SamsungElectronics in CJK).kr may be registered as an IDN alias of
samsungelectronics.kr for overseas online presence.
We should not exclude it.

> 
> I have no way to know whether it is true, but it has been widely
> reported that CJK characters have been completely eliminated
> from the writing system for the Korean language in the North, so
> we might assume it is possible to do without them. 

That effort took place only in North Korea, from the early 1950s
into the 1960s. North Korea's leaders once denounced CJK characters
as "legacies of the old feudal system" and even stopped teaching
them in schools.

But that has changed.

Middle and high school students began studying CJK characters
intensively, and the North Korean national character set now
contains thousands of CJK characters.

Just as many English words have Latin origins, many Hangul words
have CJK origins: such a Hangul word is simply the pronunciation of
the CJK word. That is why CJK cannot be completely separated from
Hangul words. CJK is part of Korean vocabulary, and therefore of
the Korean language.

> 
> I understand that I'm oversimplifying a complex situation here
> and that I definitely don't understand all of the issues.   But
> I think these are precisely the types of decisions that we, and
> the relevant registries, need to make... and need to make
> conservatively and with an understanding of what is reasonably
> necessary for DNS naming, rather than assuming that "used in the
> usual writing system for the language" inherently equals "needed
> in IDNs and the DNS"

Hangul jamo sequences complement Hangul syllables. They represent
things that modern precomposed Hangul syllables cannot, such as
archaic syllables and consonant-only sequences.
So they should not be excluded from the DNS.
Let registries selectively filter out the few problematic cases.

> 
> > Every Korean newborn baby is given both a Hangul name and a CJK
> > name by the parents. Korean domestic law requires that a CJK
> > name be registered for every newborn baby. Every adult Korean
> > has his/her CJK name printed on his/her Residence ID Card.
> 
> Ok.  From my perspective, and remembering the general philosophy
> of the JET work, this is a stronger argument for prohibiting CJK
> registrations in Korea, and certainly for prohibiting mixed
> Hangul-CJK strings, than it is an argument for requiring
> support for many or all combinations.

Japanese names, too, are given in both CJK and Hiragana. That does
not justify prohibiting either CJK-only or Hiragana-only Japanese
names under .jp.

In Japan, every Kanji word can be written as its Hiragana reading,
but Japanese people prefer the Kanji (CJK) form because it carries
more meaning, causes less confusion, and is shorter.

The same thing happens in South Korea.
In South Korean newspapers and textbooks you often see CJK forms in
parentheses annotating the preceding Hangul words. This is needed
because one Hangul pronunciation can correspond to multiple CJK
words with different meanings, and one Hangul personal name can
have multiple CJK forms.

> 
> > Moreover, the future Stringprep200x is not only for IDNAbis but
> > also for other applications like SASL. We need a more inclusive
> > Stringprep200x.
> 
> Some parsimony in naming might benefit SASLprep (and other
> Stringprep profiles).  Some of the issues are the same and the
> same as (at least) the philosophy of the UTC "secure
> identifiers" concept: the ability to write a word or string in
> the relevant language does not make it a good identifier and
> reducing potential confusion in identifier matching is generally
> A Good Thing.  So I don't think that pointing out what is done
> in a writing system is, by itself, justification for arguing for
> more inclusion in Stringprep.   Second, while we have been
> assuming that IDNA200x and SASLprep200x will use essentially the
> same profile of Stringprep, that is not a hard requirement: if
> the needs are different and the differences are important (and
> we can explain why), then we might end up with different
> profiles.
> 
>  
> >> To repeat what has been said in other areas, the fact that a
> >> sequence is legitimate in some present or past use of the
> >> language, or that it would be comprehensible if used in a
> >> name, does not imply a "right" to have it included in the
> >> DNS.  We should be careful about excluding it.   But we
> >> should also not assume that, because it is possible and
> >> sources of conflicts cannot easily be identified, permitting
> >> it is a good idea.
> >
> > Hangul jamo sequences have been legitimate _by definition_
> > and _by tradition_. No room for debate!
> > 
> > Some confusable combinations of jamo sequences - as described
> > below - should be managed by registration policies.
> 
> This has never been the question.  The questions, at least as I
> have understood them, lie in whether, for example, one needs to
> accommodate both jamo and Hangul syllables in the DNS.   If the
> answer is "no", i.e., that it is possible to pick one or the
> other and stick with it, then the potential for confusion is
> reduced.  If it is necessary to permit both, and it is possible
> to write a given language string as either a sequence of
> syllable code points or as a sequence of Jamo code points, then
> I believe there is a matching problem that,

NFC composes conjoining jamo sequences into modern Hangul syllables.
So you don't have to worry about that unification issue.
NFC does it!
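A minimal Python sketch of that claim, using only the standard `unicodedata` module (the variable names are illustrative): the syllable GA typed as the precomposed code point U+AC00 and as the conjoining jamo U+1100 U+1161 are different strings, but they become identical after NFC.

```python
import unicodedata

def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)

# U+AC00 HANGUL SYLLABLE GA, entered as one precomposed code point
precomposed = "\uAC00"
# The same syllable entered as conjoining jamo:
# U+1100 HANGUL CHOSEONG KIYEOK + U+1161 HANGUL JUNGSEONG A
conjoining = "\u1100\u1161"

print(precomposed == conjoining)        # False: different code point sequences
print(nfc(conjoining) == precomposed)   # True: NFC composes the jamo into U+AC00
print(nfc(precomposed) == precomposed)  # True: the precomposed form is already NFC
```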

Some jamo sequences do not fit any of the 11172 modern Hangul
syllables, and they are preserved unchanged through NFC.
They are legitimate jamo sequences: archaic Hangul syllables,
consonant-only jamo sequences, etc. They should be respected.
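The figure 11172 comes from the Hangul composition arithmetic in the Unicode Standard (section 3.12): 19 leading consonants x 21 vowels x 28 trailing positions (27 trailing consonants plus "none"). The Python sketch below re-derives that count and shows why a sequence containing an archaic vowel such as U+119E HANGUL JUNGSEONG ARAEA cannot compose; `compose_lvt` is a hypothetical helper name, not a library function.

```python
import unicodedata
from typing import Optional

# Constants of the Hangul syllable composition algorithm (Unicode Standard, section 3.12)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 modern syllables

def compose_lvt(l: int, v: int, t: Optional[int] = None) -> Optional[int]:
    """Return the precomposed syllable for L+V(+T), or None when any jamo
    lies outside the modern ranges that NFC is able to compose."""
    l_index = l - L_BASE
    v_index = v - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        return None
    t_index = 0
    if t is not None:
        t_index = t - T_BASE
        # Valid trailing consonants are U+11A8..U+11C2 (index 1..27)
        if not (1 <= t_index < T_COUNT):
            return None
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index

print(S_COUNT)                                   # 11172
print(hex(compose_lvt(0x1100, 0x1161)))          # 0xac00 (GA)
print(hex(compose_lvt(0x1112, 0x1161, 0x11AB)))  # 0xd55c (HAN)

# U+119E (archaic vowel ARAEA) is outside U+1161..U+1175, so there is no
# precomposed syllable for it and NFC leaves the sequence unchanged.
archaic = "\u1100\u119E"
print(compose_lvt(0x1100, 0x119E))                       # None
print(unicodedata.normalize("NFC", archaic) == archaic)  # True
```

The same arithmetic is what NFC applies internally, so any jamo sequence that falls outside these ranges survives normalization as-is.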

>  ideally, should be
> solved by normalization (i.e., in the protocol and at all levels
> of the DNS) and not by registration restrictions alone.   In my
> ignorance, I have understood that is not especially easy, so I
> am pressing on the question of whether we can pick one or the
> other.
> 
> > I see. Then why not _selectively_ include the compat mappings
> > of NFKC in the future Stringprep200x?
> 
> Simpler rules are better and lead to fewer problems.  So my
> answer to a question stated as "why not" is "why at all" and
> "is there really a compelling need to do this"?   In the
> particular case of IDNAbis, and remembering that any character
> that is mapped to another one is not represented in the DNS at
> all, I'd like all of us to understand what value accepting these
> compatibility characters and then mapping them away in the
> protocol adds to the IDN/DNS environment.
> 
> > The compatibility mappings of (NFKC - NFC) should be selectively
> > included in the Stringprep200x of IDNAbis under the criterion of
> > the "same glyph" rule; that is, if a compatibility mapping
> > produces the same glyph and the same number of characters as the
> > input character, that mapping should be included in
> > Stringprep200x. "Circled a" and ligatures can be excluded.
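Only half of that proposed rule can be checked mechanically. The Python sketch below, whose function names are hypothetical, finds each character's (NFKC - NFC) mapping with the standard `unicodedata` module and applies the "same number of characters" test; the "same glyph" test has no Unicode property behind it and would need a hand-reviewed list.

```python
import unicodedata
from typing import Optional

def compat_only_mapping(ch: str) -> Optional[str]:
    """Return NFKC(ch) when it differs from NFC(ch), i.e. a mapping in
    (NFKC - NFC); return None when the two normalizations agree."""
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    return nfkc if nfkc != nfc else None

def passes_length_rule(ch: str) -> bool:
    """Hypothetical filter: keep a compatibility mapping only if it yields
    the same number of characters as its input. Says nothing about glyphs."""
    mapped = compat_only_mapping(ch)
    return mapped is not None and len(mapped) == len(ch)

def code_points(s: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in s)

# U+3131 compat KIYEOK, U+FB01 LATIN SMALL LIGATURE FI, U+24D0 CIRCLED LATIN SMALL LETTER A
for ch in ["\u3131", "\uFB01", "\u24D0"]:
    mapped = compat_only_mapping(ch)
    print(code_points(ch), "->", code_points(mapped), passes_length_rule(ch))
```

U+3131 maps to the single character U+1100 and passes, while the ligature maps to two characters and fails. Circled a also passes the length test even though the proposal excludes it, so the glyph criterion would still need human judgment.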
> 
> The counter-argument --and I want to stress that this applies to
> many, many, scripts other than Hangul-- is that, if these
> characters are mapped away, then they cannot appear in output
> from the DNS.  Their assignment to separate code points is not
> intrinsic to the characters or glyphs, it is an artifact of how
> Unicode (and some other CCSs) are organized.  I'd like to
> believe that, had Unicode been organized strictly for DNS
> purposes, the additional code points would not be there at all
> (of course, it has to serve broader purposes, so their presence
> is presumably entirely appropriate).   So I can imagine "if you
> actually see these compatibility characters on input, map them"
> being very good advice for a UI-writer, or even for an Operating
> System input driver, but I remain convinced that we should keep
> the permitted inputs to IDNA itself as close to what IDNA can
> produce (i.e., to ToUnicode(ToASCII(string))) as possible.

Okay. Just allowing 0x31xx will meet this requirement,
because 0x31xx characters are preserved through NFC.
For x = (compat jamo sequence),
ToUnicode(ToASCII(x)) == x is always guaranteed, as long as the
profile applies NFC rather than NFKC to these characters.
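A minimal Python sketch of that round trip, assuming a hypothetical Stringprep200x profile whose only normalization step is NFC. (IDNA2003 nameprep applies NFKC, which maps U+3131 to U+1100, so the identity would not hold there.) `to_ascii_nfc` and `to_unicode_nfc` are illustrative names; they use the standard `punycode` codec and skip IDNA's prohibition, bidi and length checks.

```python
import unicodedata

ACE_PREFIX = "xn--"

def to_ascii_nfc(label: str) -> str:
    """Hypothetical ToASCII: NFC, then Punycode with the ACE prefix.
    Omits the prohibited-character, bidi and length checks of real IDNA."""
    label = unicodedata.normalize("NFC", label)
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def to_unicode_nfc(label: str) -> str:
    """Hypothetical ToUnicode: strip the ACE prefix and decode Punycode."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

x = "\u3131\u3134\u3137"  # compatibility jamo KIYEOK, NIEUN, TIKEUT
print(unicodedata.normalize("NFC", x) == x)                 # True: NFC leaves U+31xx alone
print(to_unicode_nfc(to_ascii_nfc(x)) == x)                 # True: round trip is the identity
print(unicodedata.normalize("NFKC", "\u3131") == "\u1100")  # True: NFKC would not preserve it
```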

0x31xx and 0x11xx share the same glyphs, but such confusable
registrations can be prevented by registries. We already
have many sets of look-alikes in other scripts.
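One way a registry might implement that check, sketched in Python under a hypothetical per-character "skeleton" policy (not any registry's actual rule): fold each character to its own NFKC form, which maps a compatibility jamo onto the corresponding conjoining jamo, and refuse a new label whose skeleton matches an existing registration.

```python
import unicodedata
from typing import Iterable

def skeleton(label: str) -> str:
    """Hypothetical registry skeleton: NFC the label, then replace every
    character by its own NFKC form. Mapping one character at a time keeps
    NFKC from composing separate compatibility jamo into a syllable."""
    nfc = unicodedata.normalize("NFC", label)
    return "".join(unicodedata.normalize("NFKC", ch) for ch in nfc)

def conflicts(candidate: str, registered: Iterable[str]) -> bool:
    """True if the candidate shares a skeleton with an existing registration."""
    key = skeleton(candidate)
    return any(skeleton(r) == key for r in registered)

registered = {"\u1100\u1102"}  # conjoining CHOSEONG KIYEOK + CHOSEONG NIEUN
print(conflicts("\u3131\u3134", registered))  # True: compat KIYEOK + NIEUN look the same

# Whole-label NFKC would turn compat KIYEOK + A into the syllable GA,
# but the per-character skeleton keeps the two apart.
print(unicodedata.normalize("NFKC", "\u3131\u314F") == "\uAC00")  # True
print(conflicts("\u3131\u314F", {"\uAC00"}))                      # False
```

The per-character design matters: separate compatibility jamo display as separate letters, so treating them as equal to a precomposed syllable would block labels that do not actually look alike.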

Again: 0x31xx is the ONLY input method available for jamo
sequences, and 0x11xx is crucial for archaic Hangul syllables.
That is why both should be allowed as input.

>   
> > Some compatibility mappings that don't cause glyph changes,
> > like those above, have the same importance as case folding,
> > which does cause glyph changes.
> 
> I don't know how to evaluate "importance".   Case-folding in
> IDNs is a very specific compatibility issue with traditional DNS
> mapping rules and has little to do with Unicode or compatibility
> characters.
> 
> 
> The need for interoperability, and unique, unambiguous, global
> references in the DNS, strongly suggests that, if several
> different localized systems cannot figure out how something
> should be represented, we should see if we can do without it and
> prohibit it.

Prohibiting fillers is not very harmful, so I won't oppose the
prohibition. But "let registries or application UIs filter them
out" is also a good solution, since the font problems may be
fixed later.
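If registries or application UIs take on that filtering, the check itself is small. A Python sketch, assuming a hypothetical policy that flags the Hangul filler code points (U+115F, U+1160, U+3164, U+FFA0) so the caller can warn about or refuse the label:

```python
import unicodedata
from typing import List, Tuple

# Hangul filler code points defined in Unicode
HANGUL_FILLERS = frozenset({"\u115F", "\u1160", "\u3164", "\uFFA0"})

def find_fillers(label: str) -> List[Tuple[int, str]]:
    """Return (index, character name) for every Hangul filler in the label,
    so a registry or UI can warn about or refuse it (hypothetical policy)."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(label)
            if ch in HANGUL_FILLERS]

print(find_fillers("\u115F\u1161\u11AB"))  # [(0, 'HANGUL CHOSEONG FILLER')]
print(find_fillers("\uD55C\uAE00"))        # []: ordinary syllables, no fillers
```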

Again: jamo sequences do not conflict with Hangul syllables.
Jamo sequences have their own roles that syllables cannot fill.

Soobok