Consensus Call Tranche 8 (Character Adjustments)

Kenneth Whistler kenw at sybase.com
Wed Oct 15 23:18:09 CEST 2008


John Klensin wrote:

> ... but, if the [pre]composed characters are consistently
> formed by NFC,

They are. (That is, the 11,172 Wanseong Hangul syllable blocks
encoded at U+AC00..U+D7A3 behave that way.)
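
A quick way to see this behavior, as a minimal sketch using Python's
standard unicodedata module (the sample code points are only
illustrative choices, not taken from the discussion): a sequence of
modern conjoining jamos composes to a precomposed syllable under NFC,
while a sequence that starts with an Old Hangul jamo is left alone.

```python
import unicodedata

# Modern jamo sequence: CHOSEONG HIEUH, JUNGSEONG A, JONGSEONG NIEUN.
jamo = "\u1112\u1161\u11AB"

# NFC composes the three jamos into one precomposed syllable.
composed = unicodedata.normalize("NFC", jamo)

# NFD takes the precomposed syllable back to the jamo sequence.
decomposed = unicodedata.normalize("NFD", composed)

# Old Hangul sequence: CHOSEONG PANSIOS (U+1140) has no precomposed
# partner, so NFC leaves the sequence as three code points.
old = "\u1140\u1161\u11AB"
old_nfc = unicodedata.normalize("NFC", old)

print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in old_nfc])
```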

> I agree that consistency with decisions made
> elsewhere would disallow the problematic comparison cases and
> dictate that we leave this to registry restrictions.

Correct, IMO.

> 
> I am a bit concerned about the hypothetical case that Martin
> raised and my reaction, at least if I correctly understand
> Unicode's stability rules. If a few syllables that are now
> considered archaic (or, if such cases exist, ones that have
> never been used) abruptly become, to use Martin's term, of
> crucial importance, would the syllable forms be allocated code
> points?

No.

> If so, am I correct in assuming that stability rules
> would require that NFC would actually decompose the newly-added
> syllables (presumably composing the individual Jamo to the new
> syllables would result in an incompatible change to
> normalization)?

Counterfactual. But yes, if it *were* the case (which it isn't),
the addition of a new precomposed Hangul syllable would
require addressing normalization stability. The exact details
of how that would be done are unclear, because any new
precomposed Hangul syllable would, by definition, be outside
the scope of the Hangul Syllable Composition and Hangul
Syllable Decomposition algorithms (TUS 5.0, pp. 122-124),
which *define* the normalization relationship between
conjoining jamos and the 11,172 precomposed Hangul syllables.
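
To make the "by definition" point concrete, here is a rough sketch of
the arithmetic behind those two algorithms. The constants are the
ones the Unicode Standard gives for Hangul syllable composition; the
function names (decompose_syllable, compose_jamo,
agrees_with_unicodedata) are my own, made up for illustration. The
mapping is pure index arithmetic over 19 x 21 x 28 = 11,172 slots, so
there is no slot left for a syllable outside U+AC00..U+D7A3.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11,172 precomposed syllables


def decompose_syllable(s):
    """Return the jamo sequence for one precomposed syllable."""
    s_index = ord(s) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return s  # not a precomposed Hangul syllable: unchanged
    l = chr(L_BASE + s_index // N_COUNT)
    v = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # t_index 0 means "no trailing consonant".
    return l + v + (chr(T_BASE + t_index) if t_index else "")


def compose_jamo(l, v, t=None):
    """Return the precomposed syllable, or None if there is none."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        return None  # e.g. an Old Hangul leading consonant or vowel
    t_index = 0
    if t is not None:
        t_index = ord(t) - T_BASE
        if not 0 < t_index < T_COUNT:
            return None  # trailing consonant outside the modern set
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def agrees_with_unicodedata():
    """Cross-check the arithmetic against the library's NFD."""
    return all(
        decompose_syllable(chr(cp)) == unicodedata.normalize("NFD", chr(cp))
        for cp in range(S_BASE, S_BASE + S_COUNT)
    )
```

With these definitions, compose_jamo("\u1140", "\u1161", "\u11AB")
returns None: the leading consonant U+1140 falls outside the 19
modern leading consonants, so the arithmetic has nothing to map it
to.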

> That isn't an attractive answer because it
> makes the behavior dependent on when a particular character code
> point is added to Unicode.

Also counterfactual, because such characters will not be added
to the Unicode Standard. Nobody in the UTC *or* in Korea
(South or North) is asking for them.

In fact, if you read the new Korean standard, KS X 1026-1:2007,
"Part 1, Hangul processing guide for information interchange",
that standard *mandates* that for Old Hangul syllable blocks
a sequence of three Jamos be used:

"5.2 A representation format of Modern Hangul syllable blocks

"For representing Modern Hangul syllable blocks, we must use code
positions of 11,172 Hangul syllables U+AC00 ~ U+D7A3. ...

"5.3 A representation format of Old Hangul syllable blocks

"For representing Old Hangul syllable blocks, we must use
code positions of Johab Hangul letters in Hangul Jamo U+1100 ~ 
U+11FF, Hangul Jamo Extended-A U+A960 ~ U+A97F, and Hangul Jamo
Extended-B U+D7B0 ~ U+D7FF, ..."

That isn't something that the UTC wrote in the Unicode Standard --
it is what the Korean Agency for Technology & Standards wrote
in a *Korean* standard.
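
For illustration only, the ranges quoted above can be turned into a
simple membership check. This is a sketch under my own assumptions:
the name classify_hangul and its return labels are hypothetical, and
it tests nothing beyond code point ranges, so it is in no way a
conformance check for KS X 1026-1.

```python
# Ranges as quoted from sections 5.2 and 5.3 above.
SYLLABLES = range(0xAC00, 0xD7A3 + 1)
JAMO_BLOCKS = (
    range(0x1100, 0x11FF + 1),   # Hangul Jamo
    range(0xA960, 0xA97F + 1),   # Hangul Jamo Extended-A
    range(0xD7B0, 0xD7FF + 1),   # Hangul Jamo Extended-B
)


def classify_hangul(text):
    """Label a string by which of the quoted ranges its code points use."""
    has_syllable = has_jamo = has_other = False
    for ch in text:
        cp = ord(ch)
        if cp in SYLLABLES:
            has_syllable = True
        elif any(cp in block for block in JAMO_BLOCKS):
            has_jamo = True
        else:
            has_other = True
    if has_other or not text:
        return "other"
    if has_syllable and has_jamo:
        return "mixed"
    return "syllables" if has_syllable else "jamo"
```

A registry that wanted to exclude Old Hangul syllable blocks could
start from a check of this kind and reject labels that come back as
"jamo" or "mixed".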

> However, I note that
> prohibiting the Jamo in IDNA would prevent the problem, at the
> cost of requiring anyone who wants to use a syllable that is not
> now assigned a code point in a domain name to persuade UTC and
> SC2 to add that code point.

Which will never happen.

See above. Prohibiting the Jamos in IDNA would prevent the
usage of Old Hangul syllable blocks in domain names, period.
And frankly, I consider that well within the purview of
registry policies in Korea, if that is what they want to do.

> Unless the national experts and registry can make a much
> stronger case than I can make on their behalf (that ought to be
> easy for them, but they have not yet been heard from), I think
> the NFC relationships still shift the balance toward making this
> a registry restriction. 

Correct.

> However, I don't think the answer is
> quite as obvious and one-sided as your note seems to imply.

Noted. But I disagree and consider this one obvious. I will
return to Andrew Sullivan's point. If you think it is within
the competence and purview of this particular working group
to decide that the *protocol* should prohibit a certain
subset of historical Old Hangul syllables from representation
in domain names, then we may as well reopen the discussion
about the appropriateness of the protocol letting Sumero-Akkadian
cuneiform, Linear B syllables, or other historic scripts in
domain names.

--Ken


