Jamo [RE: Consensus Call Tranche 8 (Character Adjustments)]

Fri Oct 17 03:48:48 CEST 2008

Hi John,

As I understand it (and I agree), it might not solve all the issues as it stands (I am still thinking about it), but it does solve 2 types of issues:

1. combination of modern Jamos that do combine to a Hangul syllable, e.g.:

U+1109;U+1161;U+11BC  =>  U+C0C1

In this case, the use of <U+1109;U+1161;U+11BC> would effectively be disallowed.

2. combination of modern Jamos with old Jamos which combine to 1 Hangul syllable and 1 old Jamo, e.g.:

U+1109;U+1161;U+11F0  =>  U+C0AC;U+11F0

In this case, also, the use of <U+1109;U+1161;U+11F0> would be effectively disallowed.
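
For illustration, the two cases above can be reproduced with a short Python sketch using the standard unicodedata module. The helper names is_stable and code_points are made up for this example. The check follows the toNFKC(toCaseFolded(toNFKC(label))) expression from the earlier proposal, read so that a label which changes under it is the one that gets rejected; that reading is an assumption on my part.

```python
import unicodedata


def is_stable(label: str) -> bool:
    """Return True if NFKC(casefold(NFKC(label))) leaves the label unchanged."""
    nfkc = unicodedata.normalize("NFKC", label)
    return unicodedata.normalize("NFKC", nfkc.casefold()) == label


def code_points(text: str) -> str:
    """Format a string as U+XXXX code points separated by semicolons."""
    return ";".join(f"U+{ord(ch):04X}" for ch in text)


# Case 1: three modern jamo that compose into a single Hangul syllable.
modern = "\u1109\u1161\u11BC"
# Case 2: two modern jamo followed by an old trailing jamo.
mixed = "\u1109\u1161\u11F0"

for seq in (modern, mixed):
    composed = unicodedata.normalize("NFKC", seq)
    print(code_points(seq), "=>", code_points(composed),
          "| stable:", is_stable(seq))
```

Both input sequences change under normalization, so neither passes the stability check, while the precomposed results on the right-hand side do.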

It seems to me that, if we are not going to disallow jamos, this would at least be a measure to avoid some of the most obvious problems in the context of IDN.

The cases where no combination happens under NFKC are the cases which would need further investigation.  It may be possible to add additional rules based on the algorithms for displaying Hangul characters?
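
As a rough way to list the remaining cases, the sketch below flags labels that normalization leaves unchanged but that still contain conjoining jamo. The function names and the choice to test only the Hangul Jamo block (U+1100..U+11FF) are assumptions made for this example, not part of any proposal.

```python
import unicodedata

# Hangul Jamo block, U+1100..U+11FF (assumed scope for this sketch).
JAMO_BLOCK = range(0x1100, 0x1200)


def residual_jamo(label: str) -> list:
    """List the conjoining jamo that are still present after NFKC."""
    normalized = unicodedata.normalize("NFKC", label)
    return [f"U+{ord(ch):04X}" for ch in normalized if ord(ch) in JAMO_BLOCK]


def needs_investigation(label: str) -> bool:
    """True when NFKC does not change the label but it still contains jamo."""
    unchanged = unicodedata.normalize("NFKC", label) == label
    return unchanged and bool(residual_jamo(label))


# A modern leading consonant followed by an old vowel: nothing composes.
print(needs_investigation("\u1109\u119E"))        # True
# Three modern jamo: NFKC composes them, so the stability rule covers it.
print(needs_investigation("\u1109\u1161\u11BC"))  # False
```

Note that a syllable followed by an old jamo, such as <U+C0AC;U+11F0>, is also flagged, since it is already in normalized form.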

Edmon

PS. True about casefold, I just copied and pasted the argument.  But I did think the general argument might be used elsewhere. In any case, you are correct, it is not necessary.

> -----Original Message-----
> From: John C Klensin [mailto:klensin at jck.com]
> Sent: Friday, October 17, 2008 12:04 AM
> To: Edmon; idna-update at alvestrand.no
> Subject: Re: Jamo [RE: Consensus Call Tranche 8 (Character Adjustments)]
> 
> Edmon,
> 
> Perhaps I'm missing something (in part because I certainly know
> less about Korean than you do) but, unless there is something
> special going on with compatibility characters (in the Unicode
> NFKC sense), doesn't this proposal reduce to exactly what Mark
> and others have been arguing for (albeit at a much higher level
> of complexity)?  In particular,
> 
> 	(i) Because Hangul does not have case distinctions,
> 	toCaseFolded() presumably does nothing.
> 
> 	(ii) In general, Unicode compatibility characters (i.e.,
> 	those that are changed into something else by NFKC but
> 	not by NFC) are DISALLOWED by the existing Tables
> 	document and model.  I can't find any exceptions to that
> 	principle in the Jamo range (or elsewhere) but, if they
> 	exist, we should presumably examine and deal with them
> 	in some appropriate way.
> 
> 	(iii) No character that is mapped into something else by
> 	NFC is permitted as part of a U-label.
> 
> If that analysis is correct, then CONTEXTO and your rule are not
> necessary because all of the restrictions they would imply are
> already enforced by the protocol and tables.
> 
>     john
> 
> 
> --On Thursday, 16 October, 2008 20:46 +0800 Edmon
> <mail at edmon.asia> wrote:
> 
> > I agree with the line of thought that we really should not
> > disregard the results of the consensus position established by
> > the most relevant language community after a rather extensive
> > consensus process, so in general, I would side with the
> > experts in Korea.
> >
> > Nevertheless, having been through this discussion many
> > times, I understand that there are opinions otherwise and am
> > hoping to make a suggestion that could reconcile the lines of
> > thought and be consistent with our architecture.  When we last
> > discussed the issue of conjoining Hangul Jamo, I had suggested
> > exploring the possibility of addressing them in the following
> > manner:
> >
> > 1. categorize all Hangul Jamo as CONTEXTO
> >
> > 2. add stability contextual rule for these codepoints where
> > the following must be true:
> >
> > toNFKC(toCaseFolded(toNFKC(label))) != label
> >
> > I am not familiar enough with Korean, but this might strike a
> > graceful balance between disallowing conjoining jamo that
> > form a modern hangul syllable and continuing to allow archaic
> > Jamo without creating too much confusion?
> >
> > If I recall correctly, the response was that it seemed
> > interesting, but it was not discussed further.  Do people think
> > it might be a viable approach to resolve the issue?
> >
> > Edmon
> >
> > From: idna-update-bounces at alvestrand.no
> > [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Vint Cerf
> > Sent: Thursday, October 16, 2008 7:19 PM
> > To: Patrik Fältström
> > Cc: Martin Duerst; idna-update at alvestrand.no; Andrew Sullivan
> > Subject: Re: Consensus Call Tranche 8 (Character Adjustments)
> >
> > I am traveling in Oregon at the moment and will try to
> > summarize the state of responses tonight to the latest set of
> > consensus questions.
> >
> > I confess that I share Patrik's concern for disregarding a
> > consensus process from a specific language expert group.
> >
> > vint
> >
> > NOTE NEW BUSINESS ADDRESS AND PHONE
> > Vint Cerf
> > Google
> > 1818 Library Street, Suite 400
> > Reston, VA 20190
> > 202-370-5637
> > vint at google.com
> >
> > On Oct 16, 2008, at 1:33 AM, Patrik Fältström wrote:
> >
> > On 16 okt 2008, at 06.08, Martin Duerst wrote:
> >
> > I had a look at the document again. For those points of the
> > proposal where it disagrees with what we currently have,
> > the words "not needed" are used. Nothing that even comes
> > close to words such as "harmful", "confusing", or the like
> > appears for points 1 and 2. The word "confusing" appears for
> > point 3, Hangul Compatibility Jamo, which we already disallow.
> >
> > Of course writing and reading such documents is always fraught
> > with difficulties, but I don't think that the hypothesis that
> > the authors understand the difference between "we don't need
> > them" and "these are dangerous" is far-fetched.
> >
> > Fair. Thanks Martin for taking time to read this document
> > again.
> >
> > This is because I think having an IETF wg make a decision about
> > consensus that is _against_ a proposal from a formal
> > organisation like NIDA, which says it has been running a
> > consensus-driven process in Korea with participants from the
> > Korean Agency for Technology and Standards (National Body of
> > ISO and IEC), the National Institute of The Korean Language,
> > etc, is serious.
> >
> > If we had a similar situation in Sweden, where IETF ruled
> > against the result of a similar consensus-driven process in
> > Sweden about Swedish...well, I would start asking serious
> > questions on how consensus in IETF was reached.
> >
> > So, as editor of the tables document, I am neutral on the
> > issue. I just envision that for 8.c, we will get questions
> > given what the consensus seems to be at the moment.
> >
> >     Patrik
> >
> > Regards,    Martin.
> >
> > At 04:31 08/10/16, Patrik Fältström wrote:
> >
> > On 15 okt 2008, at 20.36, Andrew Sullivan wrote:
> >
> > On Wed, Oct 15, 2008 at 08:20:05PM +0200, Patrik Fältström wrote:
> >
> > Understood. Note that if we look at the proposals eszett and
> > the one from Korea, the eszett is an exception, while the
> > Korean proposal uses the Unicode properties.
> >
> > Hmm.  If this is the case (and at least in the Korean
> > proposal, my notes make me think that we have a different
> > meaning of "Unicode properties" in the above, but I'm
> > certainly not willing to assert that I'm right), then I'm
> > even more confused than I at first thought I was.
> > So I'm going to shut up about this topic, but I _still_ have
> > to say "no", since the consensus call said explicitly that
> > silence would be counted as support (and I obviously can't
> > support what I don't understand).
> >
> > I understand your statement, and view.
> >
> > I am just confused over the reaction in general from people.
> >
> > I have attached the Korean proposal, which in short is:
> >
> > 1. Add Hangul Jamo to blocks to disallow (i.e. "2.1.4
> > IgnorableBlocks (D)")
> > 2. Add two codepoints (that are "Inherited", but not
> > DISALLOWED by other means) to DISALLOWED
> >
> > I.e. I must correct myself when I said that the proposal is
> > only using Unicode properties. I can not (but I am tired...)
> > see how to catch the two Bangjeom codepoints U+302E and U+302F
> > without using exceptions.
> >
> > People interested in this discussion should also re-read the
> > messages from Ken where he explains his view that this is
> > something that should be expressed by a registry policy.
> >
> > Message-Id: <200807281913.m6SJDpL01810 at birdie.sybase.com>
> > Date: Mon, 28 Jul 2008 12:13:51 -0700 (PDT)
> >
> >   Patrik
> >
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update
> >
> > #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> > #-#-#  http://www.sw.it.aoyama.ac.jp   mailto:duerst at it.aoyama.ac.jp
> >