Normalization of Hangul

Mark Davis mark.davis at icu-project.org
Wed Feb 20 16:16:54 CET 2008


Yes, those sections are what is required. All told:

   - NFC and NFKC are defined in Section 5 of UAX#15
     (http://unicode.org/reports/tr15/#Specification), which:
      - references Canonical Decomposition
      - defines the composition processes
   - Canonical Decomposition is defined on p. 96 of Unicode 5.0:
      - *D68 Canonical decomposition: The decomposition of a character
      that results from recursively applying the canonical mappings found
      in the Unicode Character Database and those described in Section
      3.12, Conjoining Jamo Behavior, until no characters can be further
      decomposed, and then reordering nonspacing marks according to
      Section 3.11, Canonical Ordering Behavior.*
   - Data
      - UnicodeData.txt
      - CompositionExclusions.txt
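To make D68 concrete, here is a minimal Python sketch. It assumes Python's standard unicodedata module as the source of the UnicodeData.txt mappings. It applies canonical mappings recursively, decomposes Hangul syllables with the Section 3.12 arithmetic, and then reorders each run of non-starters by combining class (Section 3.11). The function names are illustrative, not taken from UAX #15.

```python
from __future__ import annotations

import unicodedata

# Hangul constants from Section 3.12, Conjoining Jamo Behavior.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588
S_COUNT = L_COUNT * N_COUNT  # 11172


def decompose_char(ch: str) -> list[str]:
    """Fully decompose one character with canonical mappings (no reordering yet)."""
    s_index = ord(ch) - S_BASE
    if 0 <= s_index < S_COUNT:
        # Hangul syllables are decomposed arithmetically (Section 3.12).
        jamo = [chr(L_BASE + s_index // N_COUNT),
                chr(V_BASE + (s_index % N_COUNT) // T_COUNT)]
        if s_index % T_COUNT:
            jamo.append(chr(T_BASE + s_index % T_COUNT))
        return jamo
    mapping = unicodedata.decomposition(ch)
    # "" means no mapping; a leading "<tag>" marks a compatibility mapping,
    # which canonical decomposition does not apply.
    if not mapping or mapping.startswith("<"):
        return [ch]
    result: list[str] = []
    for hex_cp in mapping.split():
        # Apply the mappings recursively until nothing decomposes further.
        result.extend(decompose_char(chr(int(hex_cp, 16))))
    return result


def canonical_order(chars: list[str]) -> list[str]:
    """Stable-sort each run of non-starters by canonical combining class."""
    out: list[str] = []
    run: list[str] = []
    for ch in chars:
        if unicodedata.combining(ch) == 0:
            out.extend(sorted(run, key=unicodedata.combining))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return out


def canonical_decomposition(s: str) -> str:
    chars: list[str] = []
    for ch in s:
        chars.extend(decompose_char(ch))
    return "".join(canonical_order(chars))
```

For example, U+1E69 decomposes via U+1E63 to "s\u0323\u0307", and "a\u0302\u0323" is reordered to "a\u0323\u0302" because U+0323 has the lower combining class.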

Mark

Note: There is still final editorial work being done on
http://www.unicode.org/reports/tr15/tr15-28.html, so if you have any
suggestions for editorial clarifications, now's the time!



On Feb 20, 2008 12:04 AM, Harald Alvestrand <harald at alvestrand.no> wrote:

> Kenneth Whistler wrote:
> > Patrik asked:
> >
> >
> >> Is there a specification of the normalization algorithm for Hangul
> >> other than the one that now exists, which is written so that one
> >> needs to know how integer arithmetic works in Java?
> >>
> >
> > Well, the specification of exactly how Hangul decomposition
> > and (re)composition works is in Section 3.12 of the standard,
> > pp. 121 - 123. That doesn't depend on integer arithmetic in Java.
> > All you do is plug the decomposition and composition rules
> > for Hangul into the relevant part of the UAX #15 normalization
> > that requires decomposition or composition of strings.
> >
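The Section 3.12 arithmetic fits in a few lines. In the sketch below, every division and remainder is applied to non-negative operands, so truncating division (as in Java or C) and floor division (as in Python) give identical results. Nothing depends on Java semantics. The function names are hypothetical.

```python
from __future__ import annotations

# Constants from Section 3.12, Conjoining Jamo Behavior.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588
S_COUNT = L_COUNT * N_COUNT  # 11172


def decompose_hangul(cp: int) -> list[int]:
    """Split a precomposed syllable into L V (T) jamo; other code points pass through."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    # s_index >= 0 here, so // and % agree with Java's / and %.
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    return [lead, vowel] if trail == T_BASE else [lead, vowel, trail]


def compose_hangul_pair(first: int, second: int) -> int | None:
    """Syllable for an L+V or LV+T pair, or None if the pair does not compose."""
    l_index, v_index = first - L_BASE, second - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT
    s_index, t_index = first - S_BASE, second - T_BASE
    if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and 0 < t_index < T_COUNT:
        return first + t_index  # LV syllable plus a trailing consonant
    return None
```

For instance, decompose_hangul(0xD55C) returns [0x1112, 0x1161, 0x11AB]. Composing 0x1112 with 0x1161 gives 0xD558, and composing that with 0x11AB gives back 0xD55C.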
> Ken,
>
> for those of us who don't have the whole Unicode standard in their
> brains at once:
> The NFKC and NFC algorithms depend on:
>
> - Decomposition as described in the standard section 3.5
>  - UnicodeData.txt for its values
>  - section 3.11 for the canonical reordering of combining marks
>  - section 3.12 for decomposition of Hangul
> - Composition as described in UAX#15 section 5
>  - UnicodeData.txt for its values
>  - CompositionExclusions.txt for exceptions
>  - section 3.12 of the standard for Hangul composition
>
> Is that the complete set of what one needs to read to implement NFKC and
> NFC, or is there Yet Another Data File Or Algorithm we have overlooked?
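The composition step of UAX #15 Section 5 can be sketched in Python as follows. Two assumptions are flagged in the code as well. First, Python's unicodedata.normalize("NFD", ...) stands in for the decomposition step. Second, the standard library does not expose CompositionExclusions.txt, so a composite is treated as excluded if the library does not keep it unchanged under NFC. The helper names are hypothetical.

```python
from __future__ import annotations

import sys
import unicodedata

# Hangul constants from Section 3.12, Conjoining Jamo Behavior.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def build_pair_table() -> dict[tuple[str, str], str]:
    """Map (first, second) to its primary composite, using UnicodeData.txt mappings."""
    table: dict[tuple[str, str], str] = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        mapping = unicodedata.decomposition(ch)
        if not mapping or mapping.startswith("<") or len(mapping.split()) != 2:
            continue  # no mapping, a compatibility mapping, or a singleton
        # ASSUMPTION: stand-in for CompositionExclusions.txt plus the derived
        # exclusions; a composite the library does not keep under NFC is excluded.
        if unicodedata.normalize("NFC", ch) != ch:
            continue
        first, second = (chr(int(h, 16)) for h in mapping.split())
        table[(first, second)] = ch
    return table


PAIRS = build_pair_table()


def compose_pair(first: str, second: str) -> str | None:
    """Primary composite for a pair, or None. Hangul pairs use Section 3.12 arithmetic."""
    l_index, v_index = ord(first) - L_BASE, ord(second) - V_BASE
    if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
        return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
    s_index, t_index = ord(first) - S_BASE, ord(second) - T_BASE
    if 0 <= s_index < S_COUNT and s_index % T_COUNT == 0 and 0 < t_index < T_COUNT:
        return chr(ord(first) + t_index)
    return PAIRS.get((first, second))


def canonical_composition(decomposed: str) -> str:
    """Recombine a canonically decomposed string, left to right."""
    out: list[str] = []
    starter_pos = -1                # index in `out` of the last starter, -1 if none
    last_ccc: int | None = None     # ccc of the last char kept after that starter
    for ch in decomposed:
        ccc = unicodedata.combining(ch)
        # Chars kept after the starter are non-starters in canonical order, so ch is
        # unblocked iff nothing was kept there or the last kept one has a lower ccc.
        if starter_pos >= 0 and (last_ccc is None or last_ccc < ccc):
            composite = compose_pair(out[starter_pos], ch)
            if composite is not None:
                out[starter_pos] = composite  # ch is absorbed into the starter
                continue
        if ccc == 0:
            starter_pos, last_ccc = len(out), None
        else:
            last_ccc = ccc
        out.append(ch)
    return "".join(out)


def nfc(s: str) -> str:
    # ASSUMPTION: the library's own NFD stands in for the decomposition step.
    return canonical_composition(unicodedata.normalize("NFD", s))
```

The blocking test relies on the input already being in canonical order. That is why composition runs only on the output of canonical decomposition.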
> > Normally, of course, you just depend on a library API that
> > does the normalization for you (including the proper handling
> > of Hangul).
> Irrelevant for the purpose of writing a standard. Very useful for
> testing it.
>
>                Harald
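On the testing side, the UCD also ships NormalizationTest.txt. This is a conformance file, not an input to the algorithm. Its five columns (source; NFC; NFD; NFKC; NFKD) must satisfy the invariants listed in its header. The following is a sketch of a per-line checker in Python. It uses Python's unicodedata.normalize as the default implementation under test. The names parse_line and check_line are hypothetical.

```python
from __future__ import annotations

import unicodedata
from typing import Callable

Normalizer = Callable[[str, str], str]  # (form, text) -> normalized text


def parse_line(line: str) -> list[str] | None:
    """Return the five columns of a data line as strings, or None for non-data lines."""
    data = line.split("#", 1)[0].strip()
    if not data or data.startswith("@"):
        return None  # comment, blank line, or @Part header
    fields = data.split(";")[:5]
    return ["".join(chr(int(h, 16)) for h in f.split()) for f in fields]


def check_line(line: str, normalize: Normalizer = unicodedata.normalize) -> bool:
    """Check one line against the invariants stated in the file's header."""
    cols = parse_line(line)
    if cols is None:
        return True
    c1, c2, c3, c4, c5 = cols
    return (
        all(normalize("NFC", x) == c2 for x in (c1, c2, c3))
        and all(normalize("NFC", x) == c4 for x in (c4, c5))
        and all(normalize("NFD", x) == c3 for x in (c1, c2, c3))
        and all(normalize("NFD", x) == c5 for x in (c4, c5))
        and all(normalize("NFKC", x) == c4 for x in cols)
        and all(normalize("NFKD", x) == c5 for x in cols)
    )
```

To test an implementation of your own, pass any function with the same (form, text) signature as the normalize argument.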



-- 
Mark


More information about the Idna-update mailing list