Normalization of Hangul

Harald Alvestrand harald at alvestrand.no
Wed Feb 20 09:04:07 CET 2008


Kenneth Whistler wrote:
> Patrik asked:
>
>   
>> Is there a different specification of the normalization algorithm for  
>> Hangul than what now exists, an algorithm specified that is based upon  
>> the fact one should know how integer arithmetic in Java works?
>>     
>
> Well, the specification of exactly how Hangul decomposition
> and (re)composition works is in Section 3.12 of the standard,
> pp. 121 - 123. That doesn't depend on integer arithmetic in Java.
> All you do is plug the decomposition and composition rules
> for Hangul into the relevant part of the UAX #15 normalization
> that requires decomposition or composition of strings.
>   
Ken,

For those of us who don't have the whole Unicode standard in our
heads at once:
The NFKC and NFC algorithms depend on:

- Decomposition as described in the standard section 3.5
  - UnicodeData.txt for its values
  - section 3.11 for the canonical reordering of combining marks
  - section 3.12 for decomposition of Hangul
- Composition as described in UAX #15 section 5
  - UnicodeData.txt for its values
  - CompositionExclusions.txt for exceptions
  - section 3.12 of the standard for Hangul composition
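
To make the two Hangul items in the list above concrete, here is a minimal
sketch of the arithmetic from section 3.12, written in Python. The constants
are the ones given in the standard; the function and variable names are my
own and only illustrative. It uses nothing beyond integer division and
remainder on non-negative values, so nothing in it is specific to Java.

```python
# Constants from section 3.12 (conjoining jamo behavior).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_syllable(ch):
    """Decompose one precomposed Hangul syllable into L V or L V T jamo.

    Any other character is returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    out = chr(l) + chr(v)
    if t != T_BASE:  # T_BASE itself means "no trailing consonant"
        out += chr(t)
    return out


def compose_jamo(text):
    """Compose adjacent L+V and LV+T pairs; leave everything else alone.

    This covers only the Hangul part of composition, not the
    table-driven part that uses UnicodeData.txt.
    """
    out = []
    for ch in text:
        if out:
            last, cp = ord(out[-1]), ord(ch)
            l_index, v_index = last - L_BASE, cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                # Leading consonant followed by vowel: build an LV syllable.
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            s_index, t_index = last - S_BASE, cp - T_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and 0 < t_index < T_COUNT):
                # LV syllable followed by trailing consonant: build LVT.
                out[-1] = chr(last + t_index)
                continue
        out.append(ch)
    return "".join(out)


# U+D55C decomposes to U+1112 U+1161 U+11AB and composes back again.
jamo = decompose_syllable("\ud55c")
print([hex(ord(c)) for c in jamo], compose_jamo(jamo) == "\ud55c")
```

A full NFC or NFKC implementation would call these from inside the generic
decomposition and composition passes rather than running them on their own.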

Is that the complete set of what one needs to read to implement NFKC and
NFC, or is there Yet Another Data File Or Algorithm we have overlooked?
> Normally, of course, you just depend on a library API that
> does the normalization for you (including the proper handling
> of Hangul).
That is irrelevant for the purpose of writing a standard, but very useful
for testing it.
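
As a rough illustration of the testing use, here is a small Python sketch
that treats a library normalizer as the reference and reports the inputs on
which some candidate implementation disagrees. The helper name
find_mismatches, the sample strings and the identity "candidate" are all
made up for this example; unicodedata.normalize is the standard Python
library call.

```python
import unicodedata


def find_mismatches(candidate, samples, form="NFC"):
    """Return the samples on which `candidate` disagrees with the library."""
    return [s for s in samples
            if candidate(s) != unicodedata.normalize(form, s)]


samples = [
    "\u1112\u1161\u11ab",  # Hangul L + V + T jamo sequence
    "\ud55c",              # the corresponding precomposed syllable
    "e\u0301",             # e + combining acute accent
    "abc",                 # plain ASCII, unaffected by normalization
]

# A deliberately wrong candidate: it returns its input unchanged, so it
# should fail on every sample that NFC actually composes.
mismatches = find_mismatches(lambda s: s, samples)
print([[hex(ord(c)) for c in s] for s in mismatches])
```

Agreement with one library on a handful of strings proves little, of course;
it only catches the cases where the candidate and the library differ.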

                Harald

