Combining accents

Sam Vilain sam.vilain at catalyst.net.nz
Mon Nov 27 01:18:50 CET 2006


John C Klensin wrote:
>> But doesn't è decompose to a sequence including that mark?
>>     
> I may miss your point but, if I don't, that is one of the
> reasons we have used NFKC, rather than NFKD, all along.
>   

Oh, right :-}.  Funny how little details like that can be missed.  I
thought it happened the other way around.
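
For anyone else who had it backwards, a quick check with Python's
standard unicodedata module shows the difference. This is only an
illustration; the variable names are mine and nothing here comes from
the specification itself.

```python
import unicodedata

precomposed = "\u00e8"  # è as a single code point

# NFKD splits è into 'e' followed by U+0300 COMBINING GRAVE ACCENT.
nfkd = unicodedata.normalize("NFKD", precomposed)

# NFKC recomposes that pair back into the single code point U+00E8.
nfkc = unicodedata.normalize("NFKC", nfkd)

print([unicodedata.name(c) for c in nfkd])
print([unicodedata.name(c) for c in nfkc])
```

So under NFKC the combining mark never appears in the normalised form
of è, which is why banning the mark does not ban the letter.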

This is a bit of a problem.  The Indic scripts must be able to use their
combining marks/vowel signs; they don't have a rich enough set of
pre-composed characters to write their language.  And if romanised forms
of African languages need compositions which are not already there, then
they will never work.
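
To make the problem concrete, here is a small sketch (again Python's
unicodedata; the helper name marks_after_nfkc is made up for this
example). It picks one Devanagari syllable and one Latin letter used in
some African orthographies, and lists the marks that are still present
after NFKC because no pre-composed character exists for them.

```python
import unicodedata

def marks_after_nfkc(text):
    """Return the mark characters (general category M*) left after NFKC."""
    composed = unicodedata.normalize("NFKC", text)
    return [c for c in composed if unicodedata.category(c).startswith("M")]

# DEVANAGARI LETTER KA + DEVANAGARI VOWEL SIGN I: no pre-composed form.
devanagari_ki = "\u0915\u093f"

# LATIN SMALL LETTER OPEN E + COMBINING GRAVE ACCENT: no pre-composed form.
open_e_grave = "\u025b\u0300"

# 'e' + COMBINING GRAVE ACCENT: composes to U+00E8, so no mark is left.
e_grave = "e\u0300"

for sample in (devanagari_ki, open_e_grave, e_grave):
    print([unicodedata.name(c) for c in marks_after_nfkc(sample)])
```

The first two keep their marks whatever normalisation form is chosen,
so a blanket ban on combining characters rules them out entirely.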

This might need to wait for the next version, but it should be possible
to permit combining characters without breaking backwards compatibility
or losing the intent of this specification.  You'd need to:

1. be able to classify combining marks with their target scripts, to
make sure that you're not trying to combine a Latin diacritical mark
with a Chinese ideograph (etc)

2. disallow combining marks except in places where they're expected

3. standardise on the NFKD form, except for where a pre-composed form
exists.
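
A very rough sketch of how those three rules might fit together, in
Python.  Everything in it is an assumption made for illustration: the
ALLOWED_MARKS table is hypothetical, the script of a character is
guessed from the first word of its Unicode name (a crude stand-in for
the real Script property), and rule 3 is read as "apply NFKC, so that
only marks without a pre-composed form remain".

```python
import unicodedata

# Hypothetical table: which kinds of mark may attach to which base script.
# Keys and values are first words of Unicode character names.
ALLOWED_MARKS = {
    "LATIN": {"COMBINING"},        # generic diacritics such as U+0300
    "DEVANAGARI": {"DEVANAGARI"},  # Devanagari vowel signs and marks
}

def rough_script(ch):
    """Approximate a character's script by the first word of its name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def is_mark(ch):
    return unicodedata.category(ch).startswith("M")

def check_label(label):
    """Return True if every mark left after NFKC is allowed on its base."""
    # Rule 3 (as interpreted here): compose wherever a pre-composed form exists.
    text = unicodedata.normalize("NFKC", label)
    base_script = None
    for ch in text:
        if is_mark(ch):
            # Rule 2: a mark is only expected after a base character.
            if base_script is None:
                return False
            # Rule 1: the mark must be classified for the base's script.
            if rough_script(ch) not in ALLOWED_MARKS.get(base_script, set()):
                return False
        else:
            base_script = rough_script(ch)
    return True

print(check_label("\u025b\u0300"))  # open e + grave
print(check_label("\u0915\u093f"))  # Devanagari ka + vowel sign i
print(check_label("\u4e2d\u0300"))  # Chinese ideograph + Latin-style grave
```

A real version would need the actual Script and Script_Extensions data
and a much more careful definition of "expected", but the shape of the
check would be about this.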

It's ugly, but are there any tidier suggestions that don't exclude >25%
of the world's population?  :)

-- 
Sam Vilain, Systems Architect, Catalyst IT (NZ) Ltd.
phone: +64 4 499 2267        PGP ID: 0x66B25843



More information about the Idna-update mailing list