Combining accents

Mark Davis markdavis at
Mon Nov 27 17:11:22 CET 2006

Some clarification.

1. It appears that you may think that NFKC forbids combining marks. It does
not: it only forbids sequences that could be expressed with a precomposed
form (with a few exceptions). Thus:

A + acute is forbidden in NFKC
X + cedilla is not forbidden in NFKC
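
To make the two cases above concrete, here is a minimal sketch using Python's
standard unicodedata module. Python and the helper name is_nfkc_stable are
assumptions for illustration only; neither comes from this thread. A string
counts as "forbidden" here if NFKC normalization changes it.

```python
import unicodedata

def is_nfkc_stable(text: str) -> bool:
    """Return True if NFKC normalization leaves the text unchanged.

    Hypothetical helper for illustration; not part of any IDNA specification.
    """
    return unicodedata.normalize("NFKC", text) == text

a_acute = "A\u0301"    # A + COMBINING ACUTE ACCENT
x_cedilla = "X\u0327"  # X + COMBINING CEDILLA

# A + acute has a precomposed form (U+00C1), so NFKC rewrites the sequence.
print(is_nfkc_stable(a_acute))    # False
# X + cedilla has no precomposed form, so the combining mark survives NFKC.
print(is_nfkc_stable(x_cedilla))  # True
```

So a combining mark is only a problem when the sequence has a precomposed
equivalent; otherwise it passes through NFKC untouched.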


2. Unicode composition and decomposition are not based on visual
confusability (referring to your memo on Gujarati). For example, "m" does
not decompose to "rn" even though those two sequences are visually
confusable (at address box sizes in common fonts they look the same). Nor is
it simply based on origin: "w" does not decompose to "vv". For more
examples, see

Visual similarity is a much broader notion than Unicode composition and
decomposition. See

Baking visual similarity into the protocol would be a real problem for many,
many languages: it would be the equivalent of disallowing the use of the
letter "m" in English.
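
The point about "m" and "w" can be checked directly. This is a rough sketch,
again assuming Python's standard unicodedata module (my choice for
illustration, not something from the memo). It shows that normalization never
maps "m" to "rn" or "w" to "vv", while a precomposed letter such as è does
decompose.

```python
import unicodedata

# "m" and "w" have no decomposition mapping in the Unicode Character Database,
# so no normalization form turns them into "rn" or "vv".
for ch in ("m", "w"):
    assert unicodedata.decomposition(ch) == ""
    assert unicodedata.normalize("NFKD", ch) == ch

# The lookalike sequences are simply different strings under every form.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, "rn") != unicodedata.normalize(form, "m")
    assert unicodedata.normalize(form, "vv") != unicodedata.normalize(form, "w")

# By contrast, a precomposed letter does decompose: U+00E8 (e with grave)
# maps to "e" followed by COMBINING GRAVE ACCENT.
e_grave = "\u00e8"
assert unicodedata.normalize("NFD", e_grave) == "e\u0300"
assert unicodedata.normalize("NFC", "e\u0300") == e_grave
```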


On 11/26/06, Sam Vilain <sam.vilain at> wrote:
> John C Klensin wrote:
> >> But doesn't è decompose to a sequence including that mark?
> >>
> > I may miss your point but, if I don't, that is one of the
> > reasons we have used NFKC, rather than NFKD, all along.
> >
> Oh, right :-}.  Funny how little details like that can be missed.  I
> thought it happened the other way around.
> This is a bit of a problem.  The Indic scripts must be able to use their
> combining marks/vowel signs; they don't have a rich enough set of
> pre-composed characters to write their language.  And if romanised forms
> of African languages need compositions which are not already there, then
> they will never work.
> This might need to wait for the next version, but it should be possible
> to permit combining characters without breaking backwards compatibility
> or losing the intent of this specification, you'd need to:
> 1. be able to classify combining marks with their target scripts, to
> make sure that you're not trying to combine a Latin diacritical mark
> with a Chinese ideograph (etc)
> 2. disallow combining marks except in places where they're expected
> 3. standardise on the NFKD form, except for where a pre-composed form
> exists.
> It's ugly, but any tidier suggestions that don't exclude >25% of the
> world's population?  :)
> --
> Sam Vilain, Systems Architect, Catalyst IT (NZ) Ltd.
> phone: +64 4 499 2267        PGP ID: 0x66B25843
> _______________________________________________
> Idna-update mailing list
> Idna-update at
