Changes in tables from Unicode 5.0 to 5.1

Kenneth Whistler kenw at
Wed Mar 19 19:50:00 CET 2008

> At 7:15 AM +0100 3/19/08, Patrik Fältström wrote:
> >I have checked what changes we will get when we go from Unicode 5.0 
> >to 5.1 (given the version of 5.1 that existed last Friday on the 
> >Unicode site):
> >
> >1. There is one codepoint that is DISALLOWED in 5.0 and PVALID in 5.1
> >
> >In 5.0:
> >
> >In 5.1:
> >
> >The reason is that it changes from General Category Sk to 
> >General Category Lm. This in turn moves the codepoint from not being 
> >in any of the categories in IDNA200X to being in Category A.
> That seems of some concern. This seems to be a character that we 
> would not want in IDNA200x.

Why not? Base the RFC on Unicode 5.1, as we have been suggesting,
and the issue goes away.

> Can people who understand this character 
> comment on it? Like, why was the category changed?


The modifier letters U+02C6..U+02CF have long been gc=Lm
(and hence included in identifiers), as a result of their
known use for tone marks in various orthographies of East
and Southeast Asia and of Africa.

In 2006, Lorna Priest of SIL submitted a proposal to encode
its use in orthographies for Akha and Lahu (languages used
in Southeast Asia). 

But in the context of that document, she also demonstrated
that the existing modifier letter U+02EC MODIFIER LETTER VOICING
was part of these orthographies. The change to gc=Lm for
that character was made so that its use and treatment in
identifiers would be consistent with U+02C6..U+02CF.
It was originally designated gc=Sk rather than gc=Lm because
its primary source was as an IPA diacritic for voicing. Such
IPA diacritics aren't normally parts of language orthographies,
unlike the tone marks, and so they get gc=Sk. The discovery of
the use of the same character
as part of a significant language orthography pushed the
case to the other side, and the general category was changed
to gc=Lm for Unicode 5.1.
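
For anyone who wants to check the current values, here is a small
Python sketch that looks up the general category of U+02C6..U+02CF
and U+02EC. It assumes a Python whose bundled Unicode data is 5.1 or
later; the helper name general_category is just for illustration.

```python
import unicodedata

# The long-standing tone-mark modifier letters, and the character
# whose general category changed from Sk to Lm in Unicode 5.1.
TONE_MARKS = range(0x02C6, 0x02CF + 1)
VOICING = 0x02EC


def general_category(cp: int) -> str:
    """Return the General_Category of a code point (illustrative helper)."""
    return unicodedata.category(chr(cp))


if __name__ == "__main__":
    # The result depends on the Unicode version bundled with Python.
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in [*TONE_MARKS, VOICING]:
        name = unicodedata.name(chr(cp))
        print(f"U+{cp:04X} {name}: gc={general_category(cp)}")
```

With Unicode 5.1 or later data, all of these should report gc=Lm.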


> >Because of this, this codepoint would have to be added to 
> >category G IF this draft had been posted as an RFC:
> >
> >    Category G - Backward compatibility
> Glad we have that, yes.
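
To make the interaction concrete, here is a rough Python sketch of
how a backward-compatibility override could pin the old value. It is
not the draft's algorithm: the function derive, the override table,
and the set of general categories treated as Category A are
simplifying assumptions for illustration, and the real rules have
more categories and exceptions.

```python
# Hypothetical, simplified sketch; not the draft's full algorithm.
# Assumed set of general categories that fall in Category A.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def derive(gc, cp, backward_compat=None):
    """Return a derived property for code point cp with general category gc.

    backward_compat maps code points to a pinned property value and
    stands in for Category G; it takes precedence over the gc rule.
    """
    overrides = backward_compat or {}
    if cp in overrides:
        return overrides[cp]
    return "PVALID" if gc in LETTER_DIGITS else "DISALLOWED"


VOICING = 0x02EC

old = derive("Sk", VOICING)                      # gc as in Unicode 5.0
new = derive("Lm", VOICING)                      # gc as in Unicode 5.1
pinned = derive("Lm", VOICING, {VOICING: old})   # held by the override

if __name__ == "__main__":
    print(old, new, pinned)
```

If the RFC is based on Unicode 5.1 from the start, the override table
stays empty for this character and only the second case applies.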
