Public Review Issue #181: Changing General Category of Twelve Characters

Ken Whistler kenw at sybase.com
Tue Apr 5 21:00:35 CEST 2011


Patrik,

I see that Mark responded on this thread, but didn't actually answer
the question.

For IDNA 2008 purposes, the relevant rule to look at is category B
(Unstable, Section 2.2) of RFC 5892, not category A (LetterDigits,
Section 2.1).

All twelve of these characters are superscript or subscript characters
which have compatibility decompositions to single letters. Because of
this, they are all "unstable" by the criterion in category B. As a
result they are all DISALLOWED in IDNA 2008 (of whatever vintage) and
will stay that way, because of the Unicode normalization stability
guarantees.

Changing their General Category values from gc=Ll to gc=Lm has no impact
whatsoever on the bottom line of whether these twelve characters are
allowed in IDNs. (They aren't.)
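
For readers who want to check this, here is a minimal sketch of the two
rules in Python. It assumes the category B test from RFC 5892,
toNFKC(toCaseFold(toNFKC(cp))) != cp, and uses Python's str.casefold()
as a stand-in for toCaseFold. The function names are hypothetical, only
rules A and B are modelled (the real derivation has more rules), and
U+2071 SUPERSCRIPT LATIN SMALL LETTER I is just an illustrative
superscript letter, since this thread does not list the twelve.

```python
import unicodedata

# Rule A (LetterDigits) category set, as quoted in this thread.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def passes_rule_a(cp: int) -> bool:
    """True if the General_Category of cp is in the rule A set."""
    return unicodedata.category(chr(cp)) in LETTER_DIGITS


def is_unstable(cp: int) -> bool:
    """Category B: cp changes under NFKC, case folding, NFKC."""
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) != ch


def sketch_status(cp: int) -> str:
    """Hypothetical two-rule derivation: B is checked before A."""
    if is_unstable(cp):
        return "DISALLOWED"
    return "PVALID" if passes_rule_a(cp) else "DISALLOWED"


if __name__ == "__main__":
    for cp in (0x2071, 0x0061):
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)),
              passes_rule_a(cp), is_unstable(cp), sketch_status(cp))
```

Whether the Unicode data in use reports U+2071 as Ll or as Lm, rule A
is satisfied either way. The character is rejected by rule B alone,
because NFKC maps it to a plain "i".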

--Ken

On 4/4/2011 7:43 AM, Mark Davis ☕ wrote:
> That was one of the considerations in the discussion; the effect on 
> identifiers (IDNA and others).
>
> Mark

> 2011/4/4 Patrik Fältström <patrik at frobbit.se>
>
>     I also would like to get a firm response from Unicode people as
>     well, BUT, by just quickly looking at the change, I can only see
>     the change from gc=Ll to gc=Lm as something that has to do with
>     IDNA2008.
>
>     And as rule A of IDNA2008 is the following:
>
>     A: General_Category(cp) is in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}
>
>     ...i.e. both Ll and Lm are accepted, this change should NOT have
>     any impact on IDNA2008.
>
>     So I am not as worried as I was when I first saw that gc was
>     proposed to be changed for twelve(!) characters!!!
>
>       Patrik
>
