UTC Agenda Item: IDNA proposal

Wed Nov 22 14:07:07 CET 2006

Class Mn contains the HEBREW POINT QAMATS that the -bidi draft is busy 
defending. Can't eliminate that.

            Harald

--On 22 November 2006 13:52 +0100 Patrik Fältström <patrik at frobbit.se>
wrote:

> I have recreated the tables using a new algorithm (based mostly on
> input from Kenneth).
>
> (1) Use the scripts.txt file for the script definitions; do not use
> the block definitions
>
> (2) Remove codepoints where cp != NFKC(cp)
>
> (3) Remove codepoints where cp != lowercase(cp)
>
> (4) Remove codepoints where class(cp) != "Ll"
>
> (5) Include codepoints that are part of US-ASCII (0-9, A-Z and a-z)
>
> The result of doing this for U+0000 - U+FFFF can be found at
>
> http://stupid.domain.name/idnabis/table-ll.html
>
> If I instead accept both class Ll and class Lo in step 4, the
> result can be found at
>
> http://stupid.domain.name/idnabis/table-lllo.html
>
> Please let me know what you think.
>
> I have this comment regarding one entry from class Lm:
>
>>>  | Exclude  | U+02BB | U+02BB | Lm    | MODIFIER LETTER TURNED COMMA |
>>>  | Exclude  | U+02BC | U+02BC | Lm    | MODIFIER LETTER APOSTROPHE   |
>>>
>>
>> As ASCII isn't directly encodable using Punycode, one of these will
>> need to be allowed for Pacific languages, which use the apostrophe,
>> e.g. Hawaiʻi. It is often ignored, but in languages like Tongan it
>> can make a difference.
>
> I have not taken this into account when creating these tables.
>
>      Regards, Patrik
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
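
The filtering steps quoted above can be sketched in Python roughly as
follows. This is only an illustration under stated assumptions, not the
script that generated the tables: the names is_included and build_table
are hypothetical, step (1) is skipped because Python's standard
unicodedata module does not expose the script property from scripts.txt,
and the output reflects whatever Unicode version the local Python ships
rather than the 2006 data.

```python
import string
import unicodedata

# Step (5): US-ASCII digits and letters are always included.
ASCII_ALNUM = set(string.digits + string.ascii_letters)


def is_included(cp, classes=("Ll",)):
    """Return True if code point cp survives steps (2)-(5).

    classes is the set of accepted general categories for step (4):
    ("Ll",) for the first table, ("Ll", "Lo") for the second.
    Step (1), the per-script grouping, is not modelled here.
    """
    ch = chr(cp)
    if ch in ASCII_ALNUM:
        return True  # step (5)
    # The remaining checks are independent, so their order does not
    # change the result; the category test runs first because it is cheap.
    if unicodedata.category(ch) not in classes:
        return False  # step (4)
    if unicodedata.normalize("NFKC", ch) != ch:
        return False  # step (2)
    if ch.lower() != ch:
        return False  # step (3)
    return True


def build_table(classes=("Ll",), start=0x0000, end=0xFFFF):
    """List the code points in [start, end] that pass the filter."""
    return [cp for cp in range(start, end + 1) if is_included(cp, classes)]


if __name__ == "__main__":
    for label, classes in (("Ll", ("Ll",)), ("Ll+Lo", ("Ll", "Lo"))):
        table = build_table(classes)
        print(label, len(table), "code points")
```

Under this sketch, U+05D0 HEBREW LETTER ALEF (class Lo) is dropped when
only Ll is accepted and kept when Lo is added, while U+05B8 HEBREW POINT
QAMATS (class Mn) and U+02BC MODIFIER LETTER APOSTROPHE (class Lm) are
dropped in both variants.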