UTC Agenda Item: IDNA proposal
patrik at frobbit.se
Wed Nov 22 14:28:12 CET 2006
A version that accepts classes Ll, Lo and Mn can be found at
What about class Nd?
On 22 nov 2006, at 14.07, Harald Alvestrand wrote:
> Class Mn contains the HEBREW POINT QAMATS that the -bidi draft is
> busy defending. Can't eliminate that.
> --On 22. november 2006 13:52 +0100 Patrik Fältström
> <patrik at frobbit.se> wrote:
>> I have recreated the tables using a new algorithm (based on input from
>> Kenneth mostly).
>> (1) Use the scripts.txt file for the script definitions; do not use
>> the block definitions
>> (2) Remove codepoints where cp != NFKC(cp)
>> (3) Remove codepoints where cp != lowercase(cp)
>> (4) Remove codepoints where class(cp) != "Ll"
>> (5) Include codepoints that are part of US-ASCII (0-9, A-Z and a-z)
>> The result of doing this for U+0000 - U+FFFF can be found at
>> If I instead, in step 4, accept code points of class Ll or Lo, then
>> the result can be found at
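Steps (2) to (5) above can be sketched in Python as follows. This is a minimal illustration, not the script that produced the tables: the function name `candidate_code_points` and its `accepted_categories` parameter are hypothetical, step (1) is not modelled because the standard `unicodedata` module has no script property, and "lowercase" is assumed to mean Python's `str.lower()`. The module also follows the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`), not the 2006 data, so table sizes will differ.

```python
import string
import unicodedata

# Step (5): ASCII letters and digits are always included.
ASCII_LETTERS_DIGITS = frozenset(string.ascii_letters + string.digits)


def candidate_code_points(accepted_categories=("Ll",), limit=0x10000):
    """Hypothetical sketch: code points in U+0000..limit-1 that survive
    steps (2)-(5). Step (1), assigning scripts from scripts.txt, is not
    modelled here.
    """
    accepted = set(accepted_categories)
    result = set()
    for cp in range(limit):
        ch = chr(cp)
        if ch in ASCII_LETTERS_DIGITS:
            # Step (5) re-includes ASCII even though earlier steps remove
            # some of it ("A" is uppercase, "0" is class Nd).
            result.add(cp)
            continue
        # Step (4): keep only the accepted general categories. Checked
        # first because it is cheap and skips surrogates (class Cs).
        if unicodedata.category(ch) not in accepted:
            continue
        # Step (2): drop code points that change under NFKC.
        if unicodedata.normalize("NFKC", ch) != ch:
            continue
        # Step (3): drop code points that change when lowercased.
        if ch.lower() != ch:
            continue
        result.add(cp)
    return result


ll_only = candidate_code_points()                    # step (4) as first described
ll_lo = candidate_code_points(("Ll", "Lo"))          # the Ll + Lo variant
ll_lo_mn = candidate_code_points(("Ll", "Lo", "Mn")) # the Ll + Lo + Mn variant
```

Because the three removal steps are independent filters, their order does not change the result; only step (5) has to come last, since it overrides the others for ASCII.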
>> Please let me know what you think.
>> I have this comment regarding one entry from class Lm:
>>>> | Exclude | U+02BB | U+02BB | Lm | MODIFIER LETTER TURNED COMMA |
>>>> | Exclude | U+02BC | U+02BC | Lm | MODIFIER LETTER APOSTROPHE |
>>> As the ASCII apostrophe isn't directly encodable using Punycode, one
>>> of these needs to be allowed for Pacific languages, which use the
>>> apostrophe, e.g. Hawaiʻi. It is often ignored, but in languages like
>>> Tongan it can make a difference.
>> I have not taken this into account when creating these tables.
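To see which step excludes these two code points, a small check like the one below can be used. It is a sketch only: the helper `failing_steps` is hypothetical and mirrors steps (2) to (4) as described in the message, using Python's `unicodedata` data rather than the data used for the tables. With current Unicode data both code points pass steps (2) and (3) and are removed only by step (4), because their class is Lm.

```python
import unicodedata


def failing_steps(cp, accepted_categories=("Ll",)):
    """Hypothetical helper: return the numbers of steps (2)-(4) that
    would remove code point cp from the table."""
    ch = chr(cp)
    failed = []
    if unicodedata.normalize("NFKC", ch) != ch:
        failed.append(2)  # changes under NFKC
    if ch.lower() != ch:
        failed.append(3)  # changes when lowercased
    if unicodedata.category(ch) not in accepted_categories:
        failed.append(4)  # general category not accepted
    return failed


for cp in (0x02BB, 0x02BC):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch),
          failing_steps(cp))
```

So allowing one of them would mean either adding Lm to the accepted classes in step (4) or listing the code point as an explicit exception.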
>> Regards, Patrik
>> Idna-update mailing list
>> Idna-update at alvestrand.no