Preparation for IDNABIS Stockholm

Mark Davis ⌛ mark at macchiato.com
Mon Jul 20 09:02:02 CEST 2009


The TATWEEL was missing from the list below. I have put a full list, with a
sample of each character and links to property information, at

http://www.macchiato.com/unicode/idna/exceptions

I have also tried to record my guess at the motivation for each of the
CONTEXTO characters. Any information on that would be appreciated, since the
rationales are not clear from the email discussion.

[I also have some personal comments, clearly marked as such. My conclusion
is that none of the characters currently listed as CONTEXTO in Exceptions
needs that status; all of them can simply be PVALID.

The only characters that really need special handling are HYPHEN (which is
already called out specially in Protocol, and can be dealt with there) and
the ARABIC-INDIC digits (which can be dealt with in Bidi). And of course, we
still need CONTEXTJ and the Exceptions for PVALID and DISALLOWED.]
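A minimal sketch of the two checks mentioned above, assuming the hyphen restriction as it was later published in the Protocol document (RFC 5891, section 4.2.3.1) and only rule 4 of the Bidi rule (RFC 5893), which forbids mixing European numbers (EN, the bidi class of the EXTENDED ARABIC-INDIC digits) with Arabic numbers (AN, the bidi class of the ARABIC-INDIC digits). The function names are hypothetical, and rule 4 is applied to every label here for brevity, although the Bidi document scopes it to right-to-left labels.

```python
import unicodedata


def hyphen_ok(label: str) -> bool:
    """Hyphen restriction (sketch of RFC 5891, section 4.2.3.1)."""
    # No leading or trailing HYPHEN-MINUS.
    if label.startswith("-") or label.endswith("-"):
        return False
    # No "--" in the third and fourth character positions.
    return label[2:4] != "--"


def bidi_digits_ok(label: str) -> bool:
    """Only Bidi rule 4: EN and AN digits may not both appear (sketch)."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return not ("EN" in classes and "AN" in classes)


if __name__ == "__main__":
    print(hyphen_ok("ab-cd"), hyphen_ok("-abc"), hyphen_ok("xn--abc"))  # True False False
    # U+0661 has bidi class AN; U+06F1 has bidi class EN.
    print(bidi_digits_ok("\u0661\u0662"), bidi_digits_ok("\u0661\u06F1"))  # True False
```

With the restrictions expressed this way, HYPHEN-MINUS and both digit ranges could remain plain PVALID in the Tables.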

Mark


On Sat, Jul 18, 2009 at 12:02, Vint Cerf <vint at google.com> wrote:

> PVALID: // would otherwise have been DISALLOWED
>
>  00DF; PVALID     # LATIN SMALL LETTER SHARP S
>  03C2; PVALID     # GREEK SMALL LETTER FINAL SIGMA
>  06FD; PVALID     # ARABIC SIGN SINDHI AMPERSAND
>  06FE; PVALID     # ARABIC SIGN SINDHI POSTPOSITION MEN
>  0F0B; PVALID     # TIBETAN MARK INTERSYLLABIC TSHEG
>  3007; PVALID     # IDEOGRAPHIC NUMBER ZERO
>
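To see why these six need PVALID exceptions, here is a simplified sketch of the default derivation. It assumes only two of its steps, the Unstable test (toNFKC(toCaseFold(toNFKC(cp))) != cp) and the LetterDigits general-category test; the real Tables algorithm has several more steps, and the names below are hypothetical. SHARP S and FINAL SIGMA fail the stability test (case folding maps them to "ss" and to GREEK SMALL LETTER SIGMA), while the other four fall outside the LetterDigits categories.

```python
import unicodedata

# Simplified subset of the IDNA2008 Tables derivation (assumption: only the
# Unstable and LetterDigits steps are modeled here).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_unstable(ch: str) -> bool:
    # toNFKC(toCaseFold(toNFKC(cp))) != cp; str.casefold() is full case folding.
    nfkc = unicodedata.normalize("NFKC", ch)
    return unicodedata.normalize("NFKC", nfkc.casefold()) != ch


def default_status(ch: str) -> str:
    """Value the simplified derivation gives before Exceptions are consulted."""
    if is_unstable(ch):
        return "DISALLOWED"
    if unicodedata.category(ch) in LETTER_DIGITS:
        return "PVALID"
    return "DISALLOWED"


PVALID_EXCEPTIONS = "\u00DF\u03C2\u06FD\u06FE\u0F0B\u3007"

for ch in PVALID_EXCEPTIONS:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {default_status(ch)}")
```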
> CONTEXTO: // would otherwise have been DISALLOWED
>
>  00B7; CONTEXTO   # MIDDLE DOT
>  0375; CONTEXTO   # GREEK LOWER NUMERAL SIGN (KERAIA)
>  05F3; CONTEXTO   # HEBREW PUNCTUATION GERESH
>  05F4; CONTEXTO   # HEBREW PUNCTUATION GERSHAYIM
>  30FB; CONTEXTO   # KATAKANA MIDDLE DOT
>
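The rationales for this group can be read off the context rules that were eventually published in RFC 5892 (Appendix A). A minimal sketch of those rules, assuming a crude block-based stand-in for the Unicode Script property (Python's unicodedata module does not expose Script); approx_script and contexto_ok are hypothetical names.

```python
def approx_script(ch: str) -> str:
    """Crude block-based approximation of the Script property (assumption)."""
    cp = ord(ch)
    if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
        return "Greek"
    if 0x0590 <= cp <= 0x05FF:
        return "Hebrew"
    if 0x3041 <= cp <= 0x309F:
        return "Hiragana"
    if 0x30A1 <= cp <= 0x30FA:  # stops before U+30FB, which is Common script
        return "Katakana"
    if 0x3400 <= cp <= 0x4DBF or 0x4E00 <= cp <= 0x9FFF:
        return "Han"
    return "Other"


def contexto_ok(label: str, i: int) -> bool:
    """Check the CONTEXTO character at label[i] (rules as in RFC 5892, A.3-A.7)."""
    cp = ord(label[i])
    before = label[i - 1] if i > 0 else None
    after = label[i + 1] if i + 1 < len(label) else None
    if cp == 0x00B7:  # MIDDLE DOT: only between two 'l' (Catalan ela geminada)
        return before == "l" and after == "l"
    if cp == 0x0375:  # KERAIA: must be followed by a Greek character
        return after is not None and approx_script(after) == "Greek"
    if cp in (0x05F3, 0x05F4):  # GERESH, GERSHAYIM: must follow a Hebrew character
        return before is not None and approx_script(before) == "Hebrew"
    if cp == 0x30FB:  # KATAKANA MIDDLE DOT: label needs Hiragana, Katakana or Han
        return any(approx_script(c) in {"Hiragana", "Katakana", "Han"} for c in label)
    raise ValueError(f"U+{cp:04X} is not one of these CONTEXTO characters")
```

For example, contexto_ok("col\u00B7lecci\u00F3", 3) accepts the MIDDLE DOT in the Catalan word "col·lecció", while the same dot between other letters is rejected.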
> CONTEXTO: // would otherwise have been PVALID
>
>  U+002D; CONTEXTO   # HYPHEN-MINUS
>  U+02B9; CONTEXTO   # MODIFIER LETTER PRIME
>  U+0660; CONTEXTO   # ARABIC-INDIC DIGIT ZERO
>  U+0661; CONTEXTO   # ARABIC-INDIC DIGIT ONE
>  U+0662; CONTEXTO   # ARABIC-INDIC DIGIT TWO
>  U+0663; CONTEXTO   # ARABIC-INDIC DIGIT THREE
>  U+0664; CONTEXTO   # ARABIC-INDIC DIGIT FOUR
>  U+0665; CONTEXTO   # ARABIC-INDIC DIGIT FIVE
>  U+0666; CONTEXTO   # ARABIC-INDIC DIGIT SIX
>  U+0667; CONTEXTO   # ARABIC-INDIC DIGIT SEVEN
>  U+0668; CONTEXTO   # ARABIC-INDIC DIGIT EIGHT
>  U+0669; CONTEXTO   # ARABIC-INDIC DIGIT NINE
>  U+06F0; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT ZERO
>  U+06F1; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT ONE
>  U+06F2; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT TWO
>  U+06F3; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT THREE
>  U+06F4; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT FOUR
>  U+06F5; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT FIVE
>  U+06F6; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT SIX
>  U+06F7; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT SEVEN
>  U+06F8; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT EIGHT
>  U+06F9; CONTEXTO   # EXTENDED ARABIC-INDIC DIGIT NINE
>  U+0483; CONTEXTO   # COMBINING CYRILLIC TITLO
>  U+3005; CONTEXTO   # IDEOGRAPHIC ITERATION MARK
>
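For the two digit ranges, the contextual formulation (as later published in RFC 5892, Appendix A.8 and A.9) is that a label may not mix ARABIC-INDIC digits with EXTENDED ARABIC-INDIC digits, since several digits in the two sets look identical. A minimal sketch by code point range, with a hypothetical function name; it does not cover the other CONTEXTO entries in this group (HYPHEN-MINUS, MODIFIER LETTER PRIME, COMBINING CYRILLIC TITLO, IDEOGRAPHIC ITERATION MARK), whose draft rules are not given here.

```python
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9


def digits_context_ok(label: str) -> bool:
    """Reject labels that contain digits from both ranges (sketch)."""
    cps = [ord(c) for c in label]
    has_arabic_indic = any(cp in ARABIC_INDIC for cp in cps)
    has_extended = any(cp in EXTENDED_ARABIC_INDIC for cp in cps)
    return not (has_arabic_indic and has_extended)
```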
> DISALLOWED: // would otherwise have been PVALID
>
>  U+302E; DISALLOWED # HANGUL SINGLE DOT TONE MARK
>  U+302F; DISALLOWED # HANGUL DOUBLE DOT TONE MARK
>
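A minimal sketch of how the Exceptions table is consulted, assuming it is checked before the derived property (as the "would otherwise have been" headings imply). The entries mirror the quoted lists; EXCEPTIONS, lookup, and the derive callback are hypothetical names.

```python
from typing import Callable, Dict

# Exceptions as listed in the quoted message (draft state, not a final table).
EXCEPTIONS: Dict[int, str] = {}
for cp in (0x00DF, 0x03C2, 0x06FD, 0x06FE, 0x0F0B, 0x3007):
    EXCEPTIONS[cp] = "PVALID"
for cp in (0x00B7, 0x0375, 0x05F3, 0x05F4, 0x30FB,
           0x002D, 0x02B9, 0x0483, 0x3005,
           *range(0x0660, 0x066A), *range(0x06F0, 0x06FA)):
    EXCEPTIONS[cp] = "CONTEXTO"
for cp in (0x302E, 0x302F):
    EXCEPTIONS[cp] = "DISALLOWED"


def lookup(cp: int, derive: Callable[[int], str]) -> str:
    """Exceptions take precedence; otherwise fall back to the derived value."""
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    return derive(cp)
```

Passing a deriver that always returns PVALID shows the override: lookup(0x302E, lambda cp: "PVALID") returns "DISALLOWED".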
> In addition, it has been proposed to DISALLOW the following vertical
> formatting characters:
>
> U+3031: Lm: VERTICAL KANA REPEAT MARK
> U+3032: Lm: VERTICAL KANA REPEAT WITH VOICED SOUND MARK
> U+3033: Lm: VERTICAL KANA REPEAT MARK UPPER HALF
> U+3034: Lm: VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF
> U+3035: Lm: VERTICAL KANA REPEAT MARK LOWER HALF
> U+303B: Lm: VERTICAL IDEOGRAPHIC ITERATION MARK
> U+07FA: Lm: NKO LAJANYALAN
>
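All seven proposed characters are modifier letters (Lm) with no case or compatibility mappings, so a simplified derivation (assumed here: only the stability and LetterDigits tests) would make them PVALID; that is why they would need explicit DISALLOWED exceptions. A sketch to check this with Python's unicodedata module; stable_letter is a hypothetical name.

```python
import unicodedata

VERTICAL_FORMATTING = [0x3031, 0x3032, 0x3033, 0x3034, 0x3035, 0x303B, 0x07FA]
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def stable_letter(cp: int) -> bool:
    """Simplified check (assumption): NFKC_Casefold-stable and in LetterDigits."""
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", ch).casefold())
    return folded == ch and unicodedata.category(ch) in LETTER_DIGITS


for cp in VERTICAL_FORMATTING:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch)} {stable_letter(cp)}")
```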


More information about the Idna-update mailing list