Disallowing code points

Mark Davis ⌛ mark at macchiato.com
Fri Jul 17 20:53:16 CEST 2009


Mark


On Fri, Jul 17, 2009 at 11:15, Gervase Markham <gerv at mozilla.org> wrote:

> On 17/07/09 07:37, Gervase Markham wrote:
> > A good question, and one which I unfortunately do not have time
> > currently to answer. The list is here:
> >
> > http://mxr.mozilla.org/mozilla-central/source/modules/libpref/src/init/all.js#762
> > if anyone else wants to decode it and discover.
>
> OK, using
> http://macchiato.com/idna/idna-info.html
> I get the below results. Headlines: five are PVALID
> (\u01C3\u02D0\u0337\u0338\u3033) and one is CONTEXTO (\u05F4).
>
> PVALID:
> \u01C3 LATIN LETTER RETROFLEX CLICK (exclamation mark)
> \u02D0 MODIFIER LETTER TRIANGULAR COLON (colon)
> \u0337 COMBINING SHORT SOLIDUS OVERLAY (slash)
> \u0338 COMBINING LONG SOLIDUS OVERLAY (slash)
> \u3033 VERTICAL KANA REPEAT MARK UPPER HALF (slash)



Here is the set:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%5Cu01C3%5Cu02D0%5Cu0337%5Cu0338%5Cu3033%5D

My take is that all of these are legitimate characters, and should just be
PVALID.

Item     Example       Comments
\u01C3   aǃb vs a!b    typically identical, but ! isn't allowed in domain
                       names anyway.
\u02D0   aːb vs a:b    similar, but not the same appearance. Could be
                       confused with : used in URL password or port, so
                       UIs should probably warn.
\u0337   a̸b vs a/b    not really confusable because of positioning
\u0338   a̷b vs a/b    not really confusable because of positioning
\u3033   a〳b vs a/b   not really confusable because of positioning
\u05F4   a״b vs a"b    similar, but not the same, and " isn't allowed in
                       domain names anyway.

(also found on http://www.macchiato.com/unicode/idna/idna-info-key)
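
For anyone who wants to double-check the names locally, here is a minimal
Python sketch using the standard unicodedata module. The LOOKALIKES table
and the describe() helper are names made up for this illustration; the
ASCII pairings are simply the ones from the comparison above, and the
script says nothing about IDNA2008 status (PVALID and so on), only about
character names and general categories.

```python
import unicodedata

# The six code points discussed above, each paired with the ASCII
# character it is said to resemble (pairings taken from the comparison).
LOOKALIKES = {
    "\u01C3": "!",
    "\u02D0": ":",
    "\u0337": "/",
    "\u0338": "/",
    "\u3033": "/",
    "\u05F4": '"',
}


def describe(ch):
    """Return 'U+XXXX NAME (general category)' for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for ch, ascii_ch in LOOKALIKES.items():
    print(f"{describe(ch)}  resembles {ascii_ch!r}")
```

The names and categories come from whatever Unicode version the local
Python build ships with.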


>
> CONTEXTO:
> \u05F4 HEBREW PUNCTUATION GERSHAYIM (double quotes)
>
> Disclaimer: I haven't been exercising oversight over the extension of
> this list, and am somewhat surprised to see characters in it which do
> not resemble period, colon, slash or hyphen-minus.
>
> Gerv
>
> Full Results For Mozilla Character Blocklist Under IDNA2008 Rules
> -----------------------------------------------------------------
>
>
> http://mxr.mozilla.org/mozilla-central/source/modules/libpref/src/init/all.js#762
>
> \u0020\u00A0\u00BC\u00BD\u00BE
>
> DISALLOWED
>
> \u01C3\u02D0\u0337\u0338
>
> PVALID
>
> \u0589\u05C3
>
> DISALLOWED
>
> \u05F4
>
> CONTEXTO
>
>
> \u0609\u060A\u066A\u06D4\u0701\u0702\u0703\u0704\u115F\u1160\u1735\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u200B\u2024\u2027\u2028\u2029\u202F\u2039\u203A\u2041\u2044\u2052\u205F\u2153\u2154\u2155\u2156\u2157\u2158\u2159\u215A\u215B\u215C\u215D\u215E\u215F
>
> \u2215\u2236\u23AE\u2571\u29F6\u29F8\u2AFB\u2AFD\u2FF0\u2FF1\u2FF2\u2FF3\u2FF4\u2FF5\u2FF6\u2FF7\u2FF8\u2FF9\u2FFA\u2FFB\u3000\u3002\u3014\u3015
>
> DISALLOWED
>
> \u3033
>
> PVALID
>
>
> \u3164\u321D\u321E\u33AE\u33AF\u33C6\u33DF\uA789\uFE14\uFE15\uFE3F\uFE5D\uFE5E\uFEFF\uFF0E\uFF0F\uFF61\uFFA0\uFFF9\uFFFA\uFFFB\uFFFC\uFFFD
>
> DISALLOWED
>
> Gerv
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
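
To experiment with a list written in this \uXXXX form, something like
the following Python sketch could be used. It is only an illustration:
ESCAPED is a short excerpt of the list quoted above, not the full
preference value, and decode_blocklist() and blocked_chars() are
hypothetical helper names, not anything taken from the Mozilla source.

```python
import re

# Short excerpt of the escaped list quoted above (NOT the full value).
ESCAPED = r"\u0020\u00A0\u00BC\u00BD\u00BE\u01C3\u02D0\u0337\u0338\u0589\u05C3\u05F4"


def decode_blocklist(escaped):
    """Turn a run of \\uXXXX escapes into a set of characters."""
    hex_codes = re.findall(r"\\u([0-9A-Fa-f]{4})", escaped)
    return {chr(int(h, 16)) for h in hex_codes}


def blocked_chars(label, blocklist):
    """Return the blocklisted characters in label, in order of appearance."""
    return [ch for ch in label if ch in blocklist]


BLOCKLIST = decode_blocklist(ESCAPED)

# A label containing U+02D0 (the colon lookalike) is flagged;
# a plain ASCII label is not.
for label in ("a\u02D0b", "example"):
    hits = [f"U+{ord(c):04X}" for c in blocked_chars(label, BLOCKLIST)]
    print(label.encode("unicode_escape").decode("ascii"), hits)
```

What a browser then does with a flagged label is a separate UI decision
and is not modelled here.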