Unicode versions (Re: Criteria for exceptional characters)

Kenneth Whistler kenw at sybase.com
Mon Dec 18 21:23:34 CET 2006


> Perhaps it is not necessary to remove the entire script. It may be
> sufficient to warn about U+1431, which may look like a slash followed
> by a backslash in some fonts and/or at small sizes.

Ah, but so might U+039B GREEK CAPITAL LETTER LAMDA and
U+0245 LATIN CAPITAL LETTER TURNED V (themselves mutually
confusable) and U+1948 LIMBU DIGIT TWO and U+2D37 TIFINAGH LETTER YAD.

I don't see much reason to call out UCAS (Unified Canadian
Aboriginal Syllabics), which *is* used by modern communities
and has political backing, when for nearly every script,
including Latin, you can find some characters which, because
of their simple shapes, can be confused with other commonly
used letters or even syntax elements.

> It may also be a
> good idea to warn that some of the characters in that script might be
> used to confuse novice users, who may not be very familiar with the
> precise set of characters used to delimit portions of a URL. Some of
> this computer stuff is rather confusing to beginners.

Picking out a few on an ad hoc basis doesn't strike me as
very interesting. Rather, working on the confusables tables
for UTS #39 seems to me to be a better way to bring such
issues more systematically to the attention of those,
such as registrars, who might need programmatic assistance
to help weed out potentially confusing registrations.
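
As a rough illustration of what such programmatic assistance might
look like, here is a minimal sketch of a skeleton-style comparison in
the spirit of the UTS #39 confusables data. The table below is a toy
assumption built only from the characters named in this message; it is
not the real confusables data, and the names TOY_PROTOTYPES, skeleton,
and confusable are hypothetical. A real check would load the published
data file instead.

```python
import unicodedata

# Toy prototype table: an ASSUMPTION for illustration only, not the
# actual UTS #39 confusables data. It maps the lookalikes named in this
# message to one arbitrarily chosen prototype, U+039B.
TOY_PROTOTYPES = {
    "\u1431": "\u039B",  # CANADIAN SYLLABICS PI
    "\u0245": "\u039B",  # LATIN CAPITAL LETTER TURNED V
    "\u1948": "\u039B",  # LIMBU DIGIT TWO
    "\u2D37": "\u039B",  # TIFINAGH LETTER YAD
}


def skeleton(label: str) -> str:
    """Normalize to NFD, map each character to its prototype, normalize again."""
    decomposed = unicodedata.normalize("NFD", label)
    mapped = "".join(TOY_PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str) -> bool:
    """Treat two labels as confusable when their skeletons are identical."""
    return skeleton(a) == skeleton(b)
```

A registrar-side check could then compare the skeleton of a requested
label against the skeletons of existing registrations and flag any
collision for review, rather than warning about individual characters
one at a time.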

--Ken




More information about the Idna-update mailing list