The lookalike problem(s)
phoffman at imc.org
Sat Nov 25 23:20:39 CET 2006
At 4:27 PM +0100 11/25/06, Harald Alvestrand wrote:
>--On 24. november 2006 11:30 -0800 Paul Hoffman <phoffman at imc.org> wrote:
>>At 3:09 PM +0100 11/24/06, Harald Alvestrand wrote:
>>>Greek lookalikes can be partially solved by a "don't change scripts
>>>in mid-label" rule.
>>>Since IPA is part of the Latin block, that's not a solution for it.
>>Just checking: does that mean that you cannot have a Greek name that
>Since the numbers are part of script "common", not script "Latin", I
>*some* changing of scripts mid-label is OK. Not at all sure why
>IDEOGRAPHIC CLOSING MARK (U+3006) or MASU MARK (U+303C) should be OK
>to use with all scripts, but if we're not making up rules for
>individual letters, they go into the same group....
It appears that draft-klensin-idnabis-issues-00.txt addresses mixing
scripts within a single label in only one place. In the middle of
section 2.1.6, there is the following:
    Registry restrictions might include prohibition of
    mixed-script labels, or restrictions on labels permitted in a zone if
    certain other labels are already present (See [RFC3743] and [RFC4290]
    for discussion of some of the methods that have been applied by some
This seems like a very weak protection against the problem mentioned
by Harald above. On the other hand, strengthening the restriction
will lead us either into making tables about which specific
characters can be mixed together ("a Latin label cannot contain
TELUGU DIGIT ZERO because it looks like a lowercase o"), or into
coarser-grained prohibitions ("cannot mix Latin and Greek") which
will have negative effects on non-Latin names.
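
To make the coarse-grained option concrete, here is a rough Python
sketch of a "no mixing scripts in one label" check that exempts
characters treated as script "common". It is only an illustration, not
anything from the draft: the function names are made up, and because
the Python standard library does not expose the Unicode Script
property, the script is guessed from the first word of the character
name for a short, hypothetical list of scripts. A real check would use
the Script property data from the Unicode Character Database.

```python
import unicodedata

# Assumption: a small, hand-picked set of script names for illustration.
# Anything whose character name does not start with one of these words
# is treated as "Common" (ASCII digits, hyphen, U+3006, U+303C, ...).
KNOWN_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "TELUGU", "HEBREW", "ARABIC"}


def rough_script(ch):
    """Guess a script from the Unicode character name (heuristic only)."""
    name = unicodedata.name(ch, "")
    first_word = name.split(" ", 1)[0] if name else ""
    return first_word if first_word in KNOWN_SCRIPTS else "Common"


def scripts_in_label(label):
    """Return the set of non-Common scripts that appear in a label."""
    return {rough_script(ch) for ch in label} - {"Common"}


def is_mixed_script(label):
    """True if the label draws on more than one non-Common script."""
    return len(scripts_in_label(label)) > 1
```

Under these assumptions, a Greek label with ASCII digits passes, and a
Latin label containing GREEK SMALL LETTER ALPHA (U+03B1) or TELUGU DIGIT
ZERO (U+0C66) is rejected. A Latin label containing LATIN SMALL LETTER
ALPHA (U+0251) still passes, which is the IPA case Harald mentions: a
script-level rule cannot see lookalikes inside a single script.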
This also directly relates to the issue of orthographies that need