The lookalike problem(s)

Paul Hoffman phoffman at imc.org
Sat Nov 25 23:20:39 CET 2006


At 4:27 PM +0100 11/25/06, Harald Alvestrand wrote:
>--On 24. november 2006 11:30 -0800 Paul Hoffman <phoffman at imc.org> wrote:
>
>>At 3:09 PM +0100 11/24/06, Harald Alvestrand wrote:
>>>Greek lookalikes can be partially solved by a "don't change scripts
>>>in mid-label" rule.
>>>Since IPA is part of the Latin block, that's not a solution for it.
>>
>>Just checking: does that mean that you cannot have a Greek name that
>>includes digits?
>
>Since the numbers are part of script "common", not script "Latin", I
>expect that *some* changing of scripts mid-label is OK. Not at all
>sure why IDEOGRAPHIC CLOSING MARK (U+3006) or MASU MARK (U+303C)
>should be OK to use with all scripts, but if we're not making up
>rules for individual letters, they go into the same group....
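One rough way to picture the "don't change scripts in mid-label" rule as read above is to treat characters whose Unicode Script property is Common (digits, U+3006, U+303C) or Inherited (combining marks) as allowed next to any script, and to reject a label only if it contains more than one other script. This is a minimal sketch under that assumption. Python's unicodedata module does not expose the Script property, so the range table below is a hypothetical, hand-picked subset of Scripts.txt, and the names script_of and is_single_script are illustrative only.

```python
# Hypothetical, hand-picked subset of the Unicode Script property
# (Scripts.txt). Python's unicodedata module does not expose Script,
# so these ranges are an illustrative assumption, not the full data.
TOY_SCRIPT_RANGES = [
    (0x002D, 0x002D, "Common"),     # HYPHEN-MINUS
    (0x0030, 0x0039, "Common"),     # DIGIT ZERO..DIGIT NINE
    (0x0061, 0x007A, "Latin"),      # a..z
    (0x0250, 0x02AF, "Latin"),      # IPA Extensions
    (0x0300, 0x036F, "Inherited"),  # combining diacritical marks
    (0x03B1, 0x03C9, "Greek"),      # GREEK SMALL LETTER ALPHA..OMEGA
    (0x0C66, 0x0C6F, "Telugu"),     # TELUGU DIGIT ZERO..NINE
    (0x3006, 0x3006, "Common"),     # IDEOGRAPHIC CLOSING MARK
    (0x303C, 0x303C, "Common"),     # MASU MARK
]

# Scripts that may appear alongside any other script in a label.
NEUTRAL_SCRIPTS = {"Common", "Inherited"}


def script_of(ch):
    """Look up the script of one character in the toy table."""
    cp = ord(ch)
    for start, end, script in TOY_SCRIPT_RANGES:
        if start <= cp <= end:
            return script
    raise ValueError(f"U+{cp:04X} is not covered by the toy table")


def is_single_script(label):
    """'Don't change scripts in mid-label': at most one script other
    than Common/Inherited may occur in the label."""
    scripts = {script_of(ch) for ch in label} - NEUTRAL_SCRIPTS
    return len(scripts) <= 1


if __name__ == "__main__":
    print(is_single_script("\u03b1\u03b2\u03b3123"))  # Greek + digits -> True
    print(is_single_script("ab\u03b1"))              # Latin + Greek -> False
    print(is_single_script("\u0251pple"))            # IPA alpha is Latin -> True
```

The third call shows the gap raised earlier in the thread: U+0251 LATIN SMALL LETTER ALPHA resembles Greek alpha but has Script=Latin, so a script-mixing rule alone does not catch it.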

It appears that draft-klensin-idnabis-issues-00.txt only deals with 
mixing scripts in a single label in one place. In the middle of 
section 2.1.6, there is the following:
    Registry restrictions might include prohibition of
    mixed-script labels, or restrictions on labels permitted in a zone if
    certain other labels are already present (See [RFC3743] and [RFC4290]
    for discussion of some of the methods that have been applied by some
    registries).

This seems like a very weak protection against the problem mentioned 
by Harald above. On the other hand, strengthening the restriction
will lead us either into making tables about which specific
characters can be mixed together ("a Latin label cannot contain
TELUGU DIGIT ZERO because it looks like a lowercase o"), or into
coarser-grained prohibitions ("cannot mix Latin and Greek"), which
will have negative effects on non-Latin names.
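The two directions can be contrasted in a short sketch. Both tables are hypothetical and hold only the single examples quoted above, the function names are illustrative, and the sketch takes each label's base script (or its set of scripts) as given rather than computing it.

```python
import unicodedata

# Hypothetical fine-grained table: characters barred from a label whose
# base script is the key, because they resemble one of its letters.
# A real table would need an entry for every confusable character.
BARRED_IN = {
    "Latin": {"\u0C66"},  # TELUGU DIGIT ZERO, resembles "o"
}

# Hypothetical coarse-grained rule: script pairs that may never share a label.
FORBIDDEN_PAIRS = [frozenset({"Latin", "Greek"})]


def barred_characters(label, base_script):
    """Fine-grained check: names of characters the table bars for base_script."""
    barred = BARRED_IN.get(base_script, set())
    return [unicodedata.name(ch) for ch in label if ch in barred]


def scripts_may_mix(scripts):
    """Coarse-grained check: False if any forbidden pair is present,
    whether or not any character in the label is actually confusable."""
    return not any(pair <= scripts for pair in FORBIDDEN_PAIRS)


if __name__ == "__main__":
    print(barred_characters("f\u0C66\u0C66", "Latin"))
    print(scripts_may_mix({"Latin", "Greek"}))
    print(scripts_may_mix({"Greek", "Common"}))
```

The fine-grained check needs one table row per confusable character for every script it protects; the coarse-grained check is a single rule but rejects every label containing both scripts, including labels with nothing confusable in them.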

This also directly relates to the issue of orthographies that need 
apostrophe-like characters.

