Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Fri Dec 22 18:16:05 CET 2006

I agree with John's statement and position, with a couple of slight changes.

> [...] makes it essentially impossible to correctly represent large
> fractions of some languages without the use of zero-width
> joining and non-joining characters.

It is indeed unfortunate that we are in the circumstance where the joining
characters are necessary for some languages. In retrospect, not the optimal
choice (although in many cases the alternatives were also quite unpalatable,
for different reasons). There is a public review issue making a proposal for
handling these characters in identifiers, at
http://www.unicode.org/review/#pri96, and we'd appreciate feedback on it.

> Such rules would effectively violate a ban on mixed-script
> prohibitions, so we had better not get too tied up about general
> principles.

What we say in PRI#96 is:
> In each of the following contexts, the match to the regular expressions
> must also only consist of characters from a single script (after ignoring
> Common and Inherited Script characters).

While it does place limitations on fields containing joiner characters on
the basis of script, it doesn't require the mixture of scripts, in the sense
used in http://www.unicode.org/reports/tr39/#Mixed_Script_Detection.
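
As a rough illustration of the single-script condition quoted above, here is
a minimal Python sketch. It is not the PRI#96 specification itself. The
script table is a hand-written, deliberately incomplete approximation, and
the names script_of, scripts_in and is_single_script are hypothetical; real
code should take the Script property from the Unicode Scripts.txt data.

```python
# Approximate, incomplete block ranges (an assumption of this sketch).
# Real code should use the Script property from Unicode's Scripts.txt.
_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
]

# Script values that the quoted rule says to ignore.
_IGNORED = {"Common", "Inherited"}


def script_of(ch):
    """Return a rough script name for a single character."""
    cp = ord(ch)
    if cp in (0x200C, 0x200D):
        # ZWNJ and ZWJ: listed as Inherited in current Unicode data.
        return "Inherited"
    if ch in "0123456789-":
        return "Common"
    for lo, hi, name in _RANGES:
        if lo <= cp <= hi:
            return name
    # Anything outside the toy table counts as its own script, so it is
    # never silently ignored.
    return "Unknown"


def scripts_in(text):
    """Set of scripts used in text, after dropping Common and Inherited."""
    return {script_of(ch) for ch in text} - _IGNORED


def is_single_script(text):
    """True if at most one script remains after ignoring Common/Inherited."""
    return len(scripts_in(text)) <= 1


if __name__ == "__main__":
    print(is_single_script("paypal"))
    print(is_single_script("p\u0430ypal"))  # second letter is Cyrillic
    print(is_single_script("\u0915\u094D\u200D\u0937"))  # Devanagari + ZWJ
```

In this sketch a joiner never changes the outcome on its own, because it is
dropped before the scripts are counted; only the surrounding letters decide
whether the field passes.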

Mark

On 12/22/06, John C Klensin <klensin at jck.com> wrote:
>
>
>
> --On Friday, 22 December, 2006 16:42 +0100 "Marcos Sanz/Denic"
> <sanz at denic.de> wrote:
>
> >...
> > Spoofing is an inherently non-technical issue, which existed
> > before there were domain names. You can try to address it
> > with technical tools, but it is far from being solved by
> > banning mixed scripts: there will still be single-script
> > confusables and whole-script confusables and people who will
> > confuse "deutsche-bank.de" with "your-deutsche-bank.de" (what
> > I call conceptual confusables) and people who will not be
> > wearing their glasses when they click on a link.
> >
> > So instead of trying to address the spoofing issue in an
> > incomplete manner at the wrong level (and generating new
> > kinds of unexpected problems with that), leave the decision of
> > what is a spoof of something else to the information experts.
> > For instance, the WIPO has decisions based on the UDRP,
> > Paragraph 4(a)(i), which specifically already protects owners of
> > trademarks when "the domain name is identical or confusingly
> > similar to a trademark or service mark in which the
> > complainant has rights".
> >...
>
> Marcos,
>
> I don't want to repeat things that have already been said, but I
> do want to suggest that the fairly absolute line you seem to be
> taking on this may not be helpful.
>
> First, if one relies on the WIPO position alone, one creates,
> not only a full-employment act for lawyers and UDRP mediators,
> but also two more specific problems.
>
>         (1) Those remedies can be applied only after the fact.
>         What we have seen on the Internet is that many of the
>         "bad guys" operate on a "register a domain name, defraud
>         people using that name, and then move on before any
>         action can be taken against that particular name" basis.
>         In many cases, they stop using the spoofed (or other
>         problematic) names even before registration grace
>         periods expire, not nearly long enough for someone to
>         organize a UDRP complaint and get a domain name revoked.
>
>
>         (2) WIPO is not the only UDRP provider.  Even among WIPO
>         providers, more so if other providers are considered,
>         and more so yet when mediators operate under national
>         rules rather than under ICANN/WIPO ones, some decisions
>         are wildly inconsistent with others.  It is just not
>         possible to rely on that process to reach consistently
>         plausible results... a situation that, IMO, will become
>         worse as we see more cross-script (not just
>         mixed-script) spoofing.
>
> I think that, as a community, we should be looking for ways to
> minimize the risk of these problems.  I would not go so far as
> to say that the only reason for not doing so is greed, but I
> believe that registries that do not follow reasonable standards
> of user protection and good sense will, in many countries,
> ultimately be held liable for that behavior.  As in almost all
> other technology-related areas, industry self-regulation, when
> it can be made effective, is preferable to heavy-handed and
> reactive governmental or legal system interventions... if only
> because it is more plausible to expect that industry will at
> least understand the issues.
>
> Second, while I share your distaste for general mixed-script
> prohibitions in the IDNA protocols (as I hope I've made clear
> already), I believe that this is another area in which looking
> closely at sometimes-complex engineering tradeoffs may be much
> more useful than making sweeping statements that sound political
> or religious even if they are not.  For example, the Unicode
> model for handling what they describe as "presentation forms"
> makes it essentially impossible to correctly represent large
> fractions of some languages without the use of zero-width
> joining and non-joining characters.  Those characters are
> prohibited in IDNA2003 because, if embedded in arbitrary
> strings, they are invisible and invisible characters are the
> would-be spoofer's dream come true.  I believe that we will have
> to come up with some way to permit them and that doing so will
> require some very sensitively-constructed rules about
> relationships to particular scripts and/or about adjacent
> characters and that those rules will need to be in the protocol.
> Such rules would effectively violate a ban on mixed-script
> prohibitions, so we had better not get too tied up about general
> principles.
>
> regards,
>     john
>
>
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>