Mixing scripts (Re: Unicode versions (Re: Criteria for exceptional characters))

Fri Dec 22 18:21:09 CET 2006

--On Friday, 22 December, 2006 09:16 -0800 Mark Davis
<mark.davis at icu-project.org> wrote:

> What we say in PRI#96 is:
>> In each of the following contexts, the match to the regular
>> expressions must also only consist of characters from a
>> single script (after ignoring Common and Inherited Script
>> characters).
>
> While it does place limitations on fields containing joiner
> characters on the basis of script, it doesn't require the
> mixture of scripts, in the sense used in
> http://www.unicode.org/reports/tr39/#Mixed_Script_Detection.

I certainly understood that and did not intend to imply
otherwise.  What I was trying to say is that one of the
arguments against protocol rules prohibiting mixing of scripts
is that, with the exception of some bidi issues (which we got at
least partially wrong), IDNA2003 operates on characters, not
complete labels.  A mixed-script test requires making the step
into evaluating complete labels for correctness (under Michael's
proposal, complete FQDNs).  That is a non-trivial step.  I
believe that any sensible model for handling ZWJ and ZWNJ
(including that of PRI#96, which I assume to be the default
unless better ways are found) will require looking at full
labels or at least sequences of characters, i.e., making that
step.
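
To make the distinction concrete, here is a minimal Python
sketch; it is not taken from PRI#96 or IDNA2003. The
SCRIPT_RANGES table is a hand-written, deliberately incomplete
stand-in for the Unicode script data, and the names script_of,
char_allowed, and is_single_script are hypothetical. It only
illustrates that a per-character check cannot see script mixing,
while a single-script check has to look at the whole label.

```python
# Assumption: a tiny, incomplete script table for illustration only.
# Real code would load the Script property from the Unicode data files.
SCRIPT_RANGES = [
    (0x002D, 0x002D, "Common"),      # hyphen-minus
    (0x0030, 0x0039, "Common"),      # ASCII digits
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0300, 0x036F, "Inherited"),   # combining diacritical marks
    (0x0410, 0x044F, "Cyrillic"),    # basic Russian alphabet only
    (0x0900, 0x094D, "Devanagari"),
    (0x200C, 0x200D, "Inherited"),   # ZWNJ and ZWJ
]


def script_of(ch: str) -> str:
    """Return the script of one character, or "Unknown" if not in the table."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Unknown"


def char_allowed(ch: str) -> bool:
    """Character-level test: looks at one code point in isolation."""
    return script_of(ch) != "Unknown"


def scripts_in_label(label: str) -> set:
    """Scripts used in a label, ignoring Common and Inherited characters."""
    return {script_of(ch) for ch in label} - {"Common", "Inherited"}


def is_single_script(label: str) -> bool:
    """Label-level test: needs the complete label, not one character."""
    return len(scripts_in_label(label)) <= 1


if __name__ == "__main__":
    mixed = "p\u0430ypal"            # second letter is Cyrillic U+0430
    joined = "\u0915\u094D\u200D\u0937"  # Devanagari letters, virama, ZWJ
    print(all(char_allowed(c) for c in mixed), is_single_script(mixed))
    print(all(char_allowed(c) for c in joined), is_single_script(joined))
```

Every character of the mixed label passes the character-level
test on its own; only the label-level function notices the two
scripts. The ZWJ case passes both because the joiner is treated
as Inherited and ignored when counting scripts.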

    john