Mapping and Variants

Tue Mar 10 19:55:25 CET 2009

comments below.

Mark

On Tue, Mar 10, 2009 at 01:06, Michael Everson <everson at evertype.com> wrote:

> Oh my gods.
>
> Are we back HERE, at THIS decision?
>
> On 10 Mar 2009, at 05:18, Michel SUIGNARD wrote:
>
> > +1 on Mark's message concerning confusability.
> > I also think that script mixing within a label should be a client
> > application decision, not dictated by protocol.
>
> This is madness. I said this first when Cary started talking to me
> about this, when he was editing a draft when WG2 was at Sophia
> Antipolis.
>
> At that time, the idea that Cyrillic and Greek and Latin and Cherokee
> could be permitted to intermix within a script label horrifies me --
> unless the idea is to say "feck it, we don't care about being
> responsible for enforcing any security whatsoever".

I don't think anyone particularly wants those ones to be mixed. The question
is whether the protocol forbids script mixing, or there are other means to
prevent it. There are clearly legitimate uses, such as romaji in Japanese.

The current IDNA draft does not forbid script mixing in the protocol. *ICANN
rules forbid it, which is a very different matter* (with some exceptions,
which, as we've said, are not clearly defined).

If you are concerned about IDNA issues, then you should stay current with
the drafts:

30 Nov 2008  draft-ietf-idnabis-bidi <http://tools.ietf.org/html/draft-ietf-idnabis-bidi>
09 Mar 2009  draft-ietf-idnabis-defs <http://tools.ietf.org/html/draft-ietf-idnabis-defs>
09 Mar 2009  draft-ietf-idnabis-protocol <http://tools.ietf.org/html/draft-ietf-idnabis-protocol>
09 Mar 2009  draft-ietf-idnabis-rationale <http://tools.ietf.org/html/draft-ietf-idnabis-rationale>
22 Dec 2008  draft-ietf-idnabis-tables <http://tools.ietf.org/html/draft-ietf-idnabis-tables>

>
> Was a decision to ban script-mixing within a label made? Or was it not
> made? If it was not made, I am surprised, as I thought it had been. If
> it was made, why the hell is it being proposed to unmake it?

It is not banned, and has not been banned (I think) in any of the many
drafts (John can say precisely).

>
>
> > For many scripts it is in fact innocuous and desirable to be mixed
> > with ASCII Latin (take Japanese and Romaji for example). In my days
> > at Microsoft, when helping expose IDN in IE7, we went from a
> > fairly restrictive model to a much more open model concerning script
> > mixing, clearly banning the problematic cases (such as Greek,
> > Cyrillic, Latin mixing), but allowing, for example, most of the Asian
> > scripts to be mixed with Latin, and obviously allowing the mixed
> > script scenarios required for Japanese and Korean.
>
> BUT WASN'T THIS ALREADY DECIDED?

See above.
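
The kind of client-side policy Michel describes can be sketched roughly as
below. This is a hypothetical illustration, not IE7's actual rules: the
allowed combinations are assumptions chosen to mirror his description (Latin
may join the main East Asian scripts, while Greek, Cyrillic and Latin are
never mixed with each other), and the function takes an already-computed set
of scripts rather than doing script detection itself.

```python
from __future__ import annotations

# Hypothetical client-side policy; the combinations below are illustrative
# assumptions, NOT the rule set IE7 actually shipped.
ALLOWED_COMBINATIONS = [
    {"Latin", "Han", "Hiragana", "Katakana"},  # Japanese, including romaji
    {"Latin", "Han", "Hangul"},                # Korean
]

def mixing_allowed(scripts: set[str]) -> bool:
    """`scripts` is the set of scripts found in one label, with Common and
    Inherited already set aside. Single-script labels always pass; mixed
    labels pass only if they fit inside one allowed combination."""
    if len(scripts) <= 1:
        return True
    return any(scripts <= combo for combo in ALLOWED_COMBINATIONS)

print(mixing_allowed({"Latin", "Katakana"}))  # True: romaji with katakana
print(mixing_allowed({"Latin", "Cyrillic"}))  # False: not in any combination
print(mixing_allowed({"Greek"}))              # True: single script
```

Because a table like this lives in the client or the registry rather than in
the protocol, it can be revised as new threats appear.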

>
> > Finally, the script property as exposed by Unicode cannot be used
> > without some careful analysis to determine a 'single' script. There
> > are values such as 'Common' and 'Inherited' which have to be allowed
> > with most other script values.
>
> Give examples when you make a statement like this please. Otherwise it
> is scare tactics.

This is not scare tactics, and I'm not the one SHOUTING. I have given
examples before on the list, which you appear not to have noticed. The
simplest are the digits 0-9. If you forbid mixing Common with Cyrillic, for
example, then you can't have Cyrillic labels with digits.

While eliminating digits might be appropriate for TLDs, it is certainly not
intended for other labels.
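
To make the digit example concrete, here is a small sketch. It uses a tiny,
hand-written script table, which is an assumption: Python's standard
`unicodedata` module does not expose the Unicode Script property, so a real
implementation would load the full Scripts.txt data instead. A naive check
that treats Common as a script of its own sees a Cyrillic label with digits
as "mixed"; a check that lets Common and Inherited characters go along with
the rest of the label sees a single script.

```python
from __future__ import annotations

# Simplified, hand-written ranges (assumption: NOT the full Scripts.txt data).
SCRIPT_RANGES = [
    (0x002D, 0x002D, "Common"),     # hyphen-minus
    (0x0030, 0x0039, "Common"),     # ASCII digits 0-9
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0300, 0x036F, "Inherited"),  # combining diacritical marks
    (0x0370, 0x03FF, "Greek"),      # approximate block, not exact script data
    (0x0400, 0x04FF, "Cyrillic"),
]

def script_of(ch: str) -> str:
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Unknown"

def naive_scripts(label: str) -> set[str]:
    """Every script value counts, including Common and Inherited."""
    return {script_of(ch) for ch in label}

def resolved_scripts(label: str) -> set[str]:
    """Common and Inherited are treated as compatible with any script."""
    return naive_scripts(label) - {"Common", "Inherited"}

label = "\u043c\u0438\u04402009"       # Cyrillic letters followed by ASCII digits
print(sorted(naive_scripts(label)))     # ['Common', 'Cyrillic']
print(sorted(resolved_scripts(label)))  # ['Cyrillic']
```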

For an approximate list of the Common and Inherited letters, marks and digits
at stake, see:
http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:L:][:Mn:][:Mc:][:Nd:]-[:nfkcqc=n:]-[:defaultignorablecodepoint:]-[^[:script=common:][:script=inherited:]]

Similarly, if you forbid mixing Inherited with other scripts, you couldn't
use Arabic vowel marks or accented Latin letters that aren't precomposed
(needed for some African languages).
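
The accented-letter case can be checked with Python's standard `unicodedata`
module. The example pairs LATIN SMALL LETTER OPEN E (U+025B) with COMBINING
ACUTE ACCENT (U+0301), a tone-marked vowel of the kind used in some African
orthographies. NFC cannot compose it, so the label keeps a separate
nonspacing mark, and a rule rejecting Inherited characters would reject it.
The Arabic fatha is shown as a second nonspacing mark of the same kind.

```python
import unicodedata

# Open e + combining acute: there is no precomposed form, so NFC keeps
# two code points, the second being a nonspacing mark (category Mn).
syllable = unicodedata.normalize("NFC", "\u025b\u0301")
for ch in syllable:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
# U+025B LATIN SMALL LETTER OPEN E (Ll)
# U+0301 COMBINING ACUTE ACCENT (Mn)

# Contrast: plain e + combining acute does compose, to U+00E9.
print(len(unicodedata.normalize("NFC", "e\u0301")))  # 1

# The Arabic fatha is likewise a nonspacing mark with no precomposed forms.
print(unicodedata.name("\u064e"), unicodedata.category("\u064e"))  # ARABIC FATHA Mn
```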

>
> > At the same time, 'Common' is a value that often means 'shared' by
> > at least two scripts, and it does not mean that all 'Common'
> > characters should be mixable with all scripts.
>
> Ditto.
>
> > In other words, it is way too complicated to be enshrined in a
> > protocol where stability is a feature.
>
> You have to make arguments by reference to examples that specify your
> concern. Even I, Unicadette that I am, don't find your argument
> convincing.

I firmly agree with you as to the need for examples, but those have been
supplied before, many, many times, just not in that message.
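
One example for the point that 'Common' does not mean "mixable with every
script": U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK has the Script value
Common, yet it belongs with kana. Unicode's Script_Extensions property records
restrictions of this kind. The sketch below models the idea with a one-entry,
hand-picked table, an assumption standing in for the real data file.

```python
from __future__ import annotations

# Hypothetical restriction table modeled on Script_Extensions
# (assumption: a tiny hand-picked subset, not the real data).
COMMON_RESTRICTIONS = {
    "\u30fc": {"Hiragana", "Katakana"},  # KATAKANA-HIRAGANA PROLONGED SOUND MARK
}

def common_char_ok(ch: str, label_scripts: set[str]) -> bool:
    """An unrestricted Common character may join any script; a restricted
    one is acceptable only alongside one of its listed scripts."""
    allowed = COMMON_RESTRICTIONS.get(ch)
    return allowed is None or bool(allowed & label_scripts)

print(common_char_ok("\u30fc", {"Katakana"}))  # True
print(common_char_ok("\u30fc", {"Cyrillic"}))  # False
print(common_char_ok("5", {"Cyrillic"}))       # True: digits are unrestricted here
```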

>
>
> > It is better done by registry policies and client application
> > awareness. And it needs to be adjusted as new threats emerge while
> > respecting real need for multi-script labels when no harm potential
> > exists.
>
> Even mixing Burmese and Latin is dangerous because of Latin o and
> Burmese wa (looks like o).

No question. There are a great many characters that look like o: many of
the Indic digits, for example. See above.
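
Look-alikes such as Burmese wa and the Indic zeros are what confusable
detection in Unicode's security mechanisms (UTS #39) targets, by mapping
characters to a shared "skeleton" and comparing skeletons. The sketch below is
loosely modeled on that idea; its one-line table is a hand-picked assumption,
not the real confusables data.

```python
import unicodedata

# Tiny hand-picked stand-in for confusables data (assumption: NOT the real file).
LOOKS_LIKE_LATIN_O = {
    "\u043e",  # CYRILLIC SMALL LETTER O
    "\u03bf",  # GREEK SMALL LETTER OMICRON
    "\u101d",  # MYANMAR LETTER WA
    "\u0966",  # DEVANAGARI DIGIT ZERO
    "\u09e6",  # BENGALI DIGIT ZERO
}

def skeleton(label: str) -> str:
    """Decompose, then map every o-like character to Latin 'o'."""
    nfd = unicodedata.normalize("NFD", label)
    return "".join("o" if ch in LOOKS_LIKE_LATIN_O else ch for ch in nfd)

def confusable(a: str, b: str) -> bool:
    """Two different labels are flagged if their skeletons match."""
    return a != b and skeleton(a) == skeleton(b)

print(confusable("google", "g\u101d\u101dgle"))  # True: Burmese wa in place of o
print(confusable("google", "goggle"))            # False
```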

>
>
> You know, last night I sent an IM to Cary:
>
> "I don't know why I remain on the IDNA list. Any time I say anything
> it gets ignored."
>
> Cary responded that he felt that both statements were true for
> everyone on the list.

The volume does make it difficult to follow. I know I've spent vastly more
time on this than anticipated (just reading each of the new specs carefully
takes a lot of time), and the main authors and chair are clearly overloaded.

>
> And these decisions will help run the internet....
>
> Dejectedly,
> Michael
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>