baking into the protocol

Erik van der Poel erikv at google.com
Wed Dec 20 23:38:06 CET 2006


On 12/20/06, John C Klensin <klensin at jck.com> wrote:
>
> --On Wednesday, 20 December, 2006 11:29 -0800 Erik van der Poel
> <erikv at google.com> wrote:
>
> > The registries and the user agents _are_ the ones that could
> > perform some action based on mixed script rules.
>
> I disagree, strongly.  We have the reasonable expectation that
> protocol changes -- IDNA (and/or stringprep or nameprep)
> changes-- will be implemented globally.

I imagine that this would depend on what IDNA200x actually says. If,
for example, IDNA200x tried (somehow) to forbid the NFKC mappings
specified in IDNA2003 in HTML user agents, some implementors might
ignore such a rule. Note that I'm not making a prediction; I'm merely
stating a concern.

If IDNA200x focuses on the _output_ of the nameprep process, and does
not specify the processes that user agents must implement at lookup
time, then implementors may handle those steps differently from one
another. This might also be a concern. I'm talking not only about the
NFKC and case mappings, but, to be pedantic, also the conversion from
whatever encoding the text uses to Unicode. It would be great if all
of these processes were spelled out, so that we have a common spec to
adhere to.
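
To make the lookup-time steps concrete, here is a rough sketch, not
a conforming nameprep implementation. The function name
prepare_label is hypothetical, str.casefold() only approximates the
nameprep case-folding table, and the prohibited-character and bidi
checks are left out entirely.

```python
import unicodedata


def prepare_label(raw: bytes, encoding: str) -> str:
    """Hypothetical lookup-time pipeline for a single label."""
    # Step 1: convert from whatever encoding the text uses to Unicode.
    text = raw.decode(encoding)
    # Step 2: case mapping (an approximation of nameprep's folding table).
    text = text.casefold()
    # Step 3: NFKC compatibility normalization.
    text = unicodedata.normalize("NFKC", text)
    # Step 4: labels that are already ASCII go out unchanged;
    # everything else is Punycode-encoded behind the ACE prefix.
    if text.isascii():
        return text
    return "xn--" + text.encode("punycode").decode("ascii")


# The same label arriving in two different page encodings.
print(prepare_label("Bücher".encode("latin-1"), "latin-1"))  # xn--bcher-kva
print(prepare_label("Bücher".encode("utf-8"), "utf-8"))      # xn--bcher-kva
```

If two user agents disagree on any one of these steps, they can end
up looking up different names for the same text on the page.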

> Relying entirely on
> registry restrictions, or user agent restrictions that
> completely forbid the use of some names that can be registered,
> is a recipe for fragmentation of the DNS namespace.

Note that I also suggested that a user agent could simply refrain from
displaying the Unicode form of a label, and show the ASCII (Punycode)
form instead.

IDNA2003 had a rather large inclusion list, the result of excluding an
uncontroversial, small set of characters.

IDNA200x seems to be taking the opposite approach, including only
those scripts and characters that we agree on. As Unicode grows, the
inclusion list can grow too, based on certain rules.

In a somewhat similar manner, we could start with certain script
mixture rules, and then relax or refine those rules over time. It's
just a suggestion.
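
As a rough illustration of a small starting rule set, here is a
sketch in which everything is an assumption: the FORBIDDEN_MIXTURES
table is a made-up example, and the "script" of a character is
guessed from the first word of its Unicode character name, because
the Python standard library does not expose the real Unicode Script
property. A user agent could use such a check to decide whether to
display the Unicode form of a label.

```python
import unicodedata

# Hypothetical initial rules; entries could be removed or refined over time.
FORBIDDEN_MIXTURES = [
    {"LATIN", "CYRILLIC"},
    {"LATIN", "GREEK"},
]


def scripts_in(label: str) -> set:
    """Approximate the scripts in a label from character names.

    Only alphabetic characters are considered, so digits and hyphens
    are treated as script-neutral.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_prohibited_mixture(label: str) -> bool:
    """True if the label contains every script of some forbidden mixture."""
    found = scripts_in(label)
    return any(mixture <= found for mixture in FORBIDDEN_MIXTURES)


def display_form(label: str) -> str:
    """Show the ASCII form instead of the Unicode form for prohibited mixtures."""
    if is_prohibited_mixture(label):
        return "xn--" + label.encode("punycode").decode("ascii")
    return label


print(is_prohibited_mixture("p\u0430ypal"))          # Latin plus Cyrillic "a": True
print(is_prohibited_mixture("sony\u65e5\u672c"))     # Latin plus Kanji: False
```

Note that a purely Cyrillic label passes this check even when it
resembles a Latin string, so a rule like this only covers part of
the confusability problem.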

> > Until we have figured
> > all of this out, should we simply prohibit certain script
> > mixtures?
>
> There is another problem with prohibiting script-mixing at the
> protocol (IDNA) level and that is that the common,
> on-the-street, perception of "the script we use" is different
> from the Unicode definitions of "script".  No one is wrong here,
> but, if JDNC concludes that Romanji is a necessity and must be
> available in mixed names with Kanji and Kana, I don't think we
> are in a position to say "no" (although we can _advise_ that
> this isn't a good idea).  Similar examples arise with mixtures
> of Cyrillic and Roman characters in Russia, even though we are
> agreed that is one of the more dangerous cases of mixed-script
> labels (the fact that some strings in Cyrillic can be confused
> with names in Latin characters even when they are purely
> Cyrillic is one of the arguments why prohibiting mixed scripts
> isn't nearly as powerful a tool as is often argued).

Yup, that is exactly what I meant when I said "certain script
mixtures". You read my mind. :-)

Of course, it would take a while to come up with detailed and
complicated "script" mixing rules. All I'm saying is that we could
simply prohibit some simple (Unicode) script mixtures for now, and
then come up with more detailed rules later. But I'm guessing that
your position is that such a set of rules would fragment the DNS
namespace permanently(?). Did I get that right?

Erik

