baking into the protocol

Mark Davis mark.davis at icu-project.org
Wed Dec 20 21:21:47 CET 2006


Yes, by "bake into the protocol", I mean incorporate into IDNA200x. And what
I'm saying is that I'm a bit leery of putting a mixed-script prohibition
into IDNA200x. Not dead set against it, but leery.

Mark

On 12/20/06, Erik van der Poel <erikv at google.com> wrote:
>
> Re: "baked into the protocol": _The_ protocol? There are several
> protocols involved here, and I will only list some of them:
>
> (1) DNS. This protocol actually doesn't care what byte values you put
> into the labels. Each label starts with a length byte, but the top 2
> bits of that byte are reserved to mark compression pointers (used for
> repeated name suffixes), leaving only 6 bits for the length, so that's
> where the 63-byte limit per label comes from. However, higher-level
> protocols, such as email-related ones, _do_ care about the byte values
> in the labels, and you will bump into all sorts of interoperability
> problems if you try to use byte values outside the LDH
> (letter-digit-hyphen) set. This is why we have Punycode, which
> re-encodes Unicode in the LDH set.
>
> (2) Communication between registrar and registry. Some
> registry/registrar pairs use more-or-less standardized protocols,
> such as the Registry Registrar Protocol. This is one area where
> it might be possible to apply mixed script rules, i.e. the registry
> would simply say "No" when the registrar attempts to register a
> mixed-script label.
>
> (3) HTML. This is also a protocol in the sense that it crosses the
> wire or ether. This is another area where user agents could apply
> mixed script rules. One extreme is to simply refuse to perform a DNS
> lookup when any of the labels mixes scripts. A less extreme policy
> would be to refrain from displaying the Unicode version of a label
> when that label mixes scripts.
>
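[Editorial illustration, not part of the original message.] The label framing in item (1) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (`to_ascii_label`, `wire_label`, `is_compression_pointer`) are hypothetical, and the IDNA preparation steps (case folding, normalization) are deliberately skipped. Only the standard library's `punycode` codec is used.

```python
# Sketch only: DNS label framing and the Punycode fallback from item (1).
# Helper names are hypothetical; nameprep/normalization is NOT performed.

MAX_LABEL_LEN = 0x3F  # 6 usable bits in the length byte -> 63


def to_ascii_label(label: str) -> bytes:
    """Return an LDH-safe form of one label, using Punycode when needed."""
    if label.isascii():
        return label.encode("ascii")
    # "xn--" is the ACE prefix used by IDNA for Punycode-encoded labels.
    return b"xn--" + label.encode("punycode")


def wire_label(label: str) -> bytes:
    """Prefix the encoded label with its one-byte length."""
    data = to_ascii_label(label)
    if len(data) > MAX_LABEL_LEN:
        raise ValueError("label exceeds 63 bytes")
    return bytes([len(data)]) + data


def is_compression_pointer(length_byte: int) -> bool:
    """Both top bits set means a compression pointer, not a label length."""
    return length_byte & 0xC0 == 0xC0


if __name__ == "__main__":
    print(to_ascii_label("b\u00fccher"))  # a non-ASCII label in its ACE form
    print(wire_label("example"))          # length byte followed by the label
```

Because the two top bits of the length byte are taken, any encoded label longer than 63 bytes is rejected before it could be mistaken for a pointer.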
> So, as you can see, there really is no distinction between "bak[ing]
> into the protocol" and having "registr[ies] and/or user-agents" apply
> rules. There is no magical interceptor in the DNS infrastructure that
> could block certain operations based on mixed script rules. The
> registries and the user agents _are_ the ones that could perform some
> action based on mixed script rules.
>
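[Editorial illustration, not part of the original message.] The two policies discussed above, a registry saying "No" and a user agent declining to display the Unicode form, might look roughly like the sketch below. It rests on a loud assumption: Python's standard library does not expose the Unicode Script property, so the script is guessed from the first word of each letter's character name. That heuristic is crude (for example, it would flag ordinary Japanese text that mixes Han and kana), and the function names are hypothetical.

```python
import unicodedata

# Sketch only: a crude mixed-script check. The "script" is approximated by
# the first word of the Unicode character name, which is an assumption and
# not the real Unicode Script property.


def scripts_in(label: str) -> set:
    """Collect the guessed scripts of all letters in a label."""
    found = set()
    for ch in label:
        if ch.isalpha():  # digits and hyphens are treated as script-neutral
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


def registry_accepts(label: str) -> bool:
    """Registry-side policy: refuse labels that mix scripts."""
    return len(scripts_in(label)) <= 1


def display_form(label: str) -> str:
    """User-agent policy: show Unicode only for single-script labels."""
    if len(scripts_in(label)) > 1:
        return "xn--" + label.encode("punycode").decode("ascii")
    return label


if __name__ == "__main__":
    spoof = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A
    print(scripts_in(spoof), registry_accepts(spoof), display_form(spoof))
```

Either function could sit at the registry or in the user agent; nothing in the DNS itself performs this check.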
> Now, if by "protocol" you are referring to the rules in a future
> IDNA200x or the guidelines in a future ICANN document, then I agree
> that many people would balk at the idea of prohibiting mixed scripts
> in those documents. But then maybe this is just what we need,
> initially, until we have a better understanding of the problem or some
> progress on this front. In particular, Michael appears to believe that
> it might be possible to get the committees to encode additional
> characters so that no community would be forced to cross Unicode
> script boundaries to write their words. Doesn't Kurdish require Latin
> w and q to be mixed into its Cyrillic text? Until we have figured
> all of this out, should we simply prohibit certain script mixtures?
>
> Erik
>
> On 12/20/06, Mark Davis <mark.davis at icu-project.org> wrote:
> > I tend to agree with Michael on the usefulness of disallowing mixed
> > scripts.
> > [...]
> > I am not yet, however, so sure that it should be baked into the
> > protocol.
> > This is a pretty big hammer, and it may be better to leave it to the
> > registrars and/or the user-agents, which have a lot more flexibility.
>

More information about the Idna-update mailing list