The Two Lookups Approach (was Re: Parsing the issues and finding a middle ground -- another attempt)

Erik van der Poel erikv at google.com
Sat Mar 7 18:13:51 CET 2009


Patrik,

I think I agree with you, but I also think the wording you chose today
can easily be misunderstood.

IDNA2003 is absolutely clear about "what goes in DNS", i.e. Punycode
with xn-- prepended. In fact, it was careful to specify that /all/
IDNA-unaware domain name slots must receive that format (xn-- with
Punycode).
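
To make the "what goes in DNS" format concrete, here is a minimal
sketch in Python. It assumes Python's built-in "idna" codec, which
implements the IDNA2003 operations from RFC 3490; the function name
to_dns_form and the example domain are made up for illustration.

```python
def to_dns_form(name: str) -> str:
    """Convert a domain name to the ASCII form an IDNA-unaware slot expects.

    Assumption: the built-in "idna" codec (IDNA2003 ToASCII, applied
    label by label) stands in for a real IDNA implementation.
    """
    # Each non-ASCII label becomes "xn--" followed by its Punycode encoding;
    # ASCII-only labels pass through unchanged.
    return name.encode("idna").decode("ascii")


print(to_dns_form("bücher.example"))  # xn--bcher-kva.example
```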

The term "U-label" is indeed one of the good things about IDNA2008.
This makes it very easy for other specs (e.g. SMTP-related) to specify
the use of U-labels. If those specs had to use IDNA2003 terminology,
it would sound a bit clunkier, e.g. "only strings that would be
generated by first applying ToASCII, then ToUnicode".
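
The clunky IDNA2003 wording amounts to a round-trip check, which can
be sketched as follows. This is only an approximation built on
Python's IDNA2003 "idna" codec, not IDNA2008 U-label validation; the
function name is hypothetical. Note that IDNA2003 ToASCII leaves
ASCII-only labels untouched, so the check is only informative for
non-ASCII labels.

```python
def is_round_trip_stable(label: str) -> bool:
    """Return True if ToUnicode(ToASCII(label)) gives back the same label.

    Assumption: the built-in "idna" codec provides ToASCII (encode)
    and ToUnicode (decode) as defined for IDNA2003.
    """
    try:
        return label.encode("idna").decode("idna") == label
    except UnicodeError:
        # Labels the codec rejects (e.g. too long) are not stable either.
        return False


print(is_round_trip_stable("bücher"))  # True: already in mapped form
print(is_round_trip_stable("Bücher"))  # False: the mapping lowercases "B"
```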

So I still believe that removing mappings from IDNA is a good thing,
because it allows the higher layers (such as the series of SMTP specs)
to avoid the mess that HTML got itself into. If there are no mappings
in the base IDNA2008 spec, the SMTP spec can refer directly to that
base spec, and avoid the issues surrounding Eszett et al.
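
The Eszett issue can be seen directly with an IDNA2003 implementation.
The sketch below again assumes Python's built-in "idna" codec; the
helper name and the example domains are invented for illustration.

```python
def idna2003_lookup_name(name: str) -> bytes:
    """Return the name an IDNA2003 client would send to DNS.

    Assumption: the built-in "idna" codec, which applies the IDNA2003
    (Nameprep) mappings before encoding.
    """
    return name.encode("idna")


with_eszett = idna2003_lookup_name("straße.example")
with_ss = idna2003_lookup_name("strasse.example")

# Under the IDNA2003 mapping, "ß" is folded to "ss", so both spellings
# collapse to the same DNS name.
print(with_eszett, with_ss, with_eszett == with_ss)
```

With the mapping baked into the base protocol, a spec that refers to
it inherits this collapse; without it, the higher layer can simply
require the already-mapped form.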

This does not mean that a mapping spec is unnecessary. Far from it. In
order to maximize interoperability in situations where the original
typist's language is unknown, we must have a crystal clear global
mapping spec, i.e. language-independent.

This does not mean that a local (language-specific) mapping
recommendation is unnecessary. Far from it. In order to maximize
interoperability in the keyboard UI and registrar UI contexts, we must
provide crystal clear recommendations about the types of local
mappings that are reasonable and those that are not.

I believe the HTML IDNA mess could be cleaned up somewhat by making
the HTML implementations stricter, e.g. by requiring U-labels or even
A-labels. It would be a complete disaster if HTML implementers went in
the other direction, i.e. more lenient, by performing
language-specific mappings on domain names in hrefs. I hope many would
join me in publicly and strongly criticizing any HTML implementer who
makes such a mistake.
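
As a rough illustration of the strictest option (A-labels only), a
host check might look like the sketch below. Everything here is an
assumption made for illustration: the function name, the lowercase-only
LDH pattern, and the use of Python's IDNA2003 "idna" codec to verify
that an xn-- label decodes and re-encodes to itself. It performs no
mapping of any kind.

```python
import re

# Lowercase letters, digits and hyphens; 1 to 63 characters;
# no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_strict_ascii_host(host: str) -> bool:
    """Accept a host only if every label is plain LDH or a valid xn-- label."""
    for label in host.split("."):
        if not _LDH_LABEL.fullmatch(label):
            return False  # non-ASCII, uppercase, empty, or malformed label
        if label.startswith("xn--"):
            try:
                decoded = label.encode("ascii").decode("idna")
                if decoded.encode("idna").decode("ascii") != label:
                    return False  # does not round-trip to itself
            except UnicodeError:
                return False  # not decodable as Punycode
    return True


print(is_strict_ascii_host("xn--bcher-kva.example"))  # True
print(is_strict_ascii_host("bücher.example"))         # False: would need conversion
print(is_strict_ascii_host("Bücher.example"))         # False: would need mapping
```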

Please, please, no more mistakes in HTML implementations.

Erik

On Sat, Mar 7, 2009 at 6:24 AM, Patrik Fältström <patrik at frobbit.se> wrote:
> Regardless of in what direction this is going, I want personally this to
> very clearly be something that is done "before" what we have in IDNA2008 is
> used. So that what we really talk about is a standardized mapping procedure
> that applications use. I do not want to see an extension of U-label. That we
> have this definition now is one of the best things with IDNA2008, as we have
> severe confusion with IDNA2003 where there is basically no difference
> between what is mapped and what goes in DNS.
>
> Because of this, I do not have so much interest personally in exactly what
> is defined for this mapping, but I must say I am seriously confused by the
> ideas on doing two lookups. I strongly support the list of issues Marcos
> wrote about. You can _NOT_ do two lookups and think they have any context
> between them whatsoever.
>
> I also want to remind people that we have had file systems with different
> casing, line break and other special rules, and the world seems to have
> survived...
>
> I think we "just" need documents that talk about how to handle different
> situations in the time of lookup (and similar recommendations for
> registries, but registries have already worked with these issues for many
> years, so I do not really see the big problem).
>
>   Patrik
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
>