Implementation questions (digressing from...)

Erik van der Poel erikv at google.com
Wed Dec 24 16:24:52 CET 2008


On Tue, Dec 23, 2008 at 6:19 PM, Shawn Steele
<Shawn.Steele at microsoft.com> wrote:
> I think that if the A-Label is in the HTML, then the browser would need
> to be able to round-trip it through Unicode.

That does seem like a nice-to-have. But is it a must-have?

> Additionally, it seems that
> if an HTML author could make it look like ß, then it would be likely
> that a user would enter a ß, so I don't think that'd help much.

I agree that the UI problem is still there. This is where I
distinguish between local and global pre-processing. If a user is
typing a domain name character by character, the implementation can
ask the user whether they intend to go to the eszett site or the ss
site. If a user is providing the domain name in one fell swoop (e.g.
clicking on a link, copying and pasting into the address bar), then
there is no way to know the intention of the author of that domain
name, so the implementation should apply the global pre-processing
rules (and under my proposal, eszett would be mapped to ss).
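
To make the local/global distinction concrete, here is a minimal
sketch in Python. The function name preprocess, the typed flag, and
the ask_user callback are hypothetical names chosen for illustration.
The plain string replacement stands in for the full IDNA2003 mapping
and is a simplification.

```python
from typing import Callable

ESZETT = "\u00df"  # the character under discussion


def preprocess(name: str, typed: bool,
               ask_user: Callable[[str, str], str]) -> str:
    """Return the form of `name` that should go on to DNS lookup.

    `typed` is True when the user entered the name character by
    character (local pre-processing) and False when the name arrived
    in one piece, e.g. from a link or a paste (global pre-processing).
    `ask_user` is a hypothetical UI hook that receives the eszett form
    and the ss form and returns whichever one the user picked.
    """
    if ESZETT not in name:
        return name
    mapped = name.replace(ESZETT, "ss")  # simplified stand-in for the IDNA2003 mapping
    if typed:
        # Local: the user is present, so the implementation can ask.
        return ask_user(name, mapped)
    # Global: the author's intent is unknown, so apply the mapping.
    return mapped
```

With a callback that always picks the eszett form, a typed
"straße.de" stays as typed, while a pasted "straße.de" becomes
"strasse.de".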

Some might consider this solution to be worse than the problem it is
solving, namely two DNS lookups for a single domain name in HTML hrefs.

But let me step back for a moment and outline some alternatives,
starting with "strict" and gradually getting more "lenient":

(1) HTML hrefs must use A-labels and LDH-labels. One advantage is that
this works in MSIE6, which may be hard to upgrade. One disadvantage is
that this does not work in MSIE7 unless MSIE7 is upgraded, although
that might be easier than upgrading MSIE6.

(2) In HTML hrefs, eszett is always mapped to ss and the only way to
include eszett is via IDNA2008 A-labels. One advantage is that we
don't need to do two DNS lookups. One disadvantage is that eszett
cannot be typed directly into HTML hrefs.

(3) In HTML hrefs, eszett can be typed directly, and the browser tries
both IDNA2003 (ss) and IDNA2008 (A-label). One advantage is that
eszett can be typed directly. One disadvantage is that two DNS lookups
are required.
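
As a rough illustration of how (2) and (3) differ for a label typed
directly into an href, the sketch below lists the names a browser
would look up. The function names are hypothetical. It uses Python's
built-in "idna" codec, which implements IDNA2003, and builds the
IDNA2008 form by hand from the "punycode" codec without any of the
IDNA2008 validity checks. The order of the two candidates in (3) is an
assumption.

```python
def idna2003_label(label: str) -> str:
    """IDNA2003 ToASCII for one label; nameprep maps eszett to ss."""
    return label.encode("idna").decode("ascii")


def idna2008_a_label(label: str) -> str:
    """Simplified IDNA2008 form: lowercase, then Punycode if non-ASCII.

    No code point, bidi, or contextual rule checks are done here.
    """
    label = label.lower()
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def lookup_candidates(label: str, option: int) -> list[str]:
    """Return the ASCII labels to try in DNS under option 2 or 3."""
    if option == 2:
        # Eszett is always mapped to ss: a single lookup.
        return [idna2003_label(label)]
    if option == 3:
        # Try both forms; drop the second if it is identical to the first.
        candidates = [idna2003_label(label)]
        a_label = idna2008_a_label(label)
        if a_label not in candidates:
            candidates.append(a_label)
        return candidates
    raise ValueError("only options 2 and 3 are sketched here")
```

For the label "straße", option 2 yields one candidate and option 3
yields two, which is where the second DNS lookup comes from. A label
without eszett yields a single candidate under both options.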

For completeness, I should add one more alternative, because the
current IDNA2008 Protocol draft says "It is important to note that the
intent of these specifications is that labels in application protocols,
files, or links are intended to be in U-label or A-label form." Note
that (4) is not more lenient than (3), so it breaks the above ordering
from "strict" to "lenient".

(4) In HTML hrefs, only U-labels and A-labels are accepted, and
U-labels containing eszett are converted to the corresponding A-labels
before DNS lookup. One advantage of this approach is that it is
conceptually clean and simple. One disadvantage is that it disregards
migration issues for IDNA2003 pre-processors.
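
The conversion in (4), and the round trip back to Unicode that Shawn
mentions, can be sketched as follows. The function names are
hypothetical, and the code assumes its input is already a valid
U-label or A-label: it does no normalization and none of the IDNA2008
validity checks.

```python
ACE_PREFIX = "xn--"


def u_label_to_a_label(label: str) -> str:
    """Convert one U-label to its A-label; ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def a_label_to_u_label(label: str) -> str:
    """Convert one A-label back to Unicode; other labels pass through."""
    lowered = label.lower()
    if lowered.startswith(ACE_PREFIX):
        return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def domain_to_ascii(name: str) -> str:
    """Apply u_label_to_a_label to every dot-separated label."""
    return ".".join(u_label_to_a_label(part) for part in name.split("."))


def domain_to_unicode(name: str) -> str:
    """Apply a_label_to_u_label to every dot-separated label."""
    return ".".join(a_label_to_u_label(part) for part in name.split("."))
```

Here eszett survives the trip: domain_to_ascii("straße.de") gives an
A-label form, and domain_to_unicode of that result gives back
"straße.de" rather than "strasse.de".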

Erik

