Touchstones for "Mapping"

"Martin J. Dürst" duerst at
Fri Apr 3 04:06:29 CEST 2009

Hello John,

I hope I gave my main point already in my answer to Vint.
Just a few details below.

On 2009/04/03 6:03, John C Klensin wrote:

> Martin,
> Let me try a different take on this from some of the other
> comments that followed your note...

> I am, however, apparently in a small minority of those who
> compose a lot of XML or HTML pages: I do my editing with an
> emacs clone, enhanced by XML/HTML extensions that keep track of
> matching elements and facilitate pretty-printing, but very
> little else.  I'm frankly too cheap to spring for a
> well-supported, easy-to-use, and well-documented HTML or XML
> editor and too lazy to learn and adapt to one of the others (I
> maintain zone files with emacs too).  But the norm is that
> humans use special applications to cope with these files,
> applications that have buttons for inserting URIs and greater or
> lesser degrees of validation and completion support for those
> functions.

Some of these tools do, some don't. There is a huge variety ranging
from stuff more primitive than Windows Notepad to the ultimate
sophisticated authoring system. If you work in Emacs, you are in
a particular corner of the whole spectrum; Emacs is virtually
unlimited in the support it can provide, but in general favors
using raw text at the base.

> The existence and wide use of those tools turns the
> issue you are addressing back into a presentation problem --
> there is no plausible reason why the tools should not be capable
> of putting strings with A-labels into the relevant files while
> letting the author type in native-character Unicode strings and
> to see those strings.  There is also no plausible reason why
> those tools cannot validate links (whichever form they are in)
> and, if I recall, some of them do exactly that.

There is no plausible reason except that some of these tools,
or their users, prefer simplicity, ease of implementation, or
whatever, over sophistication. If it were that easy to have
all these tools, and to have everybody use them, the average
HTML page on the Web wouldn't be that much of a mess. Are
you assuming that on average, tools will do a better job
for IDNs than for HTML?

> Having those tools turns the issue you are raising back into a
> presentation one -- just as other flavors of WYSIWYG editors
> permit the user to enter text without having to enter or see the
> markup that controls formatting, there should be no reason why I
> can't type (and see) native-character strings even while
> A-labels go into the data files.

Of course not, technically. But then, what about all the other
tools? What if I want to grep some HTML pages to find some
addresses? (This is similar to Mark's database example, except
that grep, at least the version I know of, doesn't support
adding functions.)
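The grep problem can be shown with a short sketch (the page snippet
and the domain "bücher.example" are made-up examples; Python's
stdlib "idna" codec implements IDNA2003):

```python
# A page whose authoring tool stored the A-label form in the source.
html = '<a href="http://xn--bcher-kva.example/">Bücher</a>'

# A plain-text search for the native-character domain finds nothing,
# because the source contains only the Punycode form.
assert "bücher.example" not in html

# Only after decoding the A-label does the name become searchable.
a_label = "xn--bcher-kva.example"
u_label = a_label.encode("ascii").decode("idna")
assert u_label == "bücher.example"
```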

> As an author, I want the
> entries in my files and on my web pages to be as exact as
> possible.

Yes. If you didn't, you would probably not work on standards
the way you do. Many people care much less about exactness,
and much more about convenience and speed.

> Especially considering the few differences in
> interpretation between IDNA2008 and IDNA2003 and the potential
> for application implementation differences in mapping support,
> that implies that I want any mapping to occur where I have
> control over it -- in the typing and presentation interfaces
> between me and the file-- and to have A-labels in the file where
> I do not have control over how something else might interpret
> the strings.

I wouldn't expect anything different from you, (one of?) the
main proponent(s) of removing mapping from the core spec.
I expect the average author to definitely be interested in
having things work, but also in being able to see and read
what s/he typed.

And please note that nothing forbids a tool that checks IDNs
(as far as an IDN appears e.g. in an attribute where it is
clearly an IDN, rather than in free text, where it is difficult
to judge what it is) but then puts the U-label, not the A-label,
in the source, and no M-labels either (e.g. if you typed an IDN
with upper-case letters, it would convert it to lower-case
letters before your eyes).
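A minimal sketch of that normalization step, using Python's
stdlib IDNA (2003) codec, whose nameprep pass case-folds
non-ASCII labels; the domain is a made-up example:

```python
def normalize_idn(domain: str) -> str:
    """Map a typed IDN to its canonical U-label form by
    round-tripping through ToASCII (which lowercases via
    nameprep) and back through ToUnicode."""
    return domain.encode("idna").decode("idna")

# An IDN typed with upper-case letters is converted to
# lower-case "before your eyes", but stays readable.
assert normalize_idn("BÜCHER.example") == "bücher.example"
```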

> Michel's note is, I think, consistent with this point of view.
> I suggest that is why the URI spec effectively requires A-labels
> for IDNs and why it should not be changed.

The URI spec allows A-labels or percent-encoded IDNs,
with some caveats about the latter for older infrastructure.
It's impossible to use real IDNs in URIs, because URIs are
ASCII only. For the purpose of the discussion here, there
is little difference between A-labels and percent-encoded
IDNs: both are gibberish to human users, and neither is
findable from the real text with grep or a database.
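The two ASCII-only encodings of the same name can be compared
directly (a Python sketch; "bücher.example" is a made-up domain):

```python
from urllib.parse import quote

u_label = "bücher.example"

# A-label form, as used in the host part of a URI.
a_label = u_label.encode("idna").decode("ascii")
assert a_label == "xn--bcher-kva.example"

# Percent-encoded UTF-8 form, the other ASCII-only option.
pct = quote(u_label)
assert pct == "b%C3%BCcher.example"

# Neither form contains the real name as searchable text.
for encoded in (a_label, pct):
    assert "bücher" not in encoded
```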

> The question then becomes one of how far we should take the
> standard toward required support for Unicode strings in various
> places, not to accommodate authors who are using sophisticated
> editing tools, but to accommodate dinosaurs like myself.  My
> answer would be "not very far".  I'm willing to type Unicode
> strings into files and then pass those files through a converter
> (and validator) before passing it to someone else and to
> consider the annoyance of doing that to be the price I pay for
> refusal to use better tools.  And, if I get tired of that, I
> know how to extend my editor so that it supports an IRI ->  URI
> conversion function that does whatever mapping I consider
> necessary (with or without WG specification of that mapping).

Why do you think you need to do that conversion (inside your
editor or outside) when browsers and other tools are supposed
to do that?
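The IRI -> URI conversion mentioned above could be sketched roughly
as follows (a simplification of the RFC 3987 mapping: IDNA-encode
the host, percent-encode other non-ASCII parts; it ignores userinfo
and the finer reserved-character rules, and the example IRI is
made up):

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri: str) -> str:
    """Sketch of an IRI -> URI mapping: the host becomes
    A-labels, the rest becomes percent-encoded UTF-8."""
    parts = urlsplit(iri)
    netloc = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port:
        netloc += f":{parts.port}"
    path = quote(parts.path, safe="/")
    query = quote(parts.query, safe="=&")
    fragment = quote(parts.fragment, safe="")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

assert (iri_to_uri("http://bücher.example/résumé")
        == "http://xn--bcher-kva.example/r%C3%A9sum%C3%A9")
```

Note how the two non-ASCII parts end up in two different encodings,
which is exactly why neither is recoverable with a plain text search.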

> You wrote later...
>> There are two sides here, the protocol correctness and
>> the content correctness. By content correctness, I mean
>> whether the link e.g. goes to the intended page.
>> Completely impossible to check with punycode, of course.
> I disagree.  Validation of whether the link goes to the intended
> page is a separate function I have to invoke, implicitly or
> explicitly, to go check the link.

Sorry, I was slightly imprecise. Of course validating the actual
link is a separate function. What I meant is visual checking
by the author, catching errors such as
<a href='downloads.html'>Site summary</a>, where the
link text and the link itself obviously don't match.

> Such a validator should work
> at least as well with A-labels ("punycode") as it does with IRIs
> or other native-character strings.

Nothing against that.

> The thing that A-labels
> interfere with is making a superficial visual check of whether
> the URI (or IRI) is plausible.  That can be very important, but
> is back to being a presentation issue.

That's what I meant. Of course you can call it a presentation issue.
But that doesn't change the fact that in most text editors, I can
only do that check if the IDN is actually readable and not punycoded.

Regards,   Martin.

#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-#   mailto:duerst at
