mappings-01 and the general procedure

John C Klensin klensin at jck.com
Mon Jul 13 13:16:15 CEST 2009



--On Monday, July 13, 2009 11:41 +0200 Marie-France Berny
<mfberny at gmail.com> wrote:

> Just remember that if mapping to lower-cases is a MUST you
> force us to split. And that any other MUST MAY lead to the
> same result. Your decision.

Actually, no.  And one last time...

The decision that upper case and lower case should compare equal
for undecorated Latin characters was made in the middle of
1971.  I say "undecorated Latin characters" rather than, e.g.,
"ASCII" because, at the time and despite RFC 20, it wasn't clear
that ASCII would succeed.  I've explained a bit about the basis
for that decision and bear some small blame for it, but very few
of the other participants in the WG do... and it is not a
decision to be made today.   FWIW, my recollection is that there
was some participation in the work that contributed to the
decision from folks at CII Bull, so the decision was not
entirely an "American" one.

For the benefit of those who now believe the distinction is
important and has always been, that syntax and matching rule
applied to both host and network names.

When the DNS was designed, there were only two choices.  One was
to preserve the earlier behavior.  The other would have been to
create an incompatibility situation far worse than anything we
have today, with no realistic transition plan.  I don't believe
that second option was even discussed.

For IDNs, there are the following possibilities (in theory at
least) for Latin-based characters:

* "A" and "a" could match for strings that contained zero
non-ASCII characters, but be different if even a single
non-ASCII appeared in the string.   That would cause the worst
sort of user astonishment, at least for users of most of the
languages that use Latin-based scripts.

* "A" and "a" could match always, but "Á" and "á" would not
match and neither would "Å" and "å".  That would be a very
strange result globally, even though it might be desirable for
French words (but not necessarily French mnemonics).

* If they are all permitted, for some sense of "permitted", "A"
would match "a", "Á" would match "á", "Å" would match "å",
and so on.  
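
As an illustration only, the three possibilities above can be sketched
as three comparison functions in Python. The function names are
hypothetical, and str.lower() is a simplified stand-in for the real
IDNA2003 case mapping (Nameprep), which this sketch does not implement.
The strings are assumed to be in precomposed (NFC) form.

```python
def fold_ascii_only(label: str) -> str:
    """Lowercase only A-Z; leave every other character untouched."""
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in label)


def match_option1(a: str, b: str) -> bool:
    """Option 1: case-insensitive only when both strings are pure ASCII."""
    if a.isascii() and b.isascii():
        return fold_ascii_only(a) == fold_ascii_only(b)
    # A single non-ASCII character makes the whole comparison exact.
    return a == b


def match_option2(a: str, b: str) -> bool:
    """Option 2: "A" always matches "a", but decorated letters never fold."""
    return fold_ascii_only(a) == fold_ascii_only(b)


def match_option3(a: str, b: str) -> bool:
    """Option 3: every cased letter folds (stand-in: str.lower())."""
    return a.lower() == b.lower()


# "Abé" vs "abé": differs under option 1, matches under options 2 and 3.
print(match_option1("Ab\u00e9", "ab\u00e9"))
print(match_option2("Ab\u00e9", "ab\u00e9"))
# "Á" vs "á": matches only under option 3.
print(match_option2("\u00c1", "\u00e1"))
print(match_option3("\u00c1", "\u00e1"))
```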

The third option was the decision made for IDNA2003, largely
because the two other options seemed intolerable.  That decision
affects orders of magnitude more labels than the Eszett and
Final Sigma cases that keep coming up over and over again on
this list.  It is too late to change it, even if we all
concluded that it was wrong (and that conclusion has not been
reached).  The IDNA2008 "no mapping" strategy doesn't help you
because, while it would eliminate the automatic and mandatory
mapping, it would do so at the cost of making "Á", "Å", etc.,
DISALLOWED -- you still would not be able to convert them into
ACE form and preserve them on reverse mapping.
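
A small sketch of the IDNA2003 side of this, using Python's built-in
"idna" codec (which implements IDNA2003, not IDNA2008): the mapping
sends "Á" and "á" to the same ACE label, so the upper-case form cannot
be recovered on the way back. The variable names are illustrative only.
IDNA2008 behaviour is not shown, since the standard library does not
implement it.

```python
upper = "\u00c1"  # "Á"
lower = "\u00e1"  # "á"

# ToASCII under IDNA2003: case mapping is applied before Punycode encoding.
ace_upper = upper.encode("idna")
ace_lower = lower.encode("idna")

# Both forms yield the same ACE label.
print(ace_upper == ace_lower)

# ToUnicode: the reverse mapping returns the lower-case form only.
round_trip = ace_upper.decode("idna")
print(round_trip == lower, round_trip == upper)
```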

I don't think anyone has dismissed the legitimacy of your
underlying concerns in your own context, although several of us
have tried to point out that these decisions about the DNS
affect the ability to distinguish among certain types of
identifiers and mnemonics and not the ability to write (and
consider and compare) running text -- sentences or paragraphs in
French or any other language.  If you need a solution for the
problems you and others continue to identify, the answer appears
to me to be the same one that has prevailed for years:

	(i) Invent a "keyword" or similar system so that your
	users do not have to deal directly with DNS-based
	identifiers.  Tailor those identifiers and their
	interpretation to French as you understand it.
	
	(ii) Have those keywords map into fully-qualified DNS
	names so that all network name resolution functions work
	properly.   Those names can use randomly-generated
	labels: it is not necessary that they be related to any
	language and my personal recommendation is that you not
	use IDNs at all.   Indeed, the best solution might be to
	use numeric labels or labels that contained an
	alphabetic character or two as a code followed by
	numeric strings.
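
A minimal sketch of such a keyword layer, under stated assumptions: the
class name, the zone "keywords.example", and the "k" + digits label
format are all hypothetical, and the normalization rule (NFC, with no
case folding) is just one possible choice for a community that wants
upper and lower case to stay distinct.

```python
import unicodedata


class KeywordRegistry:
    """Maps user-facing keywords to opaque, ASCII-only DNS names."""

    def __init__(self, zone: str = "keywords.example"):
        self.zone = zone          # hypothetical zone; .example is reserved
        self._names = {}          # normalized keyword -> FQDN
        self._next_id = 1

    @staticmethod
    def normalize(keyword: str) -> str:
        # NFC only, no case folding: "École" and "école" remain distinct.
        return unicodedata.normalize("NFC", keyword.strip())

    def register(self, keyword: str) -> str:
        key = self.normalize(keyword)
        if key not in self._names:
            # Alphabetic code followed by a numeric string, as in (ii).
            self._names[key] = f"k{self._next_id:06d}.{self.zone}"
            self._next_id += 1
        return self._names[key]

    def resolve(self, keyword: str):
        # Returns the FQDN, or None if the keyword is not registered.
        return self._names.get(self.normalize(keyword))


registry = KeywordRegistry()
upper_name = registry.register("\u00c9cole")   # "École"
lower_name = registry.register("\u00e9cole")   # "école"
print(upper_name, lower_name)
```

The DNS names produced this way never contain non-ASCII characters, so
none of the case-mapping questions above apply to them. All of the
language-specific interpretation lives in the keyword layer.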

Finally, as with other issues, I believe it is not productive to
keep trying to reopen this topic.  The answer has been the same
for a very long time -- 35 or more years if one considers the
basic choices -- and no new evidence is being offered as to why
it should be reexamined.

Even if you don't accept the above and are therefore faced with
a decision to "split" or not split, you (and everyone else) are
better off if the WG finishes its work so you can start making
decisions... dragging things out further helps no one.

regards,
    john




