Lower casing

Shawn Steele Shawn.Steele at microsoft.com
Sat Jan 29 19:13:24 CET 2011

Well, that's why there's UTR#46 :)  (Because IDNA2008 didn't allow for compatibility).

Mark Davis, myself, and others tried to point out the importance of compatibility and mappings.  Note that, for web sites, the eszett, etc. are needed for display, not matching.  People would like their display to be correct; however, matching cannot change.

Here's the problem (actually only one) with just turning on a switch and dropping the old behavior:

* I get my bright new shiny browser and go to myßbank.com.  Cool!  It works in IDNA2008.
* Of course there are maybe a billion computers on the planet?  
* Now I have to take a business trip and need to check my balance.   So I go to the local library and visit myßbank.com.  It takes me to myssbank.com, which just happens to be spoofing myßbank.com.

We also asked that bundling be required to avoid this, but it isn't.  I'm not going to make a change that has this severe of a security problem.  It's one thing to be spoofed by a typo.  It's completely different to be spoofed by the right name.
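
To make the library-computer scenario concrete, here is a minimal sketch in Python. It relies on Python's built-in "idna" codec, which implements IDNA2003; the helper names and the simplified "unmapped" path are assumptions for illustration only, not any browser's actual code, and real IDNA2008 processing also validates labels.

```python
# Sketch only: two clients turning the same typed name into DNS queries.
# Python's built-in "idna" codec implements IDNA2003 (Nameprep maps ß to ss).

def idna2003_query(name: str) -> str:
    """Old client: apply the IDNA2003 mapping, then encode."""
    return name.encode("idna").decode("ascii")


def unmapped_query(name: str) -> str:
    """Hypothetical new client: Punycode each non-ASCII label as typed.

    Assumes the input is already a lowercase, valid IDNA2008 name; the
    label validation that IDNA2008 requires is skipped here.
    """
    labels = []
    for label in name.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


typed = "myßbank.com"
print(idna2003_query(typed))  # the old client asks for the "ss" name
print(unmapped_query(typed))  # the new client asks for an xn-- name
```

The same keystrokes produce two different queries, so which site answers depends on which machine is used unless both names are held by the same owner.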

We are agreed that eszett, etc. are necessary for display, but I'm not going to intentionally cause a state where different domain names go to different web sites on different machines.

We still have machines running Windows 2000, Windows XP.  (And probably older.)  If I could guarantee that everyone upgraded their system instantly, or even within a month, maybe this could work.  Unfortunately the evidence is that there's a very long tail of people that don't take the necessary updates.

I am definitely NOT saying that "I know German better than Germans."  Eszett is important for proper display of words, in German, especially in Germany.  However I see that I cannot register aaa.com for Aardvarks Advocates of Antarctica because it's already taken, even though the current aaa.com owners have nothing to do with Aardvarks (AFAIK).  So I have no problem with saying "sorry, fußball.com is disallowed" because someone already has "fussball.com", or the opposite.

Sure, there are a few cases where there is a semantic difference between two words in Germany that would be spelled the same in Switzerland, but I thought we were pretty clear on numerous other threads that DNS labels are intended as conveniences, and not a way to enable every semantic concept in every language.  For example, I seriously doubt that the case matching of i and I will ever be changed in DNS to allow for the Turkish expectations.  Even in English there are concepts that are mutually exclusive but that we'd see as different if CamelCased.

I believe my company is pretty unanimous on this; I've talked with other experts in IE, DNS infrastructure, and other teams.  That includes one of your co-authors of draft-iab-idn-encoding.  We cannot implement IDNA2008 without UTS#46; doing so is a potentially serious security problem.  We made that fairly clear before the RFCs were published.  It's also clear that we aren't the only company with this belief.  I won't speak for them, but would note that Mark wrote UTS#46 and made the comment that started this branch of the thread.

I think you miscounted:  With native UTF-8 not being normalized or mapped or anything, IDNA2008 by itself would cause three lookup systems, not two:  UTF-8, IDNA2003 & IDNA2008.  I believe that reconciling UTF-8 lookup with the IDNA2003/UTS#46 mappings is a huge problem that needs to be solved, likely outside the scope of this thread.  


PS: FWIW: My PERSONAL preference would be that there be an additional record that explicitly states the desired display form.  Then DNS can be used for matching/lookup (as it should be, and has been), and domain owners could still state their intent, with CamelCasing or other differences that are important to them, which may not even be semantic, but could be.

 

From: John C Klensin [klensin at jck.com]
Sent: Saturday, January 29, 2011 8:39 AM
To: Shawn Steele; Mark Davis ☕
Cc: Simon Josefsson; idna-update at alvestrand.no
Subject: RE: Lower casing


The problem here is that there is no "transition" for those four
characters.  If browsers and other client systems provide the
IDNA2003/ TR46 mapping there are only:

        -- IDNA2003 behavior forever
        -- Rolling flag day now
        -- Rolling flag day at some indefinite point in the
           future

By "rolling flag day" I mean that a client computer has one
behavior or the other on a given day but that not all client
systems will convert on the same day (or even in the same month
or year).

IMO, the reason why the WG was willing to make the change was
because of significant input that the ability to distinguish
between the characters that are, under UTS#46, source and
targets of mappings was important on both input and output
(remember that there is a display issue here too because an
IDNA2008 A-label that encodes the four characters is essentially
invalid under IDNA2003).
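
The display issue can be sketched as follows. This is a hedged illustration using Python's built-in "idna" codec (an IDNA2003 implementation); the helper names are hypothetical, and the check shown is only an approximation of what an IDNA2003-era client does when deciding whether to show the Unicode form of a label.

```python
# Sketch only: can an IDNA2003 client display a given A-label as Unicode?

def a_label(label: str) -> str:
    """Hypothetical helper: Punycode a label as typed, with no mapping."""
    return "xn--" + label.encode("punycode").decode("ascii")


def displayable_under_idna2003(ace: str) -> bool:
    """True if the IDNA2003 codec decodes the label and it round-trips."""
    try:
        shown = ace.encode("ascii").decode("idna")
    except UnicodeError:
        # Python's codec rejects labels that do not round-trip.
        return False
    return shown.encode("idna").decode("ascii") == ace


print(displayable_under_idna2003(a_label("bücher")))   # no deviation character
print(displayable_under_idna2003(a_label("fußball")))  # contains eszett
```

A label containing eszett fails the round trip because IDNA2003 would have mapped it to "ss" before encoding, so an old client has no valid Unicode form to show for it.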

For those groups for whom the distinction among one or more of
those character pairs (including "ignore" as the pair for the
Joiner set) actually is important, "register both" is not
meaningful: "we are applying the UTS#46 rules, including those
for 'deviation' characters" is equivalent to "you lose; we know
what your language needs better than you do".  It is telling
that all of the registries who are focused on those strings and
from whom we've received reports (other than the
somewhat-conflicting reports about Greek) have basically said
"ok, let's do it and get it over with".

There is another element of this depending on when the mapping
is applied: the "native UTF-8 in lookups outside the public DNS"
situation that is addressed in draft-iab-idn-encoding is, in
general, UTF-8 without even any normalization, much less
encoding.  By applying UTS#46 mappings, you compound the problem
of having to support two lookup encodings by having strings that
are fully-valid and accessible under IDNA2008 _and_ the internal
databases/ directories but that are not accessible from your
browser (at all for the public DNS and maybe not from the
private databases if you guess wrong about when to apply the
mapping).  That is also another way to look at the "incompatible
change" problem, which is that this is either about maintaining
compatibility with the public DNS names that were registered or
used assuming the IDNA2003 rules or restoring compatibility
with the strings that are valid and sensible in those internal
databases that you support and encourage.
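
A small sketch of why unnormalized UTF-8 lookup is a separate problem. It uses only Python's standard library; the sample string is an assumption chosen for illustration, and the built-in "idna" codec stands in for the IDNA2003 mapping (which normalizes as part of Nameprep).

```python
# Sketch only: one visible name, two UTF-8 byte strings.
import unicodedata

composed = unicodedata.normalize("NFC", "caf\u00e9")     # e-acute as one code point
decomposed = unicodedata.normalize("NFD", "caf\u00e9")   # "e" plus combining accent

# A directory that compares raw UTF-8 bytes sees two different keys.
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))
print(composed.encode("utf-8") == decomposed.encode("utf-8"))

# The IDNA2003 mapping normalizes first, so both collapse to one A-label.
print(composed.encode("idna") == decomposed.encode("idna"))
```

Whether the mapping runs before the lookup therefore decides whether the two spellings reach the same entry, which is the "guess wrong about when to apply the mapping" case.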

As long as you understand all of those tradeoffs, you should
make whatever decisions make sense to you.   I'm glad I don't
have to make the decision.


--On Saturday, January 29, 2011 1:35 AM +0000 Shawn Steele
<Shawn.Steele at microsoft.com> wrote:

> (& I've been describing that behavior, including UTS#46
> transitional behavior and mappings, as IDNA2008 + UTS#46 to
> make it clear).
> -Shawn
> From: Shawn Steele
> Sent: Friday, January 28, 2011 5:34 PM
> To: 'Mark Davis ☕'; John C Klensin
> Cc: Simon Josefsson; idna-update at alvestrand.no
> Subject: RE: Lower casing
> It is worth mentioning that our code will follow the
> transitional guidelines, as we will otherwise break existing
> IDNA2003 users.  Presumably people who want both versions to
> work will register both versions.
