Eszett

John C Klensin klensin at jck.com
Sat Jul 11 21:11:52 CEST 2009


--On Friday, July 10, 2009 23:41 +0000 Shawn Steele
<Shawn.Steele at microsoft.com> wrote:

> I shudder at bringing this up, but it's come up a few times
> in different threads and different contexts recently.  (Yes,
> some by me, others not).  I believe that the current plan of
> changing the IDNA2003 eszett behavior is a bad idea.  Even
> worse, I believe the drafts have arrived at this position as
> an accident of the evolution of the process and not
> well-reasoned behavior.
> 
> My understanding of the situation (I'm going to
> oversummarize, and no, I'm not going to go re-read all the
> eszett archives, I only have a week before vacation ☺)
>...

Shawn,

I think Michael and Martin have addressed most of the
substantive issue here.  I want to address the procedural one.

I would like to get this work finished.  The length of time it
has dragged out has created uncertainty in the community, which
is bad.  Enough people may be tuning out due to large amounts of
traffic about settled issues that we risk making decisions through
exhaustion rather than through careful thinking.  To get
finished, we need to stop the self-inflicted denial of service
attacks that result from people revisiting the same issues over
and over again just because "it has come up".

It is always reasonable to reopen an issue because there is new
information that we didn't have when the decision was made.  But
there is no new information in your note and, as Michael and
Martin note, your summary slightly misrepresents the situation.

It seems to me that the issue is very simple and quite
accurately captured in the DENIC comments.  Those comments were
carefully nuanced and I'd encourage people to reread them rather
than citing them to "prove" whatever they want to prove.  First,
they agree that having Eszett as a separate character is
desirable -- they, and others, made that point in 2002.  For
whatever reason, the WG decided to map it to "ss" instead.  I
think that reason had more to do with the global decision to use
toCaseFold in Stringprep than any reasoning about the relevance
of the particular character, but others may remember things
differently.  Second, since they have a deployed base of names
containing the string "ss", some of which may have started out
as Eszett, their preference today _as a registry_ would be to
preserve the IDNA2003 behavior because that would save a
difficult conversion process.  Can't blame them for that.  But
every expert on German writing we've heard from has said
"separate character" and DENIC has also been clear that, if we
decide to make the change, they can cope with it.

Similarly, the decision that we were going to move from the
"permit every possible Unicode character somehow" approach to
what has been called an inclusion-based model in which there has
to be a specific, identifier- or mnemonic-based, decision to
include a group of characters was made long, long ago.  No data
that would, IMO, justify reopening that decision --indeed, no
new data at all-- has been introduced in the last year or more.
One can always repeat the case for preserving every label and
mapping that was valid under IDNA2003 again, but most of us
understand it quite well, understand that there are tradeoffs
involved, but have chosen to go with a more minimalist
approach... and, again, there has been no evidence offered to
justify reopening that discussion beyond the desire of someone
(you are not the only one) to ask again, with the same data, to
see if they get a different answer.

Finally, I was involved in some work in user interface issues
long enough ago that the field was still called human factors in
computing.  One of my professors was fond of saying that users
are able to be far more adaptable than we usually expect, but
that part of our job is to prevent them from being required to
do more adapting than necessary.  I don't think we should be
asking users to avoid upper case, but I think they would get
used to it if that were necessary.  I also believe that having
different behaviors when users switch systems is bad but not
only do I believe that they would adapt to such changes (some of
the differences we are talking about involve switching, e.g.,
keyboards and entry methods too, and users adapt to those much
more difficult changes) but I believe that differences in
behavior intra-system are much worse.

As an example of the latter case, I'm told that, if a user puts
"fußball" into an Active Directory database but enters
"fussball" as a name (or vice versa) the two are simply not
going to match.  Indeed, I'm told that, if the sequence U+0061
U+0341 is typed by the user or pulled out of a file, but U+00E1
is in the database, they won't match either.  My personal
opinion is that the latter is bizarre, even though I'm sure
someone had a good reason.  That opinion is apparently shared by
others and is why IDNA2008 has required NFC-compliant strings --
to eliminate just those differences in how a character is
assembled-- since the beginning.  But your users have apparently
gotten used to the situation in which U+0061 U+0341 and U+00E1,
and "fußball" and "fussball", respectively, don't match under
Active Directory but do match under IDNA2008.  I would think
that, in terms of user astonishment (or the lack thereof), there
is at least as strong a case to be made for reducing the number
of IDNA mappings to bring the user experience closer to that of
identifiers used in other contexts in your systems as there is a
case for preserving IDNA2003 behavior just because it is
IDNA2003 behavior and some users have gotten used to it.
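
To make the three kinds of matching in the example above concrete,
here is a minimal sketch in Python.  It is only an illustration: the
function names are hypothetical, Python's str.casefold() stands in for
a toCaseFold-style mapping, and nothing here is Stringprep, either
IDNA specification, or a model of what Active Directory actually does.

```python
import unicodedata

# Hypothetical helper names, chosen for this illustration only.

def raw_match(a: str, b: str) -> bool:
    """Code-point-for-code-point comparison, with no normalization."""
    return a == b

def nfc_match(a: str, b: str) -> bool:
    """Compare after NFC normalization (how a character is assembled no longer matters)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def folded_match(a: str, b: str) -> bool:
    """Compare after NFC plus full case folding, which maps U+00DF to "ss".

    Assumption: str.casefold() is used as a stand-in for a toCaseFold step.
    """
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)

decomposed = "a\u0341"   # U+0061 followed by U+0341
precomposed = "\u00e1"   # U+00E1

print(raw_match(decomposed, precomposed))    # False
print(nfc_match(decomposed, precomposed))    # True
print(nfc_match("fußball", "fussball"))      # False
print(folded_match("fußball", "fussball"))   # True
```

The point of separating the three is that NFC alone removes the
U+0061 U+0341 versus U+00E1 difference without touching Eszett; it is
only the case-folding step that collapses "ß" into "ss".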

But, again, the main point is that we are not making progress by
revisiting the same situations over and over again to see if we
get different answers.  Put differently, if you want to reopen
an old issue, it seems to me that it should be incumbent on you
to review the relevant archives to be sure that you have
something new to say and that the questions you are raising
haven't been answered multiple times already --vacation or no
vacation-- rather than forcing the rest of us to review the
arguments for you.

 regards,
     john

p.s. While I don't agree with Mark and others that dropping all
notions of mapping and the mapping document is better than
specifying mapping that is even sometimes optional, _that_ is a
new question and at least partially a discussion.  But reopening
Eszett is not.



More information about the Idna-update mailing list