Mapping and Variants

John C Klensin klensin at jck.com
Tue Mar 10 22:30:31 CET 2009



--On Tuesday, March 10, 2009 13:15 -0700 Mark Davis
<mark at macchiato.com> wrote:

> Actually, in this particular case I think we've learned quite
> a bit.

Sure.  And I carefully did not use "waste of time" in my
message, even though you seem to have inferred it.  What I don't
know is whether the additional knowledge (and the discussions it
has caused) have moved us forward.

> Reading Marcos's message, DENIC's position is quite a
> bit more nuanced than many of us have realized for the past
> year.

> In particular:
> 
> "c) IDNA2003 is now well established and widespread. *With a
> new version of IDNA we would like and would expect the
> situation to be backwards compatible with IDNA2003.* That
> is, for all practical effects: eszett *works* for the users
> and is mapped to ss." [My bolding.]

For better or worse, I understood most of that, including in
particular the paragraph you didn't quote:

# 'a) In the pre 2003 era, we wanted eszett to be a separate IDN
# character, available for registration. Eszett and "ss" are
# just two different things.'

I don't know any more if that understanding on my part was based
on memory of pre-IDNA2003 discussions, more recent discussions
with Marcos or his colleagues, comments made in passing on the
list, or all of the above.

I interpret Marcos's posting (if this interpretation is not
correct, I hope he will swiftly correct me) as saying that they
would have preferred a separate Eszett (without the mapping) in
2003, but managed to get used to the mapping.  If we go with a
"no mapping" strategy now, they understand how to deal with that
and will get used to it (and would rather have Eszett as a separate
character than have it banned).  And, if given a choice
between continuing with the mapping to preserve compatibility
and having the separate character, they would, at this stage,
choose to stay with the mapping.

That is precisely one of the sorts of tradeoff situations I was
trying to identify.  There are people, including the registry a
half-dozen years ago, who believe that "ss" and Eszett are "just
two different things".  Some of them are passionate about it and
want to be certain that strings containing Eszett -- Eszett that
is recoverable from A-labels -- can be registered in the future.
At least a subset of that group would probably like to organize
a sunrise arrangement but no bundling so that, in the long term,
it is possible to register both strings containing "ss" and
strings containing Eszett in the same position and to have them
be treated as distinct.   Others are willing to say "well, the
2003 decision may not have been optimal but we would rather
continue to live with it than deal with the transition".

None of those positions is either "correct" or "incorrect".  I
don't believe that looping around on them brings the WG much
closer to a decision, either.  Let me try to make a list of
possible positions, in no particular order...

	(i) If one believes that getting rid of mapping, or at
	least getting it far away from IDNA, is important, then
	the answer is clear: Eszett should be a character, not
	banned.  
	
	(ii) If one believes that backward-compatibility with
	IDNA2003 is the most important criterion and should be
	maintained going forward rather than being treated as a
	transition issue, then all mapping should be retained,
	including the Eszett -> "ss" mapping.
	
	(iii)  If one believes that IDNA2003 compatibility is a
	transition issue with a focus on reducing mapping as
	time goes by, then Eszett should be a character although
	various registries may find it advantageous to ban it at
	the registry level, resulting in the fallback mappings
	being applied to occurrences of it in lookup strings
	and, for all practical intents and purposes, in the
	IDNA2003 behavior wrt that character.
	
	(iv) For completeness, if one believes that Eszett
	really isn't a character but should have been either
	left out of Unicode or treated as a compatibility
	character, not mapped via CaseFold, then the thing
	should be banned if we don't map and mapped if we do.
	While I don't believe anyone has taken that position, at
	least in the last year or so, I think I know where to
	find people who would take it.
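
As an illustrative aside, the difference between the "mapping" and
"no mapping" positions above can be sketched in Python.  This is a
rough sketch under stated assumptions, not IDNA itself:
`str.casefold()` stands in for the Unicode CaseFold operation,
`str.lower()` stands in for a hypothetical rule that leaves Eszett
alone, and the two function names are made up for illustration.
Python's built-in `idna` codec follows the IDNA2003 rules, so it
shows the existing Eszett -> "ss" mapping.

```python
ESZETT = "\u00df"       # LATIN SMALL LETTER SHARP S
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA
SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA


def same_label_with_mapping(a: str, b: str) -> bool:
    """Hypothetical comparison for positions (ii)/(iv): CaseFold both labels."""
    return a.casefold() == b.casefold()


def same_label_without_mapping(a: str, b: str) -> bool:
    """Hypothetical comparison for position (i): plain lower-casing only,
    so Eszett stays a distinct character."""
    return a.lower() == b.lower()


with_eszett = "stra" + ESZETT + "e"
with_ss = "strasse"

# With CaseFold, Eszett becomes "ss" and the two labels collide.
print(same_label_with_mapping(with_eszett, with_ss))
# Without the mapping, they are "just two different things".
print(same_label_without_mapping(with_eszett, with_ss))
# CaseFold also turns final sigma into ordinary sigma.
print(FINAL_SIGMA.casefold() == SIGMA)
# The standard-library codec (IDNA2003 rules) applies the Eszett mapping.
print(with_eszett.encode("idna"))
```

Under position (iii), a lookup application would presumably try
something like the second function first and fall back to the first
one only when the registry does not accept Eszett.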

My point was, more or less, that we understood all of those
options a year ago.  There has been a little bit of migration of
people among positions as the various discussions have gone back
and forth (and as views about whether anything other than the first
is plausible have evolved), but very little.  And, because there is
no universal "right" answer, I don't think the WG is getting
closer to a broad-consensus conclusion by going over and over
the arguments.

I don't know how to fix that, but I can observe the lack of
progress toward broad consensus.

> It is also not at all clear what the Greek NIC position is on
> sigma, based on the email. It appears that the tonos is more
> of an issue for them, and that for final sigma they may be
> more in line with the DENIC position.

I don't know whether that understanding is correct or not, but
it is consistent with mine.  Tonos is troubling for at least two
reasons:  It is an instance of a very large collection of
letters that could be described as containing
"possibly-ignorable decoration".  Should those characters match
the undecorated versions?  Maybe, but, agree with Jefsey or not,
one of his points is that there are all sorts of language and
locale-dependent considerations in answering the question,
considerations that do us no good with IDNA.  On the other hand,
if one were in an alternate reality in which upper-case
characters with tonos were never written, we might well have the
same CaseFold issues there that we have with FinalSigma and
Eszett.   Or perhaps we wouldn't but, in this reality, there is
no Unicode operation for CaseAndTonosFold or NFKCTt (NFKC and
Tonos too) so, unless we devise a set of IDNA-specific mappings
for those several characters, mappings that are incompatible
with IDNA2003, all we can do for .GR and the Tonos problem is to
sympathize.
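
To make the Tonos point concrete, here is a small Python sketch.
It assumes that `str.casefold()` and `unicodedata.normalize("NFKC", ...)`
are fair stand-ins for the Unicode CaseFold and NFKC operations.
The `strip_tonos` helper is purely hypothetical: it is the kind of
IDNA-specific mapping that would have to be invented, and it
corresponds to no existing Unicode or IDNA operation.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # what precomposed tonos letters decompose to


def nfkc_casefold(label: str) -> str:
    """Approximation of the existing operations: NFKC, then CaseFold."""
    return unicodedata.normalize("NFKC", label).casefold()


def strip_tonos(label: str) -> str:
    """Hypothetical 'tonos fold': decompose, drop the accent, recompose."""
    decomposed = unicodedata.normalize("NFD", label)
    kept = "".join(ch for ch in decomposed if ch != COMBINING_ACUTE)
    return unicodedata.normalize("NFC", kept)


with_tonos = "\u03ba\u03b1\u03bb\u03ac"     # kappa alpha lambda alpha-with-tonos
without_tonos = "\u03ba\u03b1\u03bb\u03b1"  # the same letters, no tonos

# The existing operations keep the two strings distinct.
print(nfkc_casefold(with_tonos) == nfkc_casefold(without_tonos))
# Only the invented mapping makes them match.
print(strip_tonos(with_tonos) == without_tonos)
```

Note that dropping U+0301 this way would also strip acute accents
from non-Greek letters, which is one small example of the language
and locale considerations mentioned above.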
 
> This has not at all been a waste of time; the 4 special cases
> are very problematic for compatibility and security, and
> winnowing them down would be a very good thing.

Again, I didn't say "waste of time".  It is just that I'm not
seeing much winnowing.  What I'm seeing instead is that, shortly
after we seem to have reached a conclusion, someone brings the
case up again and we start over.

>...

(Not ignoring the rest of your note, just don't have anything
useful to say right now that I haven't said many times before.)

best,
    john


