Another Transition Plan Proposal

Thu Dec 10 22:33:51 CET 2009

Gerv,

I think this is helpful as a way to think about the issues, but
I am having trouble working out how to write it up in the context
of the existing documents and what it would mean in practice.
As indicated below, I prefer it, for some fundamental
reasons, to the other strategies that have been outlined.
Comments inline below.

--On Thursday, December 10, 2009 09:59 -0800 Gervase Markham
<gerv at mozilla.org> wrote:

> Simon and I have been discussing the proposed transition
> plans, and have  a suggestion of our own. We apologise if this
> is similar to or the same  as an existing proposal; it's been
> very hard to keep up with the volume.
> 
> We assume the decision is to make sharp-S and final-sigma
> PVALID.
> 
> Goals (like Mark):
> 
> 1) Make the characters final-sigma and sharp-S usable ASAP.
> 2) Avoid as much as possible the situation where the same URL
> goes to  two locations in two different clients.

I note that you say "URL" here.  It identifies one of the issues
for me, which is the difference between the web and, e.g., email
-- not, this time, a difference in the protocols, but a
difference in deployment.  Although I think it will change
quickly over the next year, there is essentially no deployed
base for IDN email today.  ASCII-local-part at IDNA-server-domain
is feasible, but doesn't make a lot of sense in practice (although
I assume there are instances of it out there).  By contrast,
non-ASCII-local-part at IDNA-server-domain, where the local part
and expected handling don't conform to the experimental EAI
specs, may be more common... but such systems are likely to have
other problems and still don't amount to high levels of
deployment.

A strategy that necessarily "sticks" mail-related URLs with
constraints that come mostly from web deployment may not be an
optimal strategy.  And it might be ignored as a result.

> and also:
> 
> 3) Avoid client complexity, and multiple network round trips
> for lookup.
> 
> Current implementation (IDNA2003): weiB.com is mapped by the
> client to  weiss.com, and then looked up.
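
For anyone who wants to see that mapping in action, here is a minimal
sketch.  It assumes only that Python's built-in "idna" codec implements
IDNA2003 (Nameprep), which is how its documentation describes it; the
sample names are purely illustrative.

```python
# Python's standard "idna" codec follows IDNA2003, so the Nameprep
# mappings are applied before the name that gets looked up is produced.
sharp_s_name = "wei\u00df.com"                        # weiß.com
final_sigma_label = "\u03bb\u03bf\u03b3\u03bf\u03c2"  # λογος (final sigma)
plain_sigma_label = "\u03bb\u03bf\u03b3\u03bf\u03c3"  # λογοσ (ordinary sigma)

# Sharp-S is mapped to "ss", leaving a plain ASCII name.
mapped = sharp_s_name.encode("idna")
print(mapped)  # b'weiss.com'

# Final sigma is folded to ordinary sigma, so both spellings
# produce the same A-label.
same = final_sigma_label.encode("idna") == plain_sigma_label.encode("idna")
print(same)  # True
```

In both cases the distinction the user typed is gone before the query
is ever sent, which is the behavior the transition has to unwind.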

> Phase 1: registries in the five key areas (Germany,
> Switzerland,  Austria, Greece and Cyprus) are requested to go
> through their  registrations and create secondary
> registrations for all sharp-S and  final-sigma variants,
> registered at no cost to the same registrants. (In  other
> words, bundling.) Other registries are encouraged to do this
> also,  but the plan only really depends on the cooperation of
> those five.

Final sigma in its normal context is fairly easy.  One might
sensibly infer that any Greek string obtained by
back-translating an existing A-label that has lower case sigma
as the last character was intended to be final sigma.  Because
of the word-joining problem, one cannot as safely make the
reverse assumption --that lower-case sigma inside a label was
not intended to be final sigma-- but perhaps it is close enough.
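
The "easy" half of that inference could be sketched roughly as
follows.  The function name is my own invention, and it assumes the
A-label has already been converted back to a Unicode string; it
deliberately does nothing about sigma inside a label, which is the
word-joining case that cannot be decided mechanically.

```python
SIGMA = "\u03c3"        # σ, ordinary lower-case sigma
FINAL_SIGMA = "\u03c2"  # ς, final sigma

def final_sigma_candidate(u_label):
    """Return u_label with a trailing σ replaced by ς.

    Returns None when the label does not end in σ.  Sigma in any
    other position is left alone (hypothetical helper, heuristic only).
    """
    if u_label.endswith(SIGMA):
        return u_label[:-1] + FINAL_SIGMA
    return None

print(final_sigma_candidate("λογοσ"))   # label ending in sigma
print(final_sigma_candidate("σοφια"))   # sigma only at the start
```

A registry would still have to decide what to do with labels built
from several words run together, where an interior sigma may have
been final in the registrant's mind.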

But Eszett is complex enough that I'm not sure that I even
understand what the above is proposing.  Other than with
dictionaries --which depend on a label actually being a word--
and sometimes not then, is it possible to deduce whether a
string containing two consecutive "s" characters was intended to
be "ss" or sharp-S?  So I don't know how, by looking at existing
registrations, one determines what a "sharp-S variant" is.  One
might be able to tell if the registrar recorded the registrant's
preference in terms of a native-character string, but there is
no guarantee that all registrars have done that, or that the
registries will have the information if they have.
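
To make the ambiguity concrete, here is a rough sketch that simply
enumerates every string an existing "ss" registration could have been
intended as.  The helper name is hypothetical and the code knows
nothing about German orthography; it only shows how many candidates a
registry would have to choose among without input from the registrant.

```python
def sharp_s_variants(label):
    """Return all strings formed by replacing one or more
    non-overlapping "ss" pairs in label with sharp-S (hypothetical helper)."""
    results = set()

    def walk(i, prefix, replaced):
        if i >= len(label):
            if replaced:            # skip the unchanged original
                results.add(prefix)
            return
        # Option 1: keep the current character as it is.
        walk(i + 1, prefix + label[i], replaced)
        # Option 2: if "ss" starts here, replace the pair with sharp-S.
        if label.startswith("ss", i):
            walk(i + 2, prefix + "\u00df", True)

    walk(0, "", False)
    return results

for name in ("strasse", "massstab", "wasserstrasse"):
    print(name, sorted(sharp_s_variants(name)))
```

Even a short label such as "massstab" yields two candidates, and
nothing in the registration data alone says which one, if either, the
registrant wanted.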

And remember, as you point out below in a different context,
that, for example, "registry in Germany" doesn't refer to one
registry (DENIC) but to 13,318,939 second-level domains directly
under .DE alone (number snarfed from their web site within
the last five minutes; certainly increased by now), all of their
delegated subdomains, etc.  Tens, probably hundreds, of millions
of operators, many of whom we probably can't reliably reach,
much less expect to read some IETF document.

> We understand that this group has no ability to force
> registries to do  anything. However, Shawn has said .de and
> .at have already indicated  intention to bundle, so hopefully
> this is the direction they would be  going anyway. We strongly
> suspect that .gr and .cy would bundle, given  the nature of
> the use of final-sigma. We would welcome feedback from the 
> other registries on their plans.

Not to be negative, but I am no longer certain what "bundle"
means because it has been used in many different ways on this
list.  In a way, that is an advantage, because I think it is
useful to give registries a choice of which interpretation of
"bundle" they intend to use, as long as the general objectives
can be met.

Cary and others may be able to calibrate this better, but my
understanding from discussions with various ccTLD operators is that
sunrise procedures have been much more effective, much more
widely used, and _far_ less complicated to administer than
JET-like variant procedures, especially when there is a
reasonable expectation that one "form" and another may diverge
over time.   Especially for cases like the Eszett one, where the
"traditional" form is an ASCII string and not an IDN at all,
sunrise procedures have a long history of being used to get
registries from Basic Latin-only to Basic Latin plus some number
of "extended" characters.  Such procedures have the advantage of
letting registrants self-identify whether what they really
intended was the Basic Latin form or the more extended/decorated
one.   But, from a different perspective, sunrise procedures
that are limited to those who could make a plausible claim for
the new character (registrants of strings containing "ss" in the
Sharp-S case) are just a different form of bundling.

Please also don't forget the fact that there are German and
Greek registrations in various gTLDs.  I don't know whether the
same principles would usefully apply, but I think we need to at
least consider the issue rather than assuming it is limited to
ccTLD registries whose primary language of interest is Greek or
German.

I also imagine, although local circumstances may differ, that,
if I were an administrator far down in a subtree that expected lots
of Greek or German registrations, I might simply avoid
registering strings containing the new characters if there were
any possible conflicts until I was convinced that applications
software had been converted.  Again, blocking particular
characters, or blocking them in some contexts, is another form
of bundling because it recognizes the possible relationships.

> We agree the TLD registries are not in control of all domain
> names, but  we think publicity (which both we, the registries,
> and perhaps  organizations with good stats about the
> distribution of such domain  names can cooperate on) and
> leading by example will inform DNS admins in  the affected
> language communities such that take-up is good.

I think that is possibly plausible at the second level.  I have
my doubts below it for reasons that should be clear from the
above.

> Phase 2: After a set period, and once they report back that
> this is done  (3 months? 6 months?), clients start to change
> their implementations.  Instead of mapping B to ss and then
> looking up the ss form, they look up  the B form directly.
> However, due to the bundling, all clients end up at  the same
> website. 

  ^^^^^^^
Please see above.

> There is no end-user impact except in the case of a 
> registry choosing that there be end-user impact - and they can
> be  responsible for the confusion which results if they do so.
> There is no  flag day, and clients can make these changes
> based on existing release  schedules.

> During this time, some clients map then look up, and others
> look up  directly. But, one hopes, all clients end up at the
> same place.
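
As a toy model of that mixed period: the sketch below treats a zone as
a plain dictionary, uses Python's IDNA2003 "idna" codec for the
map-then-look-up client, and stands in for a direct-lookup client by
Punycode-encoding the label exactly as typed.  That stand-in, the
helper names, and the address are all assumptions for illustration; it
is not a real IDNA2008 implementation and it skips case folding and
validity checks.

```python
def old_client_key(label):
    # IDNA2003 behavior via the stdlib codec: map first, then encode.
    return label.encode("idna")

def new_client_key(label):
    # Assumed stand-in for a client that looks up without mapping:
    # Punycode the label as typed (no case folding, no validation).
    if label.isascii():
        return label.encode("ascii")
    return b"xn--" + label.encode("punycode")

def resolve(zone, key):
    # A "zone" here is just a dict from A-label to address.
    return zone.get(key)

# Hypothetical zone holding only the existing "ss" registration.
unbundled = {old_client_key("weiss"): "192.0.2.1"}

# Bundling: register the sharp-S form to the same registrant and target.
bundled = dict(unbundled)
bundled[new_client_key("wei\u00df")] = bundled[old_client_key("wei\u00df")]

typed = "wei\u00df"  # what the user enters: weiß
print(resolve(unbundled, old_client_key(typed)))  # old client finds "weiss"
print(resolve(unbundled, new_client_key(typed)))  # new client finds nothing
print(resolve(bundled, new_client_key(typed)))    # bundled: same target
```

The middle case is the one the plan depends on registries, at every
level of the tree, to eliminate before clients change.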

> Phase 3: once a sufficient percentage of clients are the
> updated ones,  registries have the freedom to unbundle if they
> choose to do so and it's  appropriate for their region's
> understanding of the meaning of the two  alternatives for the
> character. (We anticipate this would never happen  for
> final-sigma, but might happen in some areas for sharp-S.)
> Registries  would be entirely responsible for any confusion
> which resulted from  doing this.

As they are under any other strategy, for better or worse.

> It seems to us that there are two ways we can do this
> transition; by  making the client more complicated, or by
> asking registries to cooperate  in the interests of applying
> the Principle of Least Surprise to users.  There have been
> proposals in the first category; this one is in the  second.
> :-)

It has another advantage, which is that it recognizes the fact
that, while end users may blame clients (and indexing services,
etc.) no matter what goes wrong and what the actual problem is,
the registrants, trademark lawyers, and probably the
advertisers will blame the registries.  A model that reflects
that situation, and all of the business models that are tied to
it, is ultimately more likely to be successful than one that
focuses purely on clients.

So, while I have all of the reservations outlined above, some
model that focuses on registry responsibility such as this one
seems to me like a much better way to go than one that tries to
make the clients bear all of the responsibility.

best,
    john