Re-sending TXT form of Proposed IDNA2008 Transition Idea

Kane, Pat pkane at verisign.com
Mon Dec 14 21:18:18 CET 2009


Steve,

 

There could be billions of variants for a single registration.  We used to have at least one IDN in .com that would have had 16M variants.  We keep a separate variant table as opposed to registering the variants themselves as domains.  Some Chinese characters have as many as eight variants, and because of the way Punycode compresses repeated characters, the entire ASCII-encoded string could represent more than 20 Chinese characters.

 

If you repeated one of the characters with eight variants for seven positions in a string, you would generate over 2 million variants.  
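
As a quick check of that arithmetic, here is a minimal sketch. The function name is purely illustrative; it simply multiplies the number of choices available at each position of a label.

```python
def variant_count(variants_per_position):
    """Multiply the number of variant choices at each position of a label."""
    total = 1
    for n in variants_per_position:
        total *= n
    return total

# Seven positions, each holding a character that has eight variants.
seven_positions = variant_count([8] * 7)
print(seven_positions)  # 2097152
```

That is 8 to the 7th power, a little over 2 million.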

 

Pat

 

________________________________

From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Steve Crocker
Sent: Monday, December 14, 2009 2:37 PM
To: Vint Cerf
Cc: Steve Crocker; idna-update at alvestrand.no
Subject: Re: Re-sending TXT form of Proposed IDNA2008 Transition Idea

 

Vint, et al,

 

This seems reasonable to me.  I would offer two refinements.

 

First, each registry, in cooperation with its registrars, could use the sunrise period to register all of the variants that are automatically mapped together under IDNA2003 but will become separate under IDNA2008.  The variants would all point to the same address(es), so the result should be the same for anyone looking up a name under either the IDNA2003 or IDNA2008 rules.  When the sunrise period is over, the variants could become unregistered or could be transferred to others. The existing registrant would have first say, of course.  I'm implicitly suggesting a business strategy for the registries and registrars, and that may or may not appeal to them.  For ICANN accredited registries and registrars, there might need to be some coordination with ICANN too, particularly if the variant registrations are provided at no charge during the sunrise period.  I haven't given this extensive thought, and I haven't talked to others in ICANN, so I can't speak authoritatively, but it seems to me a plausible strategy for smoothing the transition.  In essence, this is the mirror of the strategy you're proposing in the sense that all the variants are registered and then the undesired ones trickle away.

 

One might ask if the number of variants will be unwieldy, and one can point to examples like Mississippi or hisssss... as stressful cases.  My intuition is that even if a few cases explode, the overall impact will be small.  I'll go on record here and suggest the impact will be less than 10% for any existing domain.
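
To get a feel for the stressful cases, here is a rough sketch that counts how many distinct labels collapse to a given label under the IDNA2003 Sharp-S mapping. It assumes the only mapping of interest is "ß" to "ss" and that case is ignored; the function names are hypothetical. A run of k consecutive "s" characters can be written as any sequence of single "s" (length 1) and "ß" (length 2) pieces, so the count for each run follows a Fibonacci-style recurrence.

```python
import re

def tilings(run_length):
    """Ways to write a run of 's' characters using 's' (1) and 'ß' (2) pieces."""
    a, b = 1, 1  # counts for runs of length 0 and 1
    for _ in range(run_length):
        a, b = b, a + b
    return a

def sharp_s_variants(label):
    """Number of labels (including the label itself) that the simplified
    IDNA2003 mapping 'ß' -> 'ss' would fold into this mapped label."""
    count = 1
    for run in re.findall(r"s+", label.lower()):
        count *= tilings(len(run))
    return count

for example in ("mississippi", "hisssss", "masse"):
    print(example, sharp_s_variants(example))
```

Under these assumptions the growth per run is exponential in the run length, but most labels contain few or no "ss" runs.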

 

Second, you've proposed the timing of the transitions would be up to each registry.  That's a good suggestion in terms of providing maximum flexibility, but it seems to me that some of the timing is governed by the browsers.  I would expect there will be a date when IDNA2008 is phased in and a separate, later date when IDNA2003 is declared dead.  In between these dates, I would expect the registries will have to phase over.

 

These details aside, I am very glad there is attention to a transition plan.  That's something that has been a difficult area for both IPv6 and DNSSEC, and I think IDNAbis will be much better off with this attention to transition.

 

Thanks,

 

Steve

 

 

 

 

 

On Dec 14, 2009, at 2:12 PM, Vint Cerf wrote:





It is recommended to use a fixed width font to display this message

 

Introduction of Eszett (sharp-S) and Final Sigma

 

See http://typefoundry.blogspot.com/2008/01/esszett-or.html for an

interesting perspective on 'Sharp-S'

 

 

Introduction

 

The IDNABIS working group has spent two years evolving documents

describing the use of Unicode in Internet domain name labels. We have

ended the IETF Last Call with a lengthy discussion on the manner in

which the Unicode characters Latin Small Letter Sharp-S (U+00DF) and

Greek Small Letter Final Sigma (U+03C2) are to be introduced into use.

The so-called Zero-Width Joiner and Zero-Width Non-Joiner (ZWJ and

ZWNJ respectively) have been included as CONTEXT-Joiner (or CONTEXTJ)

in the IDNA2008 documentation and the general consensus is that these

two may be registered at the discretion of registries. IDNA2008

specifically permits their use, in context.

 

The primary debates surrounding Sharp-S and Final Sigma relate to the

method of their introduction into use as PVALID characters under

IDNA2008. This note represents an attempt to synthesize a

philosophical basis for achieving the goal of making these two

characters usable in domain name labels. 

 

It is useful to recall that the Domain Name System is a hierarchical

system of registries. The root zone is the place where top level

domain labels are registered. The Top Level domain name registries

(e.g. .com, .coop, .ca, .uk) are 'pointed to' using 'delegation

records' in the root zone file. Each 'dot' in a domain name is a point

where 'delegation' (in DNS-speak, a zone cut) for further registration

handling MAY be implemented.

 

So, for example, suppose that it is desired to create a Second Level

label, 'foo' under the Top Level Label 'com'. Typically, the party

wishing to register domain names with the suffix 'foo.com' would

request to register 'foo' as a second level label under 'com' and a

delegation record would be created pointing to the name server that

will respond to all domain names with the suffix 'foo.com'.

 

At any point, a registration may either be an address record for,

e.g., abc.foo.com, or a set of delegation records pointing to the

servers for the Third Level label 'abc'.

 

The notion of delegation is important to keep in mind when considering

how to introduce new PVALID characters into labels since each label in

a multi-label domain name can be managed by a different entity (i.e.,

through delegated authority). A decision by a higher level authority

to treat two different labels as equivalent is a non-trivial exercise

in delegation mechanics. This fact is often lost in discussions about

domain names as if they were flat identifiers. They are not. They

really represent delegated hierarchies and their creation is often

achieved through a series of assignments of delegated authority.

 

 

DESIDERATA ON THE INTRODUCTION OF NEW PVALID CHARACTERS

 

     1.  It is desirable that they can be introduced as soon as any

     registry in the hierarchy wishes to do so without having to

     coordinate with other registries.

 

     2.  It is desirable that IDNA2003 compliant and IDNA2008 compliant

     entities (programs, applications, etc.) co-exist without introducing

     ambiguous resolution of domain names (i.e., the same domain name

     resolves to different IP addresses under IDNA2003 and IDNA2008

     interpretation)

 

     3.  In the proposal that follows, a relaxation of the constraint

     in (2) is that it is acceptable that IDNA2008 interpretation leads

     to NXDOMAIN even if IDNA2003 leads to a valid IP address (or

     vice-versa).  Under this provision, the introduction of a new

     PVALID character does not lead to distinct IP addresses (and

     therefore hazardous ambiguity) even if it produces (temporary?)

     non-resolution for some cases.

 

It should be recognized that the millions of registries/zones in the

DNS are largely independent entities.  We can produce a "suggested

good practice", but registries will make local determinations as to

what to do based on local considerations.  To discourage a particular

practice, it seems best to explain what bad consequences will result

from following it but as a practical matter leave the decisions up to

the registry.  In many ways we have already adopted this position in

IDNA2008 by leaving a great many decisions about which characters to

permit for registration (even if they are PVALID in protocol) for

reasons of local significance or practice.

 

There are many side-effects associated with introducing as PVALID

characters that were formerly mapped under IDNA2003. An unknown number

of URLs (or other domain-name-referencing constructs) may become

unreachable upon adoption of IDNA2008, if the unmapped versions of the

associated domain names have not been constructively registered and

made to resolve to the same IP address as the mapped version. 

 

 

THE SHARP-S EXAMPLE

 

Under IDNA2003, any reference to a domain name label containing

Sharp-S is converted to a label containing 'ss' in place of Sharp-S,

wherever Sharp-S appears. This revised label is then used either for

registration or look up in the Domain Name System.

 

Under IDNA2008, Sharp-S is treated as PVALID and not converted to

'ss'.
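
The difference can be seen in a few lines of Python. This is only a sketch: Python's built-in "idna" codec implements IDNA2003 (RFC 3490), so it applies the Sharp-S mapping. The second function is a hypothetical, rough stand-in for IDNA2008 behaviour that merely Punycode-encodes a label without mapping it and performs none of the IDNA2008 validity checks.

```python
def idna2003_ascii(name):
    """Convert a domain name using Python's built-in IDNA2003 codec."""
    return name.encode("idna").decode("ascii")

def idna2008_ascii_label(label):
    """Rough illustration only: no mapping and no validation, just the
    Punycode encoding of a single label with the ACE prefix."""
    if all(ord(c) < 128 for c in label):
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

print(idna2003_ascii("maße.com"))    # masse.com
print(idna2008_ascii_label("maße"))  # an "xn--" label, distinct from "masse"
```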

 

Many of the suggested transition tactics have attempted a kind of

"perfection" in which there is either a deadline by which everything

works under IDNA2008 or new mechanisms to somehow distinguish between

IDNA2003 and IDNA2008 or urge strenuous efforts to make everything

backward compatible with IDNA2003 mappings - especially for the two

problem characters Sharp-S and Final Sigma. I am ignoring everything

else but these in this contribution since my sense is that this

working group may go along with anything that "solves" the problem

with them. Joiners I think we can assume have been accepted in the

CONTEXTJ form.

 

I would like to try out on you an idea that isn't "perfect" but that

avoids the worst hazard, I think.

 

My definition of worst hazard is that different entities (browsers,

applications) do resolution and get conflicting results. 

 

An example of this would be a case where under IDNA2003, a domain name

containing Sharp-S would be vectored to a domain name and associated

IP address that referenced a domain name registered with "ss" in lieu

of Sharp-S and under IDNA2008 would be vectored to an IP address

associated with a Sharp-S registration that leads to a different IP

address and a distinct registrant. I would distinguish this from the

case where the same registered domain name is associated with two or

more IP addresses on purpose (e.g. two A records that the registrant

considers equivalent). 

 

 

IDNA2003 Case

 

registered   looked up

domain name  domain name       IP address      Registrant

masse.com    maße.com mapped   12.34.56.78     Mr. Foo

             to masse.com

 

 

IDNA2008 Case

 

registered   looked up

domain name  domain name       IP address      Registrant

maße.com     maße.com          34.56.78.12     Mr. Bar

 

The hazard is that under IDNA2003, a look up for maße.com gets the

12.34.56.78 address of masse.com while under IDNA2008, the look up for

maße.com gets the 34.56.78.12 address of maße.com.
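
A minimal sketch of this hazard, using the two example registrations from the tables above. The zone dictionary, the resolve() function and the simplified mapping (lower-casing plus "ß" to "ss") are assumptions made for illustration, not a model of a real resolver.

```python
# Hypothetical zone contents: both forms registered, by different parties.
ZONE = {
    "masse.com": ("12.34.56.78", "Mr. Foo"),
    "maße.com": ("34.56.78.12", "Mr. Bar"),
}

def resolve(name, rules):
    """Return (address, registrant), or None to stand for NXDOMAIN."""
    name = name.lower()
    if rules == "IDNA2003":
        # Simplified IDNA2003 mapping: only the Sharp-S case is modelled.
        name = name.replace("ß", "ss")
    return ZONE.get(name)

old = resolve("maße.com", "IDNA2003")
new = resolve("maße.com", "IDNA2008")
print(old)  # ('12.34.56.78', 'Mr. Foo')
print(new)  # ('34.56.78.12', 'Mr. Bar')
```

The same typed name reaches two different registrants depending on which rules the client applies.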

 

What we would like is to prevent this unexpected ambiguity.

 

I would like to introduce a failsafe practice that prevents this

particular ambiguity but allows for an NXDOMAIN result that may not be

considered hazardous even if it is annoying.

 

Let us imagine that the .com registry wishes to introduce IDNA2008

capability into its second level domain registrations (that's all it

controls).

 

We assume that it has been registering under IDNA2003 rules in the

past, so that any label containing "ß" will have been mapped to "ss"

prior to registration. There is a collection of registrants in the

equivalence class "registered a label containing 'ss'". Let us call

the set of such registrants R.

 

The .com registry introduces a sunrise period in which all members of

R are advised that they may register domains equivalent to the ones

they did register but with the mapped "ss" form changed to the

unmapped "ß" form. I am pretty sure there cannot be collisions here

because all the final registrations have to have been mapped to "ss" -

so if there were going to be a collision it would already have been

detected at the time of original IDNA2003-compliant registration:

"sorry, someone else has already registered the 'ss' form you would

have gotten, can't register that." 

 

After time T (determined by the registry, not by IETF or ICANN fiat),

the .com registry then advises that it will accept registration of

SLDs containing "ß". However, it abides by the following rules at

REGISTRATION time:

 

(Failsafe Rule 1): If registration of an SLD containing "ß" would

collide under IDNA2003 mapping rules with an existing registered

domain name, the registration is allowed if the holder of the

requested domain is the same (*) as the holder of the

already-registered domain, otherwise the registration is not allowed. 

 

(Failsafe Rule 2): If registration of an SLD containing "ss" would

collide under IDNA2003 mapping rules with an existing registered

domain name containing "ß" it is allowed if the holder of the

requested domain is the same (*) as the holder of the already

registered domain, otherwise the registration is not allowed. Note that

Failsafe rule 2 only applies once a registry is operating under

IDNA2008 rules.

 

     (*) Which registrants are "the same" is to be defined by the

     registry, and should match the definitions the registry already applies.
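
The two rules can be sketched as a single registration-time check. This is only an illustration under stated assumptions: the Registry class and its method names are hypothetical, the IDNA2003 mapping is reduced to lower-casing plus "ß" to "ss", and "the same" holder is modelled as plain string equality, which a real registry would replace with its own definition.

```python
def idna2003_key(label):
    """Simplified IDNA2003 mapping: only the Sharp-S case is modelled."""
    return label.lower().replace("ß", "ss")

class Registry:
    def __init__(self):
        self.holders = {}  # registered label -> holder

    def can_register(self, label, holder):
        label = label.lower()
        if label in self.holders:
            return False  # the exact label is already taken
        key = idna2003_key(label)
        for existing, existing_holder in self.holders.items():
            # Failsafe Rules 1 and 2: a label that collides under the
            # IDNA2003 mapping is allowed only for the same holder.
            if idna2003_key(existing) == key and existing_holder != holder:
                return False
        return True

    def register(self, label, holder):
        if not self.can_register(label, holder):
            return False
        self.holders[label.lower()] = holder
        return True

reg = Registry()
print(reg.register("masse", "Mr. Foo"))  # True
print(reg.register("maße", "Mr. Bar"))   # False (Rule 1)
print(reg.register("maße", "Mr. Foo"))   # True (same holder)
```

Because both rules compare labels through the same mapped key, one check covers the "ß"-after-"ss" case and the "ss"-after-"ß" case alike.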

 

As a slightly less safe alternative, but at the option of the registry

(perhaps after even more time has gone by), "not allowed" in the above

two rules could be replaced by notification of the existing domain

holder with an offer to again let that registrant preemptively

register the name, thereby blocking its registration by someone else.

If that offer were not accepted, the new registration would be

permitted, of course still subject to whatever dispute resolution

policies are in effect for .com or other relevant zone.

 

This latter suggestion opens the door for achieving independence of

formerly-mapped pairs of now PVALID characters. 

 

There are some nuances to the scenarios offered above. With possible

exceptions for some "bundling" practices, most registrations will be

sequential (i.e., not "at the same time"). One typically registers one

domain name and then registers others. Because of this, we will

usually end up in a situation where at the time of the second (or Nth)

registration someone has to check, for example, whether the requested

holder of the next domain name is the same as the holder of any

earlier, colliding registered domain names.

 

There may be different registrars involved in sequential

registrations. There may be different contact representatives for

respective registrations. There might be transfers being made in

between related registrations. 

 

Because of this, the important things are the failsafe rules, and that

they (in an ICANN context) are formulated by the registries so that

details like "same" actually have some specific meaning in the

specific registry context. 

 

If we go back to the example given above and assume that Mr. Foo has

registered masse.com before Mr. Bar has entered the picture, Mr. Foo

will get to register maße.com during the sunrise period. Mr. Bar will

not be allowed to register either maße.com or masse.com because both

of these collide with previously registered domain names.

 

Let us now suppose that after the sunrise period, the registry is

operating under IDNA2008 rules. Let us suppose that someone, Mr. Baz,

has registered "strasse.com" prior to the adoption of the IDNA2008

rules. Let us also assume that he did not bother to register

"straße.com" during the sunrise period (if he had, he would presumably

have that registration too). 

 

Now let Mr. Frotz try to register "straße.com" - under Failsafe Rule

1, he would be denied this registration. Mr. Baz still has the

possibility of registering it. 

 

If someone looks up "straße.com" under IDNA2003-compliant rules, he

will get "strasse.com" unambiguously.

 

If someone looks up "straße.com" under IDNA2008-compliant rules, he

will get NXDOMAIN. This is a kind of brokenness but perhaps this is

tolerable if it does not steer the party to the "wrong" site - and it

potentially allows Mr. Baz to recover from his earlier choice not to

register the "ß" version of his SLD earlier.

 

Now let us suppose that "strasse.com" has NOT been registered at all,

the sunrise happens, and we are now operating under IDNA2008 rules.

 

Mr. Frotz registers "straße.com". Since there is no collision with a

previously registered "strasse.com" there is no problem. Let us

suppose that Mr. Frotz does not bother to register "strasse.com". 

 

If someone looks up "straße.com" under IDNA2003-compliant rules, he

will get NXDOMAIN because "strasse.com" does not exist.

 

If someone looks up "straße.com" under IDNA2008-compliant rules, he

will get the corresponding IP address.

 

If someone looks up "strasse.com" under IDNA2008-compliant rules, he

will get NXDOMAIN because it has not been registered.

 

Because the registry is operating under IDNA2008-rules, "ß" and "ss"

are considered distinct and the party using IDNA2003-rules to look up

a domain name registered under IDNA2008 rules is getting a "correct"

response in some sense (in this case, NXDOMAIN). At least the lookup

does not lead to the "wrong IP address".

 

If Mr. Frotz registers both "strasse.com" and "straße.com" (assuming

neither of these violates Failsafe Rules (1) and (2) at registration

time), his registrations will work for both IDNA2003-compliant and

IDNA2008-compliant lookups.  Whether queries using the two strings

will produce the same results or not will still be up to him and not

the registry: there is no practical way to avoid that.

 

Let us suppose, again, that Mr. Frotz successfully registers

"straße.com" under IDNA2008 rules but does not bother to register

"strasse.com"

 

Now let us suppose that Mr. FUBAR tries to register "strasse.com"

subsequent to Mr. Frotz's registration of "straße.com". When he tries

to do this, he would be blocked from that registration under Failsafe

Rule (2).  Or, under the more permissive variation, Mr. Frotz would

have an additional opportunity to block Mr. FUBAR's registration by

registering "strasse.com" himself. 

 

I believe that adoption of Failsafe Rules (1) and (2) would permit

each registry (in the general sense - all levels) to introduce

IDNA2008 rules whenever they wish, and to provide for sunrise time

periods of their choosing. The failures that occur (NXDOMAIN) are not

harmful in the same way that "wrong IP address" would be harmful and

perhaps this form of "failure" would be an acceptable price to pay for

some period of time when IDNA2003-compliant and IDNA2008-compliant

systems were in concurrent operation.

 

I hope this isn't completely nuts.

 

 vint

 

from John Klensin:

 

The suggested process could be used to create a five-stage process:

 

   (1) No registrations that actually involve Sharp-S (the status quo)

 

   (2) Sunrise -- priority registrations of Sharp-S labels for those who

       already have labels containing "ss".

 

   (3) No possibly-conflicting registrations, using Failsafe Rules 1

       and 2 as written; starting time to be determined by registry 

 

   (4) Possibly-conflicting registrations permitted only after the

       original registrant gets notification and an additional

       opportunity to register the name herself; starting date again

       determined by the registry

 

   (5) Sharp-S is just another character with no special treatment;

       starting date again determined by the registry.  
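
The five stages could be expressed as one decision function, sketched below. The stage names, the function and its parameters are all hypothetical; "collides" means the requested label maps to an already-registered label under IDNA2003 rules, and "same_holder" is whatever the registry defines it to be. The sunrise stage is read here as allowing only holders of an existing colliding "ss" label to register, which is one interpretation of "priority registrations".

```python
from enum import IntEnum

class Stage(IntEnum):
    STATUS_QUO = 1    # no Sharp-S registrations at all
    SUNRISE = 2       # existing "ss" holders only
    FAILSAFE = 3      # Failsafe Rules 1 and 2 as written
    NOTIFY_FIRST = 4  # conflicts allowed once the prior holder declines
    UNRESTRICTED = 5  # Sharp-S is just another character

def may_register(stage, collides, same_holder, prior_holder_declined=False):
    """Decide whether a possibly-conflicting registration is accepted."""
    if stage == Stage.STATUS_QUO:
        return False
    if stage == Stage.SUNRISE:
        return collides and same_holder
    if stage == Stage.FAILSAFE:
        return (not collides) or same_holder
    if stage == Stage.NOTIFY_FIRST:
        return (not collides) or same_holder or prior_holder_declined
    return True

print(may_register(Stage.FAILSAFE, collides=True, same_holder=False))  # False
```

Each registry would choose its own dates for moving from one stage to the next.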

 

 

 

 

 

 

 

 
