Re-sending TXT form of Proposed IDNA2008 Transition Idea

Steve Crocker steve at shinkuro.com
Mon Dec 14 21:20:39 CET 2009


I would be interested in understanding this combinatorial explosion  
more clearly.  I was focused just on the sharp-s situation, and the  
expansion there is very slight, I believe.

Steve


On Dec 14, 2009, at 3:18 PM, Kane, Pat wrote:

> Steve,
>
> There could be billions of variants for a single registration.  We
> used to have at least one IDN in .com that would have had 16M variants.
> We keep a separate variant table as opposed to registering the
> variants themselves as domains.  Some Chinese characters have as
> many as eight variants, and because of the way Punycode compresses
> repeated characters, a single ASCII-encoded string could represent
> more than 20 Chinese characters.
>
> If you repeated one of the characters with eight variants for seven  
> positions in a string, you would generate over 2 million variants.
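
A quick check of the arithmetic above, as a minimal sketch. It assumes each position varies independently of the others; the function name is made up for illustration, and the eight-position case is only an assumed example of how a figure near 16M could arise, not something stated in the message.

```python
from math import prod

def variant_count(variants_per_position):
    """Number of distinct strings when each position independently
    takes any one of its listed number of variant forms."""
    return prod(variants_per_position)

# Seven positions, each holding a character with eight variants.
seven_positions = variant_count([8] * 7)
# Eight such positions (an assumed illustration, not a figure from the message).
eight_positions = variant_count([8] * 8)

print(seven_positions)  # 2097152
print(eight_positions)  # 16777216
```

Because the count is a product over positions, every additional eight-variant character multiplies the total by eight.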
>
> Pat
>
> From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Steve Crocker
> Sent: Monday, December 14, 2009 2:37 PM
> To: Vint Cerf
> Cc: Steve Crocker; idna-update at alvestrand.no
> Subject: Re: Re-sending TXT form of Proposed IDNA2008 Transition Idea
>
> Vint, et al,
>
> This seems reasonable to me.  I would offer two refinements.
>
> First, each registry, in cooperation with its registrars, could use  
> the sunrise period to register all of the variants that are  
> automatically mapped together under IDNA2003 but will become  
> separate under IDNA2008.  The variants would all point to the same  
> address(es), so the result should be the same for anyone looking up  
> a name under either the IDNA2003 or IDNA2008 rules.  When the  
> sunrise period is over, the variants could become unregistered or  
> could be transferred to others. The existing registrant would have  
> first say, of course.  I'm implicitly suggesting a business strategy  
> for the registries and registrars, and that may or may not appeal to  
> them.  For ICANN accredited registries and registrars, there might  
> need to be some coordination with ICANN too, particularly if the  
> variant registrations are provided at no charge during the sunrise  
> period.  I haven't given this extensive thought, and I haven't  
> talked to others in ICANN, so I can't speak authoritatively, but it  
> seems to me a plausible strategy for smoothing the transition.  In  
> essence, this is the mirror of the strategy you're proposing in the  
> sense that all the variants are registered and then the undesired  
> ones trickle away.
>
> One might ask if the number of variants will be unwieldy, and one  
> can point to examples like Mississippi or hisssss... as stressful  
> cases.  My intuition is that even if a few cases explode, the  
> overall impact will be small.  I'll go on record here and suggest  
> the impact will be less than 10% for any existing domain.
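
For the stress cases named above, the spellings that IDNA2003 would fold into one label can be enumerated directly. This is only a sketch: it assumes a lowercase label that contains no "ß" to begin with, it models nothing but the "ss"/"ß" relationship, and the function name is hypothetical.

```python
def sharp_s_variants(label):
    """Return every spelling obtained by replacing some non-overlapping
    'ss' pairs in label with 'ß' (the unchanged label is included)."""
    if not label:
        return [""]
    # Option 1: keep the first character as it is.
    results = [label[0] + rest for rest in sharp_s_variants(label[1:])]
    # Option 2: if the label starts with 'ss', fold that pair into 'ß'.
    if label.startswith("ss"):
        results += ["ß" + rest for rest in sharp_s_variants(label[2:])]
    return results

print(len(sharp_s_variants("mississippi")))  # 4
print(len(sharp_s_variants("hisssss")))      # 8
```

A run of n consecutive "s" characters contributes a Fibonacci-style factor (1, 2, 3, 5, 8 for runs of 1 to 5), so growth per label is much slower than in the eight-variant case described elsewhere in this thread.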
>
> Second, you've proposed the timing of the transitions would be up to  
> each registry.  That's a good suggestion in terms of providing  
> maximum flexibility, but it seems to me that some of the timing is  
> governed by the browsers.  I would expect there will be a date when  
> IDNA2008 is phased in and a separate, later date when IDNA2003 is  
> declared dead.  In between these dates, I would expect the  
> registries will have to phase over.
>
> These details aside, I am very glad there is attention to a  
> transition plan.  That's something that has been a difficult area  
> for both IPv6 and DNSSEC, and I think IDNAbis will be much better  
> off with this attention to transition.
>
> Thanks,
>
> Steve
>
> On Dec 14, 2009, at 2:12 PM, Vint Cerf wrote:
>
>
> It is recommended to use a fixed width font to display this message
>
> Introduction of Eszett (sharp-S) and Final Sigma
>
> See http://typefoundry.blogspot.com/2008/01/esszett-or.html for an
> interesting perspective on 'Sharp-S'
>
>
> Introduction
>
> The IDNABIS working group has spent two years evolving documents
> describing the use of Unicode in Internet domain name labels. We have
> ended the IETF Last Call with a lengthy discussion on the manner in
> which the Unicode characters Latin Small Letter Sharp-S (U+00DF) and
> Greek Small Letter Final Sigma (U+03C2) are to be introduced into use.
> The so-called Zero-Width Joiner and Zero-Width Non-Joiner (ZWJ and
> ZWNJ respectively) have been included as CONTEXT-Joiner (or CONTEXTJ)
> in the IDNA2008 documentation and the general consensus is that these
> two may be registered at the discretion of registries. IDNA2008
> specifically permits their use, in context.
>
> The primary debates surrounding Sharp-S and Final Sigma relate to the
> method of their introduction into use as PVALID characters under
> IDNA2008. This note represents an attempt to synthesize a
> philosophical basis for achieving the goal of making these two
> characters usable in domain name labels.
>
> It is useful to recall that the Domain Name System is a hierarchical
> system of registries. The root zone is the place where top level
> domain labels are registered. The Top Level domain name registries
> (e.g. .com, .coop, .ca, .uk) are 'pointed to' using 'delegation
> records' in the root zone file. Each 'dot' in a domain name is a point
> where 'delegation' (in DNS-speak, a zone cut) for further registration
> handling MAY be implemented.
>
> So, for example, suppose that it is desired to create a Second Level
> label, 'foo' under the Top Level Label 'com'. Typically, the party
> wishing to register domain names with the suffix 'foo.com' would
> request to register 'foo' as a second level label under 'com' and a
> delegation record would be created pointing to the name server that
> will respond to all domain names with the suffix 'foo.com'.
>
> At any point, a registration may either be an address record for,
> e.g., abc.foo.com, or a set of delegation records pointing to the
> servers for the Third Level label 'abc'.
>
> The notion of delegation is important to keep in mind when considering
> how to introduce new PVALID characters into labels since each label in
> a multi-label domain name can be managed by a different entity (i.e.,
> through delegated authority). A decision by a higher level authority
> to treat two different labels as equivalent is a non-trivial exercise
> in delegation mechanics. This fact is often lost in discussions about
> domain names as if they were flat identifiers. They are not. They
> really represent delegated hierarchies and their creation is often
> achieved through a series of assignments of delegated authority.
>
>
> DESIDERATA ON THE INTRODUCTION OF NEW PVALID CHARACTERS
>
>      1.  It is desirable that they can be introduced as soon as any
>      registry in the hierarchy wishes to do so without having to
>      coordinate with other registries.
>
>      2.  It is desirable that IDNA2003 compliant and IDNA2008 compliant
>      entities (programs, applications, etc.) co-exist without
>      introducing ambiguous resolution of domain names (i.e., the same
>      domain name resolves to different IP addresses under IDNA2003 and
>      IDNA2008 interpretation).
>
>      3.  In the proposal that follows, a relaxation of the constraint
>      in (2) is that it is acceptable that IDNA2008 interpretation leads
>      to NXDOMAIN even if IDNA2003 leads to a valid IP address (or
>      vice-versa).  Under this provision, the introduction of a new
>      PVALID character does not lead to distinct IP addresses (and
>      therefore hazardous ambiguity) even if it produces (temporary?)
>      non-resolution for some cases.
>
> It should be recognized that the millions of registries/zones in the
> DNS are largely independent entities.  We can produce a "suggested
> good practice", but registries will make local determinations as to
> what to do based on local considerations.  To discourage a particular
> practice, it seems best to explain what bad consequences will result
> from following it but as a practical matter leave the decisions up to
> the registry.  In many ways we have already adopted this position in
> IDNA2008 by leaving a great many decisions about which characters to
> permit for registration (even if they are PVALID in protocol) for
> reasons of local significance or practice.
>
> There are many side-effects associated with introducing as PVALID
> characters that were formerly mapped under IDNA2003. An unknown number
> of URLs (or other domain-name-referencing constructs) may become
> unreachable upon adoption of IDNA2008, if the unmapped versions of the
> associated domain names have not been constructively registered and
> made to resolve to the same IP address as the mapped version.
>
>
> THE SHARP-S EXAMPLE
>
> Under IDNA2003, any reference to a domain name label containing
> Sharp-S is converted to a label containing 'ss' in place of Sharp-S,
> wherever Sharp-S appears. This revised label is then used either for
> registration or look up in the Domain Name System.
>
> Under IDNA2008, Sharp-S is treated as PVALID and not converted to
> 'ss'.
>
> Many of the suggested transition tactics have attempted a kind of
> "perfection" in which there is either a deadline by which everything
> works under IDNA2008 or new mechanisms to somehow distinguish between
> IDNA2003 and IDNA2008 or urge strenuous efforts to make everything
> backward compatible with IDNA2003 mappings - especially for the two
> problem characters Sharp-S and Final Sigma. I am ignoring everything
> else but these in this contribution since my sense is that this
> working group may go along with anything that "solves" the problem
> with them. Joiners I think we can assume have been accepted in the
> CONTEXTJ form.
>
> I would like to try out on you an idea that isn't "perfect" but that
> avoids the worst hazard, I think.
>
> My definition of worst hazard is that different entities (browsers,
> applications) do resolution and get conflicting results.
>
> An example of this would be a case where, under IDNA2003, a domain name
> containing Sharp-S is mapped to the domain name registered with "ss" in
> lieu of Sharp-S and resolves to that name's IP address, while under
> IDNA2008 the same name resolves to a separate Sharp-S registration with
> a different IP address and a distinct registrant. I would distinguish
> this from the case where the same registered domain name is associated
> with two or more IP addresses on purpose (e.g. two A records that the
> registrant considers equivalent).
>
>
> IDNA2003 Case
>
> registered   looked up
> domain name  domain name       IP address      Registrant
> masse.com    maße.com mapped   12.34.56.78     Mr. Foo
>              to masse.com
>
>
> IDNA2008 Case
>
> registered   looked up
> domain name  domain name       IP address      Registrant
> maße.com     maße.com          34.56.78.12     Mr. Bar
>
> The hazard is that under IDNA2003, a look up for maße.com gets the
> 12.34.56.78 address of masse.com while under IDNA2008, the look up for
> maße.com gets the 34.56.78.12 address of maße.com
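
The two tables can be replayed as a small lookup sketch. Several things here are assumptions made for illustration: the zone is keyed by Unicode names for readability (a real zone would hold ASCII-encoded labels), IDNA2003 mapping is reduced to the single Sharp-S rule, and the names `idna2003_map`, `ZONE` and `resolve` are invented.

```python
def idna2003_map(name):
    """Very rough stand-in for IDNA2003 mapping: only the Sharp-S rule."""
    return name.lower().replace("ß", "ss")

# Hypothetical zone contents taken from the two tables above.
ZONE = {
    "masse.com": "12.34.56.78",   # registrant: Mr. Foo
    "maße.com": "34.56.78.12",    # registrant: Mr. Bar
}

def resolve(name, rules, zone=ZONE):
    """Return the address a client would reach, or "NXDOMAIN"."""
    if rules == "IDNA2003":
        name = idna2003_map(name)   # the client maps before looking up
    elif rules != "IDNA2008":
        raise ValueError("rules must be 'IDNA2003' or 'IDNA2008'")
    return zone.get(name, "NXDOMAIN")

print(resolve("maße.com", "IDNA2003"))  # 12.34.56.78
print(resolve("maße.com", "IDNA2008"))  # 34.56.78.12
```

The same typed name reaches two different registrants depending only on which rules the client applies. If "maße.com" were absent from the zone, the IDNA2008 lookup would return NXDOMAIN instead of a conflicting address.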
>
> What we would like is to prevent this unexpected ambiguity.
>
> I would like to introduce a failsafe practice that prevents this
> particular ambiguity but allows for an NXDOMAIN result that may not be
> considered hazardous even if it is annoying.
>
> Let us imagine that the .com registry wishes to introduce IDNA2008
> capability into its second level domain registrations (that's all it
> controls).
>
> We assume that it has been registering under IDNA2003 rules in the
> past, so that any label containing "ß" will have been mapped to "ss"
> prior to registration. There is a collection of registrants in the
> equivalence class "registered a label containing 'ss'". Let us call
> the set of such registrants R.
>
> The .com registry introduces a sunrise period in which all members of
> R are advised that they may register domains equivalent to the ones
> they did register but with the mapped "ss" form changed to the
> unmapped "ß" form. I am pretty sure there cannot be collisions here
> because all the final registrations have to have been mapped to "ss" -
> so if there were going to be a collision it would already have been
> detected at the time of original IDNA2003-compliant registration:
> "sorry, someone else has already registered the 'ss' form you would
> have gotten, can't register that."
>
> After time T (determined by the registry, not by IETF or ICANN fiat),
> the .com registry then advises that it will accept registration of
> SLDs containing "ß". However, it abides by the following rules at
> REGISTRATION time:
>
> (Failsafe Rule 1): If registration of an SLD containing "ß" would
> collide under IDNA2003 mapping rules with an existing registered
> domain name, the registration is allowed if the holder of the
> requested domain is the same (*) as the holder of the
> already-registered domain, otherwise the registration is not allowed.
>
> (Failsafe Rule 2): If registration of an SLD containing "ss" would
> collide under IDNA2003 mapping rules with an existing registered
> domain name containing "ß" it is allowed if the holder of the
> requested domain is the same (*) as the holder of the already
> registered domain, otherwise the registration is not allowed. Note that
> Failsafe rule 2 only applies once a registry is operating under
> IDNA2008 rules.
>
>      (*) Which registrants are "the same" is to be defined by the
>      registry, and match the definitions the registry applies.
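
Both failsafe rules reduce to one comparison of IDNA2003-mapped forms, which can be sketched as a registration-time check. This is one reading of the rules, not a registry implementation: the mapping covers only Sharp-S, `same_holder` is a placeholder for whatever definition of "the same" the registry adopts, and all function names are hypothetical.

```python
def idna2003_map(name):
    """Rough stand-in for IDNA2003 mapping: only the Sharp-S rule."""
    return name.lower().replace("ß", "ss")

def same_holder(a, b):
    """Placeholder: the registry decides what "the same" holder means."""
    return a == b

def may_register(requested, requester, registered):
    """Apply Failsafe Rules 1 and 2 at registration time.

    registered maps each already-registered name to its holder.
    Rule 2 is only meaningful once the registry accepts names with 'ß'.
    """
    if requested in registered:
        return False  # the exact name is already taken
    mapped = idna2003_map(requested)
    for existing, holder in registered.items():
        # A collision is any existing name with the same IDNA2003 form.
        if idna2003_map(existing) == mapped and not same_holder(holder, requester):
            return False
    return True

print(may_register("straße.com", "Mr. Frotz", {"strasse.com": "Mr. Baz"}))   # False
print(may_register("straße.com", "Mr. Baz", {"strasse.com": "Mr. Baz"}))     # True
print(may_register("strasse.com", "Mr. FUBAR", {"straße.com": "Mr. Frotz"})) # False
```

The less strict alternative described in the message would replace the `return False` inside the loop with a notification to the existing holder.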
>
> As a slightly less safe alternative, but at the option of the registry
> (perhaps after even more time has gone by), "not allowed" in the above
> two rules could be replaced by notification of the existing domain
> holder with an offer to again let that registrant preemptively
> register the name, thereby blocking its registration by someone else.
> If that offer were not accepted, the new registration would be
> permitted, of course still subject to whatever dispute resolution
> policies are in effect for .com or other relevant zone.
>
> This latter suggestion opens the door for achieving independence of
> formerly-mapped pairs of now PVALID characters.
>
> There are some nuances to the scenarios offered above. With possible
> exceptions for some "bundling" practices, most registrations will be
> sequential (i.e., not "at the same time"). One typically registers one
> domain name and then registers others. Because of this, we will
> usually end up in a situation where, at the time of the second (or Nth)
> registration, someone has to check whether the prospective holder of
> the requested domain name is the same as the holder of any earlier,
> colliding registered domain names.
>
> There may be different registrars involved in sequential
> registrations. There may be different contact representatives for
> respective registrations. There might be transfers being made in
> between related registrations.
>
> Because of this, the important things are the failsafe rules, and that
> they (in an ICANN context) are formulated by the registries so that
> details like "same" actually have some specific meaning in the
> specific registry context.
>
> If we go back to the example given above and assume that Mr. Foo has
> registered masse.com before Mr. Bar has entered the picture, Mr. Foo
> will get to register maße.com during the sunrise period. Mr. Bar will
> not be allowed to register either maße.com or masse.com because both
> of these collide with previously registered domain names.
>
> Let us now suppose that after the sunrise period, the registry is
> operating under IDNA2008 rules. Let us suppose that someone, Mr. Baz,
> has registered "strasse.com" prior to the adoption of the IDNA2008
> rules. Let us also assume that he did not bother to register
> "straße.com" during the sunrise period (if he had, he would presumably
> have that registration too).
>
> Now let Mr. Frotz try to register "straße.com" - under Failsafe Rule
> 1, he would be denied this registration. Mr. Baz still has the
> possibility of registering it.
>
> If someone looks up "straße.com" under IDNA2003-compliant rules, he
> will get "strasse.com" unambiguously.
>
> If someone looks up "straße.com" under IDNA2008-compliant rules, he
> will get NXDOMAIN. This is a kind of brokenness but perhaps this is
> tolerable if it does not steer the party to the "wrong" site - and it
> potentially allows Mr. Baz to recover from his earlier choice not to
> register the "ß" version of his SLD earlier.
>
> Now let us suppose that "strasse.com" has NOT been registered at all,
> the sunrise happens, and we are now operating under IDNA2008 rules.
>
> Mr. Frotz registers "straße.com". Since there is no collision with a
> previously registered "strasse.com" there is no problem. Let us
> suppose that Mr. Frotz does not bother to register "strasse.com".
>
> If someone looks up "straße.com" under IDNA2003-compliant rules, he
> will get NXDOMAIN because "strasse.com" does not exist.
>
> If someone looks up "straße.com" under IDNA2008-compliant rules, he
> will get the corresponding IP address.
>
> If someone looks up "strasse.com" under IDNA2008-compliant rules, he
> will get NXDOMAIN because it has not been registered.
>
> Because the registry is operating under IDNA2008-rules, "ß" and "ss"
> are considered distinct and the party using IDNA2003-rules to look up
> a domain name registered under IDNA2008 rules is getting a "correct"
> response in some sense (in this case, NXDOMAIN). At least the lookup
> does not lead to the "wrong IP address".
>
> If Mr. Frotz registers both "strasse.com" and "straße.com" (assuming
> neither of these violates Failsafe Rules (1) and (2) at registration
> time), his registrations will work for both IDNA2003-compliant and
> IDNA2008-compliant lookups.  Whether queries using the two strings
> will produce the same results or not will still be up to him and not
> the registry: there is no practical way to avoid that.
>
> Let us suppose, again, that Mr. Frotz successfully registers
> "straße.com" under IDNA2008 rules but does not bother to register
> "strasse.com".
>
> Now let us suppose that Mr. FUBAR tries to register "strasse.com"
> subsequent to Mr. Frotz's registration of "straße.com". When he tries
> to do this, he would be blocked from that registration under Failsafe
> Rule (2).  Or, under the more permissive variation, Mr. Frotz would
> have an additional opportunity to block Mr. FUBAR's registration by
> registering "strasse.com" himself.
>
> I believe that adoption of Failsafe Rules (1) and (2) would permit
> each registry (in the general sense - all levels) to introduce
> IDNA2008 rules whenever they wish, and to provide for sunrise time
> periods of their choosing. The failures that occur (NXDOMAIN) are not
> harmful in the same way that "wrong IP address" would be harmful and
> perhaps this form of "failure" would be an acceptable price to pay for
> some period of time when IDNA2003-compliant and IDNA2008-compliant
> systems were in concurrent operation.
>
> I hope this isn't completely nuts.
>
>  vint
>
> from John Klensin:
>
> The suggested process could be used to create a five-stage process:
>
>    (1) No registrations that actually involve Sharp-S (the status quo)
>
>    (2) Sunrise -- priority registrations of Sharp-S labels for those who
>        already have labels containing "ss".
>
>    (3) No possibly-conflicting registrations, using Failsafe Rules 1
>        and 2 as written; starting time to be determined by registry
>
>    (4) Possibly-conflicting registrations permitted only after the
>        original registrant gets notification and an additional
>        opportunity to register the name herself; starting date again
>        determined by the registry
>
>    (5) Sharp-S is just another character with no special treatment;
>        starting date again determined by the registry.
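
The five stages can be read as a small decision table. The sketch below is one possible interpretation, not something specified in the thread: the stage names, the `decide` function, and its "allow" / "deny" / "notify" results are all invented for illustration, and ordinary availability checks are assumed to happen elsewhere.

```python
from enum import IntEnum

class Stage(IntEnum):
    STATUS_QUO = 1            # no Sharp-S registrations at all
    SUNRISE = 2               # priority for holders of the "ss" form
    FAILSAFE = 3              # Failsafe Rules 1 and 2 as written
    NOTIFY_FIRST = 4          # existing holder gets one more chance
    NO_SPECIAL_TREATMENT = 5  # Sharp-S is just another character

def decide(stage, has_sharp_s, collides, same_holder, holder_declined=False):
    """Return "allow", "deny" or "notify" for a requested name.

    collides: an existing name has the same IDNA2003-mapped form.
    same_holder: the requester holds that colliding name.
    holder_declined: in stage 4, the notified holder passed on the offer.
    """
    if stage == Stage.STATUS_QUO:
        return "deny" if has_sharp_s else "allow"
    if stage == Stage.SUNRISE:
        if has_sharp_s:
            return "allow" if (collides and same_holder) else "deny"
        return "allow"
    if stage == Stage.FAILSAFE:
        return "deny" if (collides and not same_holder) else "allow"
    if stage == Stage.NOTIFY_FIRST:
        if collides and not same_holder:
            return "allow" if holder_declined else "notify"
        return "allow"
    return "allow"

print(decide(Stage.FAILSAFE, True, True, False))      # deny
print(decide(Stage.NOTIFY_FIRST, True, True, False))  # notify
```

Each registry would pick its own dates for moving from one stage to the next, so the stage is an input here, not something derived from a clock.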
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
