<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 1/24/2015 5:15 PM, Shawn Steele

      wrote:<br>

    </div>

    <blockquote

cite="mid:CY1PR0301MB0731B01A94DD3DE4BB0865B682340@CY1PR0301MB0731.namprd03.prod.outlook.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">


      <div class="WordSection1">

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:'Calibri',sans-serif;color:#1F497D">As
            long as we’re being very open about the identifiers, I think
            that DNS names may have been intended to be unique identifiers,
            but they have evolved into human-readable (for the most
            part) identifiers.  If they were “just” unique, a bunch of
            #s would’ve sufficed.  Clearly now they are not just unique
            identifiers, but also cater to linguistic behavior.</span></p>

      </div>

    </blockquote>

    <br>

    They are reasonably mnemonic, without being subject in all instances

    to the same rules as actual words or phrases.<br>

    <br>

    <blockquote

cite="mid:CY1PR0301MB0731B01A94DD3DE4BB0865B682340@CY1PR0301MB0731.namprd03.prod.outlook.com"

      type="cite">

      <div class="WordSection1">


        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:'Calibri',sans-serif;color:#1F497D">I
            think that the important part of the name resolution isn’t
            whether or not certain characters are “allowed”, but rather
            that they resolve to the same thing (e.g., they’re
            identifiers). <br>

          </span></p>

      </div>

    </blockquote>

    <br>

    There are at least two flavors of "allowed" here.<br>

    <br>

    One is whether a code point is permitted by the protocol, or
    perhaps permitted only in certain contexts. The protocol addresses this
    in a black-and-white manner, globally.<br>
    <br>
    The other is whether two labels may coexist that differ only in a
    pair of otherwise confusable (or homograph) code points or sequences.<br>

    <br>

    Here, you have two basic options.<br>

    <br>

    You can set up an exclusion mechanism. Once one of the labels has

    been registered, the other can no longer be registered. (In some
    contexts, these are called "blocked variants".) This mechanism works
    fine for a whole lot of scenarios. It doesn't a priori eliminate any
    of the variants, so if one language needs one, while another

    language needs the other, you can have users of both languages

    compete normally for the available name space, without allowing

    malicious or accidental spoofing. Such an exclusion mechanism, if

    mechanically applied (without case-by-case review and/or appeals),

    is a robust method to manage such contentions. It has the further

    advantage that it impacts only registration of labels, not their

    lookup.<br>
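    <br>
    As a rough illustration of such an exclusion mechanism (a minimal
    sketch only: the variant table, the helper index_label and the Zone
    class are assumptions made up for this example, not any registry's
    actual policy or API), registration can be keyed on a label in which
    every code point is replaced by a representative of its variant set,
    while lookup stays an exact match:<br>

```python
# Hypothetical variant table: maps a code point to the representative of its
# variant set. The pairs are illustrative only, not real registry policy.
VARIANT_REPRESENTATIVE = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A, confusable with Latin a
    "\u043e": "o",  # CYRILLIC SMALL LETTER O, confusable with Latin o
}


def index_label(label):
    """Return the comparison key: each code point replaced by its representative."""
    return "".join(VARIANT_REPRESENTATIVE.get(ch, ch) for ch in label)


class Zone:
    """Toy zone that applies the exclusion mechanically at registration time."""

    def __init__(self):
        self._registered = set()   # labels actually delegated
        self._taken_keys = set()   # index labels of everything registered

    def register(self, label):
        """Register the label unless it or one of its variants is already taken."""
        key = index_label(label)
        if key in self._taken_keys:
            return False           # blocked variant (or exact duplicate)
        self._taken_keys.add(key)
        self._registered.add(label)
        return True

    def lookup(self, label):
        """Lookup is an exact match; the variant table plays no part here."""
        return label in self._registered
```

    With that made-up table, whichever of "example" and its look-alike
    spelling with a Cyrillic letter is registered first blocks the other,
    and lookup never consults the table.<br>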

    <br>

    The other option is the one you describe:<br>

    <br>

    <blockquote

cite="mid:CY1PR0301MB0731B01A94DD3DE4BB0865B682340@CY1PR0301MB0731.namprd03.prod.outlook.com"

      type="cite">

      <div class="WordSection1">

        <p class="MsoNormal"><span

style="font-size:11.0pt;font-family:'Calibri',sans-serif;color:#1F497D">

            I don’t think that it’s important that DNS support all

            possible combinations, but that where names are resolved

            that they are consistent.  Currently 5 names can resolve to

            the same IP, and I don’t see a problem with that.  So I

            think that it should be totally possible for the

            “confusable” characters to merely resolve to the same

            thing.  Eg: be bundled.  Sure, then people can’t register

            some names that use similar letters (or variations), but

            then it isn’t confusing.  Also you have a round-tripping

            problem because if 5 names resolve to the same thing, which

            do you display?  </span></p>

      </div>

    </blockquote>

    <br>

    This kind of bundling is called "allocatable variants" in some
    contexts. Such bundles can be appropriate where there is a reasonable
    expectation that some users would use one, and other users would use
    one of the other variants in a bundle to access the same IP, either
    because users normally don't make the distinction reliably enough,
    or because, depending on system configuration etc., they may normally
    not be able to input one of the variants. There are examples in
    Arabic and Chinese where this kind of thing is done today, and for
    good reason.<br>

    <br>

    However, the downside of this approach is that you can quickly get a

    very large number of variant labels (especially if the label is

    long) because variant code points could appear in many positions

    (and even the set of variant code points at a given position could

    be larger than just 2 or 3).<br>
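    <br>
    To make the growth concrete (again only a sketch: the variant sets
    below are invented for illustration, and a real table would be
    defined per script and be far more constrained), the number of
    variant labels is the product of the sizes of the variant sets at
    each position:<br>

```python
from itertools import product
from math import prod

# Hypothetical variant sets, keyed by the code point used in the base label.
VARIANTS = {
    "a": ("a", "\u0430"),            # Latin a, Cyrillic a
    "o": ("o", "\u043e", "\u03bf"),  # Latin o, Cyrillic o, Greek omicron
}


def choices(label):
    """For each position, the tuple of code points that may appear there."""
    return [VARIANTS.get(ch, (ch,)) for ch in label]


def variant_count(label):
    """Number of variant labels, including the base label itself."""
    return prod(len(options) for options in choices(label))


def variant_labels(label):
    """Enumerate every variant label; only practical for short labels."""
    return ["".join(combo) for combo in product(*choices(label))]
```

    Under these made-up sets, "banana" has three positions with two
    choices each and so 8 variant labels, while "bonanza" has 3 x 2 x 2 =
    12; each additional variant position multiplies the total again.<br>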

    <br>

    When you work this out for the FQDN, the number of names for the

    same IP could be interestingly large. Also, since there's no way to

    enforce this, you may not actually end at the same IP. But at least,

    as long as the bundle goes to the same registrant, it would present

    a block to malicious spoofing by a third party.<br>

    <br>

    In the case we are discussing here (the one that led the IETF to delay
    the IDNA tables for Unicode 7.0), I see no case for doing something

    like a bundle. There simply isn't the expectation that some users

    would regularly use the code point sequence to input the label. In

    fact, normally, if you did anything on the protocol level it would

    be a context rule to disallow the sequence altogether (it's not

    really needed). However, it was there first, and all that, so on the

    protocol level you can't do anything, or nothing that wouldn't make

    the situation worse.<br>

    <br>

    The next best thing is to recommend that zone operators implement the

    kind of exclusion mechanism represented by 'blocked variants'.<br>

    <br>

    A./<br>

    <br>

    <blockquote

cite="mid:CY1PR0301MB0731B01A94DD3DE4BB0865B682340@CY1PR0301MB0731.namprd03.prod.outlook.com"

      type="cite">

      <div class="WordSection1">

        <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif;color:#1F497D">-Shawn</span></p>
        <p class="MsoNormal">&nbsp;</p>
        <p class="MsoNormal"><b><span
              style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:'Calibri',sans-serif">

            Idna-update [<a class="moz-txt-link-freetext" href="mailto:idna-update-bounces@alvestrand.no">mailto:idna-update-bounces@alvestrand.no</a>]

            <b>On Behalf Of </b>Vint Cerf<br>

            <b>Sent:</b> Saturday, January 24, 2015 6:45 AM<br>

            <b>To:</b> Martin J. Dürst<br>

            <b>Cc:</b> John C Klensin; Asmus Freytag;

            <a class="moz-txt-link-abbreviated" href="mailto:idna-update@alvestrand.no">idna-update@alvestrand.no</a>; The IESG<br>

            <b>Subject:</b> Re: [Json] Json and U+08A1 and related cases<o:p></o:p></span></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <div>

          <p class="MsoNormal">I have been following this discussion

            with some interest and have come away with a thought that

            some of you may wish to refine or perhaps debate. Basically,

            I see the UNICODE effort as only partly aligned to the needs

            of the Internet's Domain Name System and the effort to use

            the UNICODE character parameters/descriptors/properties does

            not always line up with the desirable properties of the use

            of characters in the DNS. It seems to me useful to recall

            that domain names are identifiers that are not expected or

            even intended to follow purely linguistic constraints. They

            are used to create what are intended to be unique

            identifiers. Characters that have a high probability of

            looking the same but are encoded differently work against

            that goal. Of course I am fully aware of the confusability

            of the lower case letter "L" and the digit "ONE" (and "OH"

            and "ZERO") that is sometimes used as an example of the

            inconsistent toleration of confusion in the ASCII labels but

            I consider this to be an argument of the form "you allowed a

            case of confusion therefore you should tolerate all

            confusion". <o:p></o:p></p>

          <div>

            <p class="MsoNormal"><o:p> </o:p></p>

          </div>

          <div>

            <p class="MsoNormal">I do wonder whether it is worth

              considering an attempt to create a new set of properties

              of UNICODED characters that are of specific use to the

              DNS. The IDNA 2008 work tried to use properties of

              characters developed for purposes other than the DNS and

              the fit is not always perfect. <o:p></o:p></p>

          </div>

          <div>

            <p class="MsoNormal"><o:p> </o:p></p>

          </div>

          <div>

            <p class="MsoNormal">vint<o:p></o:p></p>

          </div>

          <div>

            <p class="MsoNormal"><o:p> </o:p></p>

          </div>

        </div>

        <div>

          <p class="MsoNormal"><o:p> </o:p></p>

          <div>

            <p class="MsoNormal">On Fri, Jan 23, 2015 at 4:14 AM,
              "Martin J. Dürst" &lt;<a moz-do-not-send="true"
                href="mailto:duerst@it.aoyama.ac.jp" target="_blank">duerst@it.aoyama.ac.jp</a>&gt;

              wrote:<o:p></o:p></p>

            <blockquote style="border:none;border-left:solid #CCCCCC

              1.0pt;padding:0in 0in 0in

              6.0pt;margin-left:4.8pt;margin-right:0in">

              <p class="MsoNormal" style="margin-bottom:12.0pt">Hello

                Asmus,<br>

                <br>

                On 2015/01/22 11:58, Asmus Freytag wrote:<o:p></o:p></p>

              <blockquote style="border:none;border-left:solid #CCCCCC

                1.0pt;padding:0in 0in 0in

                6.0pt;margin-left:4.8pt;margin-right:0in">

                <p class="MsoNormal">I would go further, and claim that
                  the notion that "*all homographs are the same abstract
                  character*" is *misplaced, if not incorrect*.<o:p></o:p></p>

              </blockquote>

              <p class="MsoNormal" style="margin-bottom:12.0pt"><br>

                That's fine. Nobody would claim that 8 (U+0038) and <span

                  style="font-family:'Shonar Bangla',sans-serif">
                  ৪</span> (Bengali 4, U+09EA) are the same abstract

                character. (How 'homographic' they look will depend on

                what fonts your mail user agent uses :-)<br>

                <br>

                <o:p></o:p></p>

              <blockquote style="border:none;border-left:solid #CCCCCC

                1.0pt;padding:0in 0in 0in

                6.0pt;margin-left:4.8pt;margin-right:0in">

                <p class="MsoNormal">U+08A1 is not the only character
                  that has a non-decomposable homograph, and
                  because the encoding of it wasn't an accident, but
                  follows a principle applied
                  by the Unicode Technical Committee, it won't, and
                  can't, be the last instance of
                  a non-decomposable homograph.<br>
                  <br>
                  The "failure of U+08A1 to have a (non-identity)
                  decomposition", while it perhaps
                  complicates the design of a system of robust mnemonic
                  identifiers (such as IDNs),
                  appears not to be due to a "breakdown" of the
                  encoding process and also does
                  not constitute a break of any encoding stability
                  promises by the Unicode Consortium.<br>
                  <br>
                  Rather, it represents reasoned, and principled
                  judgment of what is or isn't the
                  "same abstract character". That judgment has to be
                  made somewhere in the
                  process, and the bodies responsible for character
                  encoding get to make the determination.<o:p></o:p></p>

              </blockquote>

              <p class="MsoNormal"><br>

                While I can agree with this characterization, many

                judgements on character encoding are by their very

                nature borderline, and U+08A1 definitely in many aspects

                is borderline. What I hope is that the Unicode Technical
                Committee, when making similar decisions in the future,
                puts the borderline a bit more in support of
                applications such as identifiers, and a bit less in
                favor of splitting. I also hope it realizes that when

                principles lead to more and more homograph encodings, it

                may very well pay off to reexamine some of these

                principles before going down a slippery slope.<br>

                <br>

                Regards,   Martin.<o:p></o:p></p>

              <div>

                <div>

                  <p class="MsoNormal"><br>

                    _______________________________________________<br>

                    Idna-update mailing list<br>

                    <a moz-do-not-send="true"

                      href="mailto:Idna-update@alvestrand.no"

                      target="_blank">Idna-update@alvestrand.no</a><br>

                    <a moz-do-not-send="true"

                      href="http://www.alvestrand.no/mailman/listinfo/idna-update"

                      target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><o:p></o:p></p>

                </div>

              </div>

            </blockquote>

          </div>

          <p class="MsoNormal"><o:p> </o:p></p>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>