<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix">On 1/24/2015 6:44 AM, Vint Cerf wrote:<br>

    </div>

    <blockquote

cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">I have been following this discussion with some

        interest and have come away with a thought that some of you may

        wish to refine or perhaps debate. Basically, I see the UNICODE

        effort as only partly aligned to the needs of the Internet's

        Domain name System </div>

    </blockquote>

    <br>

    Agreed, that is so, and by necessity. Unicode, as the <b>universal</b> character

    set, cannot hope to be aligned perfectly with any single use case.

    And the DNS is one particular use case.<br>

    <br>

    <blockquote

cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">and the effort to use the UNICODE character

        parameters/descriptors/properties does not always line up with

        the desirable properties of the use of characters in the DNS. </div>

    </blockquote>

    <br>

    There is less of a restriction on Unicode properties. In principle,

    properties can be tailored to any problem domain or implementation.

    In fact, PVALID is a character property, just one not specified

    by the Unicode Consortium. <br>

    <br>

    So, in principle, nothing prevents properties from being defined

    (whether by IETF or Unicode) that accommodate the needs of the DNS.<br>

    <br>

    <blockquote

cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">It seems to me useful to recall that domain names

        are identifiers that are not expected or even intended to follow

        purely linguistic constraints. They are used to create what are

        intended to be unique identifiers.</div>

    </blockquote>

    <br>

    ...that are reasonably mnemonic.<br>

    <br>

    Without the last qualifier, you'd not need IDNs.<br>

    <br>

    While mnemonics are often based on words or phrases of a given

    language, they are not identical to it, and not all linguistic

    conventions need apply. Definitely agree.<br>

    <br>

    There is, however, a clear pressure to make the system

    non-discriminatory; that is, to support basing mnemonics on all

    languages (or rather writing systems) with something like "equal

    ease". That drags in the full messiness of writing systems by the

    back door.<br>

    <br>

    <blockquote

cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"

      type="cite">

      <div dir="ltr"> Characters that have a high probability of looking

        the same but are encoded differently work against that goal. Of

        course I am fully aware of the confusability of the lower case

        letter "L" and the digit "ONE" (and "OH" and "ZERO") that is

        sometimes used as an example of the inconsistent toleration of

        confusion in the ASCII labels but I consider this to be an

        argument of the form "you allowed a case of confusion therefore

        you should tolerate all confusion". <br>

      </div>

    </blockquote>

    <br>

    There's accidental confusability and then there's confusability by

    design - and all the shades between them. Accidental confusability

    depends on issues of font size, font design and/or human perception

    (for example, the confusability between "rn" and "m"). Confusability

    by design is based on issues of dual encoding, homographs and

    character derivation and borrowing.<br>

    <br>

    Because of the pressure to allow mnemonics to be usable by users of

    other scripts, you inevitably drag in all the issues for these

    scripts (and, in the case of Latin, or Arabic, the issues that

    derive from having adapted these scripts to a multitude of

    orthographies).<br>

    <br>

    <blockquote

cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>I do wonder whether it is worth considering an attempt to

          create a new set of properties of UNICODED characters that are

          of specific use to the DNS. The IDNA 2008 work tried to use

          properties of characters developed for purposes other than the

          DNS and the fit is not always perfect. <br>

        </div>

      </div>

    </blockquote>

    <br>

    In principle the answer to that is yes. <br>

    <br>

    Unicode has discovered that the cleanest way to do many properties

    is to derive any new property from a combination of other properties

    where possible, and where not, to create exception lists. (Where the

    underlying properties are not immutable, the derivation gets checked

    each version, and exception lists can be re-generated to keep the

    derived property immutable. That's still less work than maintaining

    an entirely separate property.)<br>

    <br>

    That's more or less the path that's been followed for the

    IDNA2008-specific properties.<br>

    <br>

    In that sense, your argument comes down to improving the

    IDNA2008-specific properties.<br>

    <br>
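    To make the "derive from other properties, plus an exception list"
    approach concrete, here is a minimal Python sketch. The category set,
    the two exceptions and all function names are illustrative assumptions
    on my part; this is not the actual IDNA2008 derivation.<br>
    <br>

```python
import unicodedata

# Assumed base rule: general categories treated as letters, marks and digits.
BASE_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

# Assumed exception list: per-code-point overrides of the base rule.
EXCEPTIONS = {
    0x002D: True,   # HYPHEN-MINUS: allowed although the base rule rejects it
    0x0640: False,  # ARABIC TATWEEL: excluded although the base rule accepts it
}


def base_rule(cp):
    """Derive a value purely from an existing Unicode property."""
    return unicodedata.category(chr(cp)) in BASE_CATEGORIES


def derived_property(cp):
    """Exceptions win; everything else follows the base rule."""
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    return base_rule(cp)


def regenerate_exceptions(frozen_values, rule):
    """Recompute the exception list for a new Unicode version.

    frozen_values maps code points to the values already published.
    Any code point whose rule result now differs from its published
    value becomes an exception, so the derived property stays stable.
    """
    return {cp: value for cp, value in frozen_values.items() if rule(cp) != value}
```

    The point of regenerate_exceptions is the per-version check: if an
    underlying property changes, the published value is kept and the code
    point simply moves onto the exception list.<br>
    <br>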

    I see one practical limitation in the fact that what is good for a

    stable and robust system of universal identifiers will be at odds

    with the desire to provide mnemonics that work according to the

    expectations of specific sets of users (those expectations being

    based on the writing system, and the use thereof, that they are

    familiar with).<br>

    <br>

    As long as you cater to that on the protocol level, you run into the

    same kinds of "universality constraints" that Unicode runs into:

    some stuff needed for local support doesn't play well globally (and

    vice versa).<br>

    <br>

    Having just gone through that exercise, we've concluded that only

    about a third of all code points that are PVALID should even be

    considered for the Root Zone. The actual number that will come out

    of the more detailed investigations to follow will be smaller.<br>

    <br>

    In some cases, the restrictions imposed by that limitation will lead

    to exclusions that will look mighty arbitrary if seen through the

    lens of a local writing system. Just as it's not possible to render an

    English possessive in the DNS ("Barron's"), for some languages we are

    proposing not to support the representation of plurals in the root.

    That's appropriate for the root, but I wonder very much whether it's

    appropriate to do something that drastic on the protocol level.<br>

    <br>

    And, as long as it isn't, it would represent a constraint on the

    kinds of properties you can design on the protocol level.<br>

    <br>

    In the case where two writing systems have conflicting demands, but

    where you don't want to pick one over the other, you need a

    different mechanism that essentially says: in each zone, you can

    have either one of these, but not both. And you want that mechanism

    as close to the protocol level as you can get.<br>

    <br>

    Having a robust way to define this mutual exclusion in a zone's IDN

    table (and perhaps backed up by an IDNA property that flags a code

    point or sequence as requiring such an exclusion to be defined)

    would seem to be an answer. In the root zone, we will have such a

    robust exclusion mechanism by the use of "blocked" variants.<br>

    <br>
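    As a rough illustration of "either one of these, but not both" within a
    zone, here is a small Python sketch. The variant sets, the Zone class
    and the first-come-first-served policy are assumptions made for the
    example; it is not the Root Zone procedure itself.<br>
    <br>

```python
# Assumed variant sets: code points that must not coexist within one zone.
VARIANT_SETS = [
    {"\u0061", "\u0430"},  # LATIN SMALL LETTER A, CYRILLIC SMALL LETTER A
    {"\u006F", "\u043E"},  # LATIN SMALL LETTER O, CYRILLIC SMALL LETTER O
]

# Map every member of a set to one representative (the lowest code point).
REPRESENTATIVE = {cp: min(group) for group in VARIANT_SETS for cp in group}


def variant_key(label):
    """Replace each code point by its representative.

    Two labels with the same key are variants of each other.
    """
    return "".join(REPRESENTATIVE.get(ch, ch) for ch in label)


class Zone:
    """Toy zone in which a delegated label blocks all of its variants."""

    def __init__(self):
        self._taken = {}  # variant key mapped to the label that holds it

    def register(self, label):
        """Return True if the label was delegated, False if it is blocked."""
        key = variant_key(label)
        if key in self._taken:
            return False
        self._taken[key] = label
        return True


zone = Zone()
print(zone.register("ao"))              # all-Latin label
print(zone.register("\u0430\u043E"))    # all-Cyrillic look-alike
```

    Whichever of the two labels is delegated first, the other one is then
    refused in that zone, without the table having to prefer one writing
    system over the other up front.<br>
    <br>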

    A./<br>

  </body>

</html>