<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 1/24/2015 6:44 AM, Vint Cerf wrote:<br>
</div>
<blockquote
cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"
type="cite">
<div dir="ltr">I have been following this discussion with some
interest and have come away with a thought that some of you may
wish to refine or perhaps debate. Basically, I see the UNICODE
effort as only partly aligned to the needs of the Internet's
Domain name System </div>
</blockquote>
<br>
Agreed, that is so, and by necessity. Unicode, as the <b>universal</b>
character set, cannot hope to be aligned perfectly with any single use case.
And the DNS is one particular use case.<br>
<br>
<blockquote
cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"
type="cite">
<div dir="ltr">and the effort to use the UNICODE character
parameters/descriptors/properties does not always line up with
the desirable properties of the use of characters in the DNS. </div>
</blockquote>
<br>
There is less of a restriction on Unicode properties. In principle,
properties can be tailored to any problem domain or implementation.
In fact, PVALID is a character property, just one that is not specified
by the Unicode Consortium. <br>
<br>
So, in principle, nothing prevents properties from being defined
(whether by the IETF or Unicode) that accommodate the needs of the DNS.<br>
<br>
<blockquote
cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"
type="cite">
<div dir="ltr">It seems to me useful to recall that domain names
are identifiers that are not expected or even intended to follow
purely linguistic constraints. They are used to create what are
intended to be unique identifiers.</div>
</blockquote>
<br>
...that are reasonably mnemonic.<br>
<br>
Without the last qualifier, you'd not need IDNs.<br>
<br>
While mnemonics are often based on words or phrases of a given
language, they are not identical to it, and not all linguistic
conventions need apply. Definitely agree.<br>
<br>
There is, however, a clear pressure to make the system
non-discriminatory; that is, to support basing mnemonics on all
languages (or rather writing systems) with something like "equal
ease". That drags in the full messiness of writing systems by the
back door.<br>
<br>
<blockquote
cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"
type="cite">
<div dir="ltr"> Characters that have a high probability of looking
the same but are encoded differently work against that goal. Of
course I am fully aware of the confusability of the lower case
letter "L" and the digit "ONE" (and "OH" and "ZERO") that is
sometimes used as an example of the inconsistent toleration of
confusion in the ASCII labels but I consider this to be an
argument of the form "you allowed a case of confusion therefore
you should tolerate all confusion". <br>
</div>
</blockquote>
<br>
There's accidental confusability and then there's confusability by
design - and all the shades between them. Accidental confusability
depends on issues of font size, font design and/or human perception
(for example, the confusability between "rn" and "m"). Confusability
by design is based on issues of dual encoding, homographs, and
character derivation and borrowing.<br>
<br>
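To illustrate the distinction, here is a minimal Python sketch. It assumes
one reading of "dual encoding" (canonically equivalent sequences) and uses
Latin "a" versus Cyrillic "а" as a stand-in for a homograph; the helper
names <code>same_after_nfc</code> and <code>describe</code> are made up for
this example. Normalization resolves the first kind of confusability by
design, but not the second.<br>
<pre>
```python
import unicodedata

# Two encodings of the same letter (canonically equivalent sequences).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Two different letters from different scripts that usually look alike.
latin_a = "a"            # LATIN SMALL LETTER A
cyrillic_a = "\u0430"    # CYRILLIC SMALL LETTER A


def same_after_nfc(s, t):
    """Return True if the two strings are identical once NFC-normalized."""
    return unicodedata.normalize("NFC", s) == unicodedata.normalize("NFC", t)


def describe(s):
    """Return the Unicode character names of the code points in s."""
    return [unicodedata.name(ch) for ch in s]


if __name__ == "__main__":
    print(same_after_nfc(precomposed, combining))  # canonical equivalents
    print(same_after_nfc(latin_a, cyrillic_a))     # cross-script homographs
    print(describe(cyrillic_a))
```
</pre>
Normalization alone cannot tell that the two "a" letters are confusable;
that takes additional data or policy, which is where the issues of the
individual scripts come in.<br>
<br>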
Because of the pressure to allow mnemonics to be usable by users of
other scripts, you inevitably drag in all the issues for these
scripts (and, in the case of Latin, or Arabic, the issues that
derive from having adapted these scripts to a multitude of
orthographies).<br>
<br>
<blockquote
cite="mid:CAHxHggc9tUwJbGTaAVxQR3JWOW25Yw76kn4iewNw9=mK=SX2hQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>I do wonder whether it is worth considering an attempt to
create a new set of properties of UNICODED characters that are
of specific use to the DNS. The IDNA 2008 work tried to use
properties of characters developed for purposes other than the
DNS and the fit is not always perfect. <br>
</div>
</div>
</blockquote>
<br>
In principle the answer to that is yes. <br>
<br>
Unicode has discovered that the cleanest way to do many properties
is to derive any new property from a combination of other properties
where possible, and where not, to create exception lists. (Where the
underlying properties are not immutable, the derivation gets checked
each version, and exception lists can be re-generated to keep the
derived property immutable. That's still less work than maintaining
an entirely separate property).<br>
<br>
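As a rough sketch of that approach (not the actual IDNA2008 derivation), the
Python below derives a hypothetical "allowed" property from the General
Category plus a stability check under NFKC and case folding, with two small
exception lists layered on top. The category set, the two exception entries,
and all function names are assumptions chosen for illustration. The last
function shows how an exception list could be regenerated so that values
already published do not change when the underlying properties do.<br>
<pre>
```python
import unicodedata

# Hypothetical inputs, chosen only for illustration.
BASE_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}
FORCE_ALLOWED = {0x00DF}      # LATIN SMALL LETTER SHARP S
FORCE_DISALLOWED = {0x0640}   # ARABIC TATWEEL


def is_stable(cp):
    """True if the code point is unchanged by NFKC plus case folding."""
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) == ch


def rule_only(cp):
    """The derivation from underlying properties, ignoring exceptions."""
    return unicodedata.category(chr(cp)) in BASE_CATEGORIES and is_stable(cp)


def derived_allowed(cp):
    """The derived property: exception lists first, then the rule."""
    if cp in FORCE_ALLOWED:
        return True
    if cp in FORCE_DISALLOWED:
        return False
    return rule_only(cp)


def regenerate_exceptions(previous, rule):
    """Return the overrides needed to keep published values unchanged.

    previous maps code points to the value published in an earlier version;
    rule computes the value from the current underlying properties.
    """
    return {cp: old for cp, old in previous.items() if rule(cp) != old}


if __name__ == "__main__":
    for cp in (0x0061, 0x0041, 0x00DF, 0x0640):
        print(f"U+{cp:04X}", rule_only(cp), derived_allowed(cp))
```
</pre>
The point of the layering is that the rule does most of the work and the
exception lists stay short.<br>
<br>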
That's more or less the path that's been followed for the
IDNA2008-specific properties.<br>
<br>
In that sense, your argument comes down to improving the
IDNA2008-specific properties.<br>
<br>
I see one practical limitation in the fact that what is good for a
stable and robust system of universal identifiers will be at odds
with the desire to provide mnemonics that work according to the
expectations of specific sets of users (those expectations being
based on the writing system, and the use thereof, that they are
familiar with).<br>
<br>
As long as you cater to that on the protocol level, you run into the
same kinds of "universality constraints" that Unicode runs into:
some stuff needed for local support doesn't play well globally (and
vice versa).<br>
<br>
Having just gone through that exercise, we've concluded that only
about a third of all code points that are PVALID should even be
considered for the Root Zone. The actual number that will come out
of the more detailed investigations to follow will be smaller.<br>
<br>
In some cases, the restrictions imposed by that limitation will lead
to exclusions that will look mighty arbitrary if seen through the
lens of a local writing system. Just as it's not possible to render an
English possessive in the DNS ("Barron's"), for some languages we are
proposing not to support the representation of plurals in the root.
That's appropriate for the root, but I wonder very much whether it's
appropriate to do something that drastic on the protocol level.<br>
<br>
And, as long as it isn't, it would represent a constraint on the
kinds of properties you can design on the protocol level.<br>
<br>
In the case where two writing systems have conflicting demands, but
where you don't want to pick one over the other, you need a
different mechanism that essentially says: in each zone, you can
have either one of these, but not both. And you want that mechanism
as close to the protocol level as you can get.<br>
<br>
Having a robust way to define this mutual exclusion in a zone's IDN
table (and perhaps backed up by an IDNA property that flags a code
point or sequence as requiring such an exclusion to be defined)
would seem to be an answer. In the root zone, we will have such a
robust exclusion mechanism by the use of "blocked" variants.<br>
<br>
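A minimal Python sketch of what such a mutual exclusion could look like in a
single zone follows. It is an illustration under assumptions, not the Root
Zone procedure: the variant sets, the <code>Zone</code> class, and the
function names are all hypothetical, and the Latin/Cyrillic pairs merely
stand in for whatever code points a zone decides to treat as mutually
exclusive.<br>
<pre>
```python
from itertools import product

# Hypothetical variant sets: members of one set exclude each other in a zone.
VARIANT_SETS = [
    {"a", "\u0430"},  # LATIN SMALL LETTER A, CYRILLIC SMALL LETTER A
    {"o", "\u043e"},  # LATIN SMALL LETTER O, CYRILLIC SMALL LETTER O
]


def variants_of(ch):
    """Return the variant set containing ch, or just {ch} if there is none."""
    for group in VARIANT_SETS:
        if ch in group:
            return group
    return {ch}


def variant_labels(label):
    """Return every label reachable by swapping in variant code points."""
    choices = [sorted(variants_of(ch)) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


class Zone:
    """A toy zone that blocks all variants of each delegated label."""

    def __init__(self):
        self.delegated = set()
        self.blocked = {}  # blocked label mapped to the label that blocks it

    def register(self, label):
        """Delegate label if it is free; return True on success."""
        if label in self.delegated or label in self.blocked:
            return False
        self.delegated.add(label)
        for variant in variant_labels(label) - {label}:
            self.blocked[variant] = label
        return True


if __name__ == "__main__":
    zone = Zone()
    print(zone.register("cat"))        # first come, first served
    print(zone.register("c\u0430t"))   # variant of "cat"
```
</pre>
Whichever of the two labels is registered first wins, and the other is
blocked in that zone only; a different zone remains free to make the
opposite choice.<br>
<br>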
A./<br>
</body>
</html>