IDNA document set - Last Call review

Mon Aug 24 00:53:31 CEST 2009

Dear Colleagues,

This memo gathers the collective remarks of the IUCG/france at-large
CX-DNS participants. The participants come from three backgrounds:

    * Two newcomers who took the Last Call as an opportunity to make
sure that they understand everything.
    * Three participants who have been involved, in some form and at
some stage, in the IDNABIS debate and who would like to see the
documents finally published.
    * Five participants who are primarily interested in making sure
that they can document "IDNAPLUS" as strictly conformant support of
IDNA within the Interplus architecture that the IUCG is working on.

These remarks can also be found at http://wikidna.org/index.php?title=IUCG_remarks

Best regards

Elisabeth Blanconil

----------------------------------------------------------------------

General appreciation

    * The division into documents seems adequate. However, even though
the Mapping memo is not part of the IDNA document set (why not?), it
is logical and enlightening to read it before the Protocol document.
    * The documents are rather confusing because it is impossible to
decide whether:
          o they consider IDNA to be part of the DNS or not (we may be
influenced by the ML-DNS stack we work on).
          o they distinguish between characters and code points, and if
so, how.
          o they use NFKC or NFC, and what the differences between the
two are, both intrinsically and from an IDNA point of view.
          o they are intended as a complete set of standards or as a
partial set of suggestions. This uncertainty results from:
                + non-normative wording being used in places that one
would expect to be normative;
                + the constant discussion of Registries'
capabilities and obligations, together with the lack of documentation
on the tools for meeting them and for managing the related
registration/coding metadata and rules.
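To make the NFC/NFKC question concrete, here is a minimal sketch using Python's standard `unicodedata` module (our choice of tool, not something the drafts prescribe). The sample strings are our own and `compare_forms` is a hypothetical helper. The point is only that NFC leaves compatibility characters such as ligatures, fullwidth letters and superscripts untouched, while NFKC replaces them with plain equivalents.

```python
import unicodedata


def compare_forms(s: str) -> tuple[str, str]:
    """Return the NFC and NFKC normalizations of s (hypothetical helper)."""
    return unicodedata.normalize("NFC", s), unicodedata.normalize("NFKC", s)


# Illustrative samples (our own choice, not taken from the IDNA drafts).
samples = {
    "e + combining acute": "e\u0301",  # both forms compose it to U+00E9
    "fi ligature": "\ufb01",           # NFC keeps it, NFKC gives "fi"
    "fullwidth A": "\uff21",           # NFC keeps it, NFKC gives "A"
    "superscript two": "\u00b2",       # NFC keeps it, NFKC gives "2"
}

for name, s in samples.items():
    nfc, nfkc = compare_forms(s)
    print(f"{name}: NFC={nfc!r} NFKC={nfkc!r} differ={nfc != nfkc}")
```

Both forms agree on canonical equivalents such as "e" plus a combining accent; they only diverge on compatibility characters, which is precisely where a reader needs to know which one applies.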

IDNA Definitions

    * Information on Unicode is scattered throughout the document.
Wouldn’t it be much better to follow a clear sequence, describing:

        * what IDNA is,
        * what IDNs are,
        * what IDNA labels are,
        * what they are made of,
        * how Unicode supports them, including NFC, in the same section (2.1),
        * how a zone manager may impose profiling rules (description,
enforcement).

    * Most of the new terms are discussed before being defined. This
starts with the confusing "looking them up" in section 1.1.1 (which
means resolving, not just querying validity or existence, as opposed
to "registering"). IDNs are only introduced in 2.3.2.3, etc. This
certainly reflects how difficult it is to define all these terms, but
it remains quite confusing. For example, it would be advisable to
begin with section 4.4.

    * The different classes of domain names that are discussed seem to
relate only to IDNA, without an exhaustive presentation of the DNS
domain name context. Their names are somewhat confusing. The drafts
are certainly clear, but they do not follow a progressive logic for
discovering the nature of a name or label, one that could be carried
over into programming functions.

    * References to the lower/upper-case analogy can be understood by
DNS old-timers, but are confusing to newcomers, as the analogy does
not reflect the same functionality and because U-labels and A-labels
are not treated the same way with respect to case.

    * Different keyboards and encodings are discussed, stressing that
DNS resolution calls for a U-label conversion, but nothing obliges
local applications to transcode user entries to Unicode when they
interoperate at a layer other than the DNS. These applications may,
however, want to canonicalize such entries in their own way. Interplus
supports the idea that an application layer may use intermediate
encodings that are neither Unicode nor ASCII. Among other things, this
facilitates interoperability with the UTF-8 names that Microsoft
supports within private networks: the user interface may be common and
the underlying machinery either IDNA or UTF-8.

    * 4.1. "Security on the Internet partly relies on the DNS. Thus,
any change to the characteristics of the DNS can change the security
of much of the Internet." This sentence is very confusing, since IDNA
does not affect the DNS (it does not change its characteristics) but
is instead built on the premise that they will not be changed.

    * Likewise: "The security of the Internet is compromised if a
user entering a single internationalized name is connected to
different servers based on different interpretations of the
internationalized domain name." The security of the Internet is not
compromised; trust in the IDNA proposition, however, might be.

    * Might the 4.7 Summary be considered somewhat adventurous?
Companies such as Nominum offer services that are intended to protect
the DNS, and one of the purposes of ML-DNS is precisely to allow
architectural protection.

IDNA Rationale

    * 1.3.1. DNS "Name" Terminology. "" would be better read as
"orthotypographic", since an orthographic error can be a way of losing
some special semantic differences that stem from orthotypographic
conventions.

    * 1.3.2. "IDNA-landr" typo?

    * 1.4. "Reduce the dependency on mapping, in order that the
pre-mapped forms (which are not valid IDNA labels) tend to appear less
often in various contexts, in favor of valid A-labels." calls for the
Charter to be revised. Alternatively, it could say "remove dependence
on mapping by relying on a mapping document", that document then
including a section on the various ways to ensure DNS security and on
the barring of certain code points in certain presentations.

    * 1.5. "This model has served the existing applications well, but
it requires, with or without internationalized domain names, that
users know the exact spelling of the domain names that are to be typed
into applications such as web browsers and mail user agents. The
introduction of the larger repertoire of characters potentially makes
the set of misspellings larger, especially given that in some cases
the same appearance, for example on a business card, might visually
match several Unicode code points or several sequences of code
points." may be read as implying that users of these languages are
more prone to errors than users of ASCII-based languages.

    * "If an application wants to use non-ASCII characters in public
DNS domain names, IDNA is the only currently-defined option." IDNA is
not a DNS option. It is an application-level way of transcoding
Unicode domain names into LDH domain names for the convenience of
ASCII-oriented international managers. The idea is to win the
adherence of local users and managers to IDNA, not to impose ASCII on
them. The DNS is UTF-8 compatible.

    * "IDNA2008 divides all possible Unicode code-points into four
categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and
UNASSIGNED.
      3.1.1. PROTOCOL-VALID
      Characters identified as "PROTOCOL-VALID" (often abbreviated
"PVALID") are permitted in IDNs." Are we talking about code points or
about characters?

    * 3.1.2.1 is not in the TOC.

    * 3.1.3 Disallowed, "various HEART symbols": are U+38FA and U+3966
also disallowed?

    * 3.1.3. This is the first mention of NFKC. In IDNA Definitions
and elsewhere, it is NFC. Shouldn’t both be documented, with an
explanation of the specific cases in which each one is used?

    * "The character is an upper-case form or some other form that is
mapped to another character by Unicode casefolding." This seems to
create a very large mapping scheme that depends on an undocumented
Unicode mechanism that needs correcting (at least where it does not
specifically support majuscules). Moreover, are we dealing with
characters (which are independent of Unicode) or with code points that
represent characters and are subject to Unicode casefolding? We
propose to: (1) clarify the character/code point issue; (2) explain
what Unicode case folding is and what its limitations are; (3) move
these code points to CONTEXTO when they are used both as upper-case
forms and as majuscules; (4) explain that majuscules supported by
upper-case code points will be transcoded by Punycode.

    * 4.4. Case mapping. It is regrettable that the current Unicode
support for French majuscules, which is perfectly adequate in other
circumstances yet inadequate in this case, is not discussed. Such a
discussion would justify the change proposed above.

    * 4.5. "Examples of this are Yiddish, written with an extended
Hebrew script, and Dhivehi (the official language of Maldives) which
is written in the Thaana script (which is, in turn, derived from the
Arabic script)". Some explanation about Yiddish would be welcome, so
that the language receives the same support as Dhivehi and Thaana.

    * 5. "Conversely, lookup applications are expected to reject
labels that clearly violate global (protocol) rules (no one has ever
seriously claimed that being liberal in what is accepted requires
being stupid)." The parenthetical remark is confusing: it may describe
as "stupid" a behavior that is not recommended but is nonetheless
acceptable under the document set.

    * "Application implementors should be aware that where DNS
wildcards are used, the ability to successfully resolve a name does
not guarantee that it was actually registered." In what way is this
specific to IDNA?

    * 7.6. The Symbol Question. This section actually discusses
difficulties originating in Unicode, yet the choice of Unicode itself
has not been discussed.

    * 9. "Adding languages (or similar context) to IDNs generally, or
to DNS matching in particular, would imply context dependent matching
in DNS, which would be a very significant change to the DNS protocol
itself". This sentence is confusing, since natural languages are
cited throughout this IDN document.
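The casefolding criterion quoted above under 3.1.3 can be illustrated with Python's `str.casefold()`, which applies Unicode full case folding. This is a minimal sketch: `changed_by_casefold` is a hypothetical helper of ours that checks only that one criterion, not the whole IDNA2008 property derivation. As we read the criterion, any code point this helper flags, including a French majuscule such as "É", would be excluded.

```python
def changed_by_casefold(ch: str) -> bool:
    """True if Unicode full case folding maps ch to something else.

    Hypothetical helper: it checks only the casefolding criterion quoted
    from the Rationale, not the complete IDNA2008 property derivation.
    """
    return ch.casefold() != ch


# "e", "é", "É", "A" and KELVIN SIGN (U+212A), chosen for illustration.
for ch in ["e", "\u00e9", "\u00c9", "A", "\u212a"]:
    folded = ch.casefold()
    print(f"U+{ord(ch):04X} {ch!r} -> {folded!r} changed={changed_by_casefold(ch)}")
```

Note that the accent survives folding: "É" becomes "é", so what is lost is only the distinction between the majuscule and the minuscule.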

IDNA Mapping

    * We are not sure that the expression "make sense" is adequate or clear.

    * 1. Introduction - This document is supposed to be separate from
the IDNA document set. It should therefore document what the IDNA
protocol is. The IDNA2008 protocols seem to boil down to: "DNS domain
names are to be expressed in LDH form. IDNA is a commonly agreed upon
convention whereby, if names are entered by the user in another form,
applications are advised to convert them to UTF in order to filter and
map them, as discussed in the present document, and then to transcode
them using the Punycode algorithm. Depending on the Registry policy,
their registration can be carried out in the UTF and/or the
transcoded ASCII form."

    * 2.3. NFC is confirmed; NFKC is not discussed.
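The transcoding step in the summary of the Mapping introduction above can be sketched with Python's standard "punycode" codec. This is a minimal sketch under stated assumptions: `to_a_label` and `to_u_label` are hypothetical helpers, the input is assumed to be an already validated and mapped U-label in NFC, and no IDNA2008 validity checks are performed. We deliberately avoid Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490) rather than IDNA2008.

```python
ACE_PREFIX = "xn--"  # ACE prefix that marks an A-label


def to_a_label(u_label: str) -> str:
    """Transcode a (pre-validated, NFC) U-label to its A-label form.

    Hypothetical helper: no IDNA2008 validation is done here.
    """
    if u_label.isascii():
        return u_label  # pure ASCII labels are left unchanged
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to Unicode; other labels are returned as-is."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


for label in ["example", "b\u00fccher", "caf\u00e9"]:
    a = to_a_label(label)
    print(label, "->", a, "->", to_u_label(a))
```

The round trip shows that the UTF form and the transcoded ASCII form carry the same label, which is why a Registry can accept registration in either.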

IDNA Protocol

    * As a general comment:
          o The SHOULD/MUST chains are sometimes awkward: MUSTs are
used in a protocol procedure, and then an alternative to that
procedure is pragmatically considered. It could be useful to draw up
a MUST tree to see which cases are, or are not, covered.
          o There is some confusion between the "string", the label and
the domain name, and "Label" is used where "U-Label", or in some
places "A-Label", is meant. Wouldn’t it be better to review the text
and qualify each use of "label", in order to be certain that all the
cases are clearly covered?

    * 3.2. "It does not apply to domain name slots which do not use
the Letter/Digit/Hyphen (LDH) syntax rules." This is confusing: would
some domain name slots not accept both?

    * 3.2.1. The word CLASS appears in only two sentences in the whole
document set: "IDNA applies only to domain names in the NAME and RDATA
fields of DNS resource records whose CLASS is IN. See RFC 1034
[RFC1034] for precise definitions of these terms. The application of
IDNA to DNS resource records depends entirely on the CLASS of the
record, and not on the TYPE except as noted below."
      What about internationalized domain names in a CLASS other than IN?

    * 4. "This section defines the procedure for registering an IDN.
The procedure is implementation independent; any sequence of steps
that produces exactly the same result for all labels is considered a
valid implementation." Does a procedure not produce a result rather
than define it?

    * 4.1. The obligation chain reads: "By the time a string enters
the IDNA registration process [], it is expected to be in Unicode []",
yet "registries [] SHOULD avoid any possible ambiguity by accepting
registrations only for A-labels []."

    * 4.3. The inheritance of Registry restrictions is not mentioned.

    * 5. Does this repetition (already in IDNA Rationale), "the
presence of wild cards in the DNS might cause a string that is not
actually registered in the DNS to be successfully looked up.", reflect
what the BIDI document states slightly differently: "Wildcards create the
odd situation where a label is "valid" (can be looked up successfully)
without the zone owner knowing that this label exists. So an owner of
a zone whose name starts with a digit and contains a wildcard has no
way of controlling whether or not names with RTL labels in them are
looked up in his zone."

    * 5.2. The case of a character that is not supported by Unicode is
not discussed.

    * 5.4. Using "U-Labels" instead of "Labels" in this section would
probably make it clearer.

    * "applying the test is likely to give much better information
about the reason for a lookup failure -- information that may be
usefully passed to the user when that is feasible -- than DNS
resolution failure information alone". Might this suggest that such
information could also be conveyed when a failure occurs, in order to
document it better?

    * "For all other strings, the lookup application MUST rely on the
presence or absence of labels in the DNS to determine the validity of
those labels and the validity of the characters they contain". Is it
correct to assume that the first "labels" stands for "A-Labels" and the
second for "their corresponding U-Labels"?

    * 7. IANA Considerations - The Unicode Consortium has made no
commitment not to update the Unicode documents that are accepted as
normative in the IDNA document set. Should IANA not store copies of
them as they stand at the time this set is published?

IDNA BIDI

    * 1.1. Would it be advisable to specify "when U-labels" instead of "labels"?

    * 1.4. BIDI properties come from Unicode. They may be incomplete,
or be completed in the future. What happens then?

    * 2. A replacement for the RFC 3454 BIDI rule: it would probably
be good to indicate the order in which the rules apply.

    * 7. Does that restriction mean that telephone numbers cannot be
registered in BIDI zones?

    * 8. IANA considerations. Same remark as for the Protocol document.
Moreover, the preceding section states: "the determination of
validity for any string depends on the Unicode BIDI property values,
which are not declared immutable by the Unicode Consortium."
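The dependence on Unicode property values can be made visible with Python's `unicodedata`, whose answers come from the Unicode Character Database version bundled with the interpreter (`unicodedata.unidata_version`). The sketch below is minimal and rests on an assumption: `first_char_ok` is a hypothetical helper implementing only the first of the Bidi rules as we read them (a label must start with a character whose Bidi property is L, R or AL), not the full rule set. Under that reading, an all-digit label such as a telephone number starts with EN and fails, which is the situation behind our question on section 7.

```python
import unicodedata

# Bidi classes accepted for the first character, as we read the first rule.
ALLOWED_FIRST = {"L", "R", "AL"}


def bidi_class(ch: str) -> str:
    """Bidi property of ch, from the Unicode data bundled with Python."""
    return unicodedata.bidirectional(ch)


def first_char_ok(label: str) -> bool:
    """Hypothetical check of the first Bidi rule only (not the full set)."""
    return bool(label) and bidi_class(label[0]) in ALLOWED_FIRST


print("Unicode data version:", unicodedata.unidata_version)
for ch in ["a", "\u05d0", "\u0627", "1", "\u0661", "-"]:
    print(f"U+{ord(ch):04X} -> {bidi_class(ch)}")

for label in ["\u05d0\u05d1", "abc", "123"]:
    print(repr(label), first_char_ok(label))
```

Because the property values come from whatever Unicode version the platform ships, the same label could in principle be judged differently by two implementations, which is the concern raised about IANA keeping reference copies.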

IDNA Tables

    * 1. Introduction. "In particular, some combinations of allowed
code points are not advisable for use in IDNs due to rules specific to
a script or class of characters" introduces the concept of a "class of
characters" but does not document it. IDNA Rationale 7.1.3 states
"Maintain IDNA and Unicode tables that are consistent with regard to
versions, i.e., unless the application actually executes the
classification rules in [IDNA2008-Tables]", yet the only place where
"classification (rules)" appears in IDNA Tables is section 4, "Code
points": "The Categories and Rules defined in Section 2 and Section
3 apply to all Unicode code points. The table in Appendix B shows, for
illustrative purposes, the consequences of the categories and
classification rules, and the resulting property values."
      What is a "class of characters"?

    * 1. Introduction ends with "This document is part of a series
that, together, constitute a proposal for updating the IDNA standards
to resolve issues uncovered in recent years, cover a broader range of
scripts, and provide for migration to newer versions of Unicode. See
[IDNA2008-rationale] for a broader discussion." Should this not be
removed or edited?

    * 2.1. "For more information, see section 4.5 of The Unicode
Standard [Unicode5]." Is this also the case in Unicode 5.1? Shouldn’t
IANA store this document?

    * 2.2. NFKC or NFC?

    * 2.10. "It should be noted that Unicode distinguishes between
'unassigned code points' and 'unassigned characters'". Could the
differences between characters and code points (in nature, and in
relation to IDNA) be explained here?

    * 5. IANA Considerations. We suggest that IANA retain online
copies of the versions of external documents that are normatively
referenced in the IETF documents.

    * "A table from which that registry can be initialized, and some
further discussion, appears in Appendix A." Who is to decide on and
maintain the table, and according to which rules and procedures?

    * Appendix A. As a comment: given the logic presented, we do not
understand why:
          o Tamil digits cannot be made subject to a rule and added to CONTEXTO?
          o The same for French majuscules?
          o The same for any zone specific restriction?

    It seems to be implied that the logic should be the same at the
sending and receiving ends. Yet the receiving end only decodes what the
sending end chose to encode in its own context, and that context needs
to be considered and supported. If my application works in Tamil or
French, it knows it and can be required to proceed accordingly.
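Regarding our question on Tables 2.2 (NFKC or NFC?): as we understand the Tables draft, that section combines NFKC with case folding to decide which code points are unstable. The following minimal sketch assumes the formulation toNFKC(toCaseFold(toNFKC(cp))) != cp and approximates toCaseFold with Python's `str.casefold()` (full case folding). `is_unstable` is a hypothetical helper, and any exceptions that override this test in the draft are not modelled.

```python
import unicodedata


def nfkc(s: str) -> str:
    """NFKC normalization via the standard library."""
    return unicodedata.normalize("NFKC", s)


def is_unstable(cp: str) -> bool:
    """Hypothetical helper: toNFKC(toCaseFold(toNFKC(cp))) != cp.

    str.casefold() stands in for toCaseFold; exceptions are not handled.
    """
    return nfkc(nfkc(cp).casefold()) != cp


# Plain and accented minuscules stay stable; the majuscule "É" and the
# compatibility ligature U+FB01 do not.
for cp in ["a", "\u00e9", "\u00c9", "\ufb01"]:
    print(f"U+{ord(cp):04X} unstable={is_unstable(cp)}")
```

Under this reading every upper-case letter, French accented majuscules included, is classified as unstable, which is why we ask whether such code points could instead be handled by a contextual rule.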