comments on the document set

Mon Oct 19 19:26:14 CEST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

First, thanks to everyone who has worked on the IDNAbis document set.
Clearly a great deal of thought and effort has gone into this work.

Herewith some (late) comments, most of a minor nature. I have tried not
to comment on issues that have already been raised by others but I am
sure there is some overlap. Note that I have reviewed these documents
with a focus on the use of IDNs in one application protocol: XMPP.
Because XMPP as defined in RFC 3920 uses IDNA2003, the XMPP community
has a special interest in the progression of IDN technologies (and also
of internationalized "names" in general, given that we have also defined
two other Stringprep profiles for use in the construction and
comparison of XMPP addresses).

RATIONALE

1.3.2.

The term "IDNA-landr" is used here but undefined.

1.5.

This text is awkward:

   a single exactly-matching (subject to the
   base DNS requirement of case-insensitive ASCII matching) name

I suggest:

   a single exactly-matching name (subject to the
   base DNS requirement of case-insensitive ASCII matching)

Typo: "user's computers" should be "users' computers"

3.1.2.1.

Missing word: "What they are expected do is to confine" should probably
be "What they are expected to do is confine"

3.1.4.

This run-on sentence is hard to read:

   If, for example, such a code point was permitted to be included in a
   label to be looked up, and the code point was later to be assigned to
   a character that required some set of contextual rules, un-updated
   instances of IDNA-aware software might permit lookup of labels
   containing the previously-unassigned characters while updated
   versions of IDNA-aware software might restrict their use in lookup,
   depending on the contextual rules.

I do not yet have alternative text to suggest.

3.2.

Typo: "requiring that registrant need to provide characters" should be
"requiring that registrants need to provide characters" (plural
"registrants" instead of singular "registrant").

4.2.

The concept of a "domain name slot" is helpful, and it might be good to
suggest here that all using protocols explicitly define what their
domain name slots are.

4.4.

The use of the phrase "in some sense" here seems odd:

   Because IDNA2003 maps Final Sigma and Eszett to other characters, and
   the reverse mapping is never possible, that in some sense means that
   neither Final Sigma nor Eszett can be represented in a IDNA2003 IDN.

In what sense? Can they or can't they be represented?
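
For illustration only, the lossy mapping behind this question can be
observed with Python's built-in "idna" codec, which implements the
IDNA2003 operations of RFC 3490. This is a minimal sketch, not part of
the document set; the helper name and the sample labels are arbitrary
assumptions.

```python
# Python's built-in "idna" codec implements IDNA2003 (RFC 3490),
# not the IDNAbis rules under review.
def idna2003_round_trip(label: str) -> str:
    """Apply IDNA2003 ToASCII to a label, then ToUnicode to the result."""
    return label.encode("idna").decode("idna")

# Eszett (U+00DF) is mapped to "ss"; the original cannot be recovered.
print(idna2003_round_trip("stra\u00dfe"))  # strasse

# Final Sigma (U+03C2) is mapped to Sigma (U+03C3).
print(idna2003_round_trip("\u03c2") == "\u03c3")  # True
```

In this sketch neither character survives the round trip, which is one
reading of "cannot be represented".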

6.

This text is a bit of a teaser:

   While there are strong arguments for any
   domain name that is placed "on the wire" -- transmitted between
   systems -- to be in the zero-ambiguity forms of A-labels, it is
   inevitable that programs that process domain names will encounter
   U-labels or variant forms.

At the least, it would be helpful to either spell out those "strong
arguments" or provide a pointer to a document that makes those
arguments. And does this apply only to DNS applications (e.g.,
registration and lookup in the DNS itself) or also to applications that
use IDNA (e.g., email, XMPP, IRIs)?
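
As a rough illustration of the two forms, the sketch below converts a
U-label to an A-label and back with Python's built-in "idna" codec.
Note the assumptions: that codec implements IDNA2003 (RFC 3490) rather
than the IDNAbis rules, and the function names are hypothetical. It is
meant only to show why the ASCII-only form leaves less room for
ambiguity on the wire.

```python
def to_a_label(u_label: str) -> str:
    """Convert a label to its ASCII ("xn--") form using IDNA2003 rules."""
    return u_label.encode("idna").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Convert an ASCII ("xn--") label back to its Unicode form."""
    return a_label.encode("ascii").decode("idna")

a_label = to_a_label("b\u00fccher")
print(a_label)              # xn--bcher-kva
print(to_u_label(a_label))  # bücher

# A variant form (here, upper case) collapses to the same A-label.
print(to_a_label("B\u00dcCHER") == a_label)  # True
```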

7.6.

Missing word: "there more than one" should be "there is more than one"

Extraneous words: "U+2729 or the without" should be "U+2729 without"

7.7.

Too many instances of "change":

   Such changes may change the
   preferred form for writing a particular string, changes that may be
   reflected, e.g., in keyboard transition modules that would
   necessarily be different from those for earlier versions of Unicode
   where the newer characters may not exist.

I suggest "Such additions may change..."

8.1.

Typo: "can not" should be "cannot"

14.1.

Is RFC 3490 truly a normative reference?

DEFS

2.3.1.

It's not clear here whose responsibility it is to determine that a "Fake
A-Label" is truly fake.
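
Whoever ends up responsible, one necessary check is that an "xn--"
label decodes and re-encodes to itself. A minimal sketch follows,
assuming Python's built-in "idna" codec (which implements IDNA2003 per
RFC 3490, so it does not apply the IDNAbis validity rules); the
function name is hypothetical, and passing this check does not by
itself make a label a valid A-label.

```python
def round_trips_under_idna2003(label: str) -> bool:
    """Return True if an "xn--" label decodes and re-encodes to itself."""
    lowered = label.lower()
    if not lowered.startswith("xn--"):
        return False
    try:
        # The codec raises UnicodeError if the label is not ASCII, if the
        # Punycode is malformed, or if re-encoding the decoded text does
        # not reproduce the input label.
        lowered.encode("ascii").decode("idna")
    except UnicodeError:
        return False
    return True

print(round_trips_under_idna2003("xn--bcher-kva"))  # True
print(round_trips_under_idna2003("xn--abc-"))       # False
```

The second label decodes to plain "abc", which re-encodes without any
prefix, so it fails the round trip.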

2.3.2.1.

Missing word: "a string meeting can be decoded"

This is a bit unclear:

   U-labels can appear, along with the other two, in
   presentation and user interface forms and in selected protocols other
   than those of the DNS itself.

It might be clearer if this text specified what those selected protocols
are, or how such protocols might be selected.

4.4.

Other documents in the set use the term "confusables" to refer to
visually similar characters, seemingly derived from RFC 4690. It might
be helpful to mention that term here.

PROTOCOL

2.

Missing text:

   It is worth noting that some of this terminology
   overlaps with, and is consistent with, that used , but also in
   Unicode or other character set standards and the DNS.

Perhaps "that used in [someref]" is meant but the reference is missing.

3.2.

Advice to using protocols:

   IDNs occupying domain name slots
   in those older protocols MUST be in A-label form until and unless
   those protocols and their implementations are explicitly upgraded to
   be aware of IDNs.

This does not specify what form an IDN is to take in a using protocol
which has been explicitly upgraded to be aware of IDNs. Must they be
provided in U-label form? Are both forms permitted? Is this left up to
the using protocol?

4.

Missing word: "they not identical" should be "they are not identical"

Appendix A.

   4.   Remove the mapping and normalization steps from the protocol and
        have them instead done by the applications themselves, possibly
        in a local fashion, before invoking the protocol.

It is unclear here, and throughout the document set, what precisely is
meant by "application". Does this mean a DNS application such as a
registration interface or a resolver, a using protocol (i.e., an
application protocol that makes use of IDNs, such as EAI or XMPP), or
both? IMHO this could be clarified in, for example, section 1.1 of the
Rationale document (which uses the term "client applications"), section
1.1.1 and section 2 of the Definitions document, section 3.2 of the
Protocol document (which speaks of "protocols" instead of
"applications"), and section 2 of the Mapping document.

TABLES

1.

Extraneous word: "the the" should be "the"

2.4.

Typo: "to identifying" should be "to identify"

5.1.

The IANA is to calculate the derived property value for each codepoint:

   IANA is to keep a list of the derived property for the versions of
   Unicode that is released after (and including) version 5.1.  The
   derived property value is to be calculated according to the
   specifications in sections Section 2 and Section 3 and not by copying
   the non-normative table found in Appendix B.  Changes to the rules,
   including BackwardCompatible (Section 2.7) (a set that is at release
   of this document is empty), require IETF Review, as described in
   [RFC5226]

This might be taken to imply that changes to the results of applying the
rules (as opposed to changes in the rules themselves) require neither
review nor notification. If the IANA changes its interpretation of a
given rule or fixes a bug in its calculation methods, will any
notification take place? Given the sensitive nature of the derived
properties for a given codepoint (and perhaps the necessity to modify or
disable existing registrations), some notification mechanism might be
helpful.

MAPPING

[no comments]

BIDI

[no comments]

Peter

- --
Peter Saint-Andre
https://stpeter.im/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrcoTUACgkQNL8k5A2w/vxF9wCfSydr6TRa3FTg3dNBtOXt2VWc
qXMAoLGBKZ0KUZWwymUofBplMvpCue9a
=mDCj
-----END PGP SIGNATURE-----