comments on the document set
Peter Saint-Andre
stpeter at stpeter.im
Mon Oct 19 19:26:14 CEST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
First, thanks to everyone who has worked on the IDNAbis document set.
Clearly a great deal of thought and effort has gone into this work.
Herewith some (late) comments, most of a minor nature. I have tried not
to comment on issues that have already been raised by others but I am
sure there is some overlap. Note that I have reviewed these documents
with a focus on the use of IDNs in one particular application
protocol: XMPP. Because XMPP as defined in RFC 3920 uses IDNA2003, the
XMPP community has a special interest in the progression of IDN
technologies (and also of internationalized "names" in general, given
that we have also defined two other Stringprep profiles for use in the
construction and comparison of XMPP addresses).
RATIONALE
1.3.2.
The term "IDNA-landr" is used here but undefined.
1.5.
This text is awkward:
a single exactly-matching (subject to the
base DNS requirement of case-insensitive ASCII matching) name
I suggest:
a single exactly-matching name (subject to the
base DNS requirement of case-insensitive ASCII matching)
Typo: "user's computers" should be "users' computers"
3.1.2.1.
Missing word: "What they are expected do is to confine" should probably
be "What they are expected to do is confine"
3.1.4.
This run-on sentence is hard to read:
If, for example, such a code point was permitted to be included in a
label to be looked up, and the code point was later to be assigned to
a character that required some set of contextual rules, un-updated
instances of IDNA-aware software might permit lookup of labels
containing the previously-unassigned characters while updated
versions of IDNA-aware software might restrict their use in lookup,
depending on the contextual rules.
I do not yet have alternative text to suggest.
3.2.
Typo: "requiring that registrant need to provide characters" should be
"requiring that registrants need to provide characters" (plural
"registrants" instead of singular "registrant").
4.2.
The concept of a "domain name slot" is helpful, and it might be good to
suggest here that all using protocols explicitly define what their
domain name slots are.
4.4.
The use of the phrase "in some sense" here seems odd:
Because IDNA2003 maps Final Sigma and Eszett to other characters, and
the reverse mapping is never possible, that in some sense means that
neither Final Sigma nor Eszett can be represented in a IDNA2003 IDN.
In what sense? Can they or can't they be represented?
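To make the question concrete, here is a minimal sketch of the behaviour the quoted text describes. It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490) rather than the IDNAbis documents; the helper name roundtrip and the sample names are only illustrative.

```python
# Sketch only: Python's standard "idna" codec follows IDNA2003 (RFC 3490),
# so it shows the Nameprep mappings discussed above, not IDNA2008 behaviour.

def roundtrip(name: str) -> str:
    """Encode a name to its ASCII form and decode it back (hypothetical helper)."""
    return name.encode("idna").decode("idna")

# Eszett is mapped to "ss" before encoding, so it cannot be recovered.
print(roundtrip("straße.example"))

# Final Sigma is mapped to ordinary lowercase sigma, so both spellings
# produce the same encoded name and the distinction is lost.
final_sigma = "βόλος.example".encode("idna")
plain_sigma = "βόλοσ.example".encode("idna")
print(final_sigma == plain_sigma)
```

If that is the sense intended, it might be simpler for the text to say that the characters can be entered but are not preserved in the resulting IDN.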
6.
This text is a bit of a teaser:
While there are strong arguments for any
domain name that is placed "on the wire" -- transmitted between
systems -- to be in the zero-ambiguity forms of A-labels, it is
inevitable that programs that process domain names will encounter
U-labels or variant forms.
At the least, it would be helpful to either spell out those "strong
arguments" or provide a pointer to a document that makes those
arguments. And does this apply only to DNS applications (e.g.,
registration and lookup in the DNS itself) or also to applications that
use IDNA (e.g., email, XMPP, IRIs)?
7.6.
Missing word: "there more than one" should be "there is more than one"
Extraneous words: "U+2729 or the without" should be "U+2729 without"
7.7.
Too many uses of "change":
Such changes may change the
preferred form for writing a particular string, changes that may be
reflected, e.g., in keyboard transition modules that would
necessarily be different from those for earlier versions of Unicode
where the newer characters may not exist.
I suggest "Such additions may change..."
8.1.
Typo: "can not" should be "cannot"
14.1.
Is RFC 3490 truly a normative reference?
DEFS
2.3.1.
It's not clear here whose responsibility it is to determine that a "Fake
A-Label" is truly fake.
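As an illustration of the kind of check that some party would have to perform, here is a rough sketch assuming Python's built-in "punycode" codec. The function name looks_like_fake_a_label is hypothetical, and the test is only partial: it checks that the part after "xn--" survives a Punycode round trip, not that the decoded string meets the other requirements for a U-label.

```python
# Partial check only (an assumption for illustration, not the Definitions
# document's full test): a label that starts with "xn--" but does not
# survive a Punycode decode/encode round trip is treated as fake.

def looks_like_fake_a_label(label: str) -> bool:
    prefix = "xn--"
    lowered = label.lower()
    if not lowered.startswith(prefix):
        return False  # not shaped like an A-label at all
    encoded = lowered[len(prefix):]
    try:
        decoded = encoded.encode("ascii").decode("punycode")
        # A valid Punycode string re-encodes to exactly itself.
        return decoded.encode("punycode").decode("ascii") != encoded
    except UnicodeError:
        # Non-ASCII input or malformed Punycode.
        return True

print(looks_like_fake_a_label("xn--bcher-kva"))
print(looks_like_fake_a_label("xn--bcher-kv!"))
```

Whether such a check belongs to the registry, the resolver, or the using application is exactly the point on which the text could be more explicit.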
2.3.2.1.
Missing word: "a string meeting can be decoded"
This is a bit unclear:
U-labels can appear, along with the other two, in
presentation and user interface forms and in selected protocols other
than those of the DNS itself.
It might be clearer if this text specified what those selected protocols
are, or how such protocols might be selected.
4.4.
Other documents in the set use the term "confusables" to refer to
visually similar characters, seemingly derived from RFC 4690. It might
be helpful to mention that term here.
PROTOCOL
2.
Missing text:
It is worth noting that some of this terminology
overlaps with, and is consistent with, that used , but also in
Unicode or other character set standards and the DNS.
Perhaps "that used in [someref]" is meant but the reference is missing.
3.2.
Advice to using protocols:
IDNs occupying domain name slots
in those older protocols MUST be in A-label form until and unless
those protocols and their implementations are explicitly upgraded to
be aware of IDNs.
This does not specify what form an IDN is to take in a using protocol
which has been explicitly upgraded to be aware of IDNs. Must they be
provided in U-label form? Are both forms permitted? Is this left up to
the using protocol?
4.
Missing word: "they not identical" should be "they are not identical"
Appendix A.
4. Remove the mapping and normalization steps from the protocol and
have them instead done by the applications themselves, possibly
in a local fashion, before invoking the protocol.
It is unclear here, and throughout the document set, what precisely is
meant by "application". Does this mean a DNS application such as a
registration interface or a resolver, a using protocol (i.e., an
application protocol that makes use of IDNs, such as EAI or XMPP), or
both? IMHO this could be clarified in, for example, section 1.1 of the
Rationale document (which uses the term "client applications"), section
1.1.1 and section 2 of the Definitions document, section 3.2 of the
Protocol document (which speaks of "protocols" instead of
"applications"), and section 2 of the Mapping document.
TABLES
1.
Extraneous word: "the the" should be "the"
2.4.
Typo: "to identifying" should be "to identify"
5.1.
The IANA is to calculate the derived property value for each codepoint:
IANA is to keep a list of the derived property for the versions of
Unicode that is released after (and including) version 5.1. The
derived property value is to be calculated according to the
specifications in sections Section 2 and Section 3 and not by copying
the non-normative table found in Appendix B. Changes to the rules,
including BackwardCompatible (Section 2.7) (a set that is at release
of this document is empty), require IETF Review, as described in
[RFC5226]
This might be taken to imply that changes to the results of applying the
rules (as opposed to changes in the rules themselves) require neither
review nor notification. If the IANA changes its interpretation of a
given rule or fixes a bug in its calculation methods, will any
notification take place? Given the sensitive nature of the derived
properties for a given codepoint (and perhaps the necessity to modify or
disable existing registrations), some notification mechanism might be
helpful.
MAPPING
[no comments]
BIDI
[no comments]
Peter
- --
Peter Saint-Andre
https://stpeter.im/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkrcoTUACgkQNL8k5A2w/vxF9wCfSydr6TRa3FTg3dNBtOXt2VWc
qXMAoLGBKZ0KUZWwymUofBplMvpCue9a
=mDCj
-----END PGP SIGNATURE-----