Tuning of Defs, Protocol, and Rationale
John C Klensin
klensin at jck.com
Fri Nov 14 18:51:47 CET 2008
Hi.
During the two weeks since the current versions of
"Definitions", "Protocol", and "Rationale" were posted, I've
received and integrated a number of suggestions about changes
that seemed either editorial or obvious.
In order that everyone be looking at the same documents at the
meeting, I don't intend to try to post new versions early
Monday; these changes will show up after the WG meetings with
whatever else is incorporated as a result of meeting discussions.
However, if anyone wants to check on whether their particular
suggestions have been incorporated, or wants an advance look at
whether I've messed something up, diffs between the posted
documents and the working versions are attached.
Again, I do not consider any of these changes to be terribly
significant, even though a few of them resolve problems that
have been pointed out on-list. The Change Log entries for the
three are as follows (obviously, these are subject to revision
too, possibly even as I catch more problems over the weekend).
I've removed the TOC listings from the diff files -- they didn't
contribute much and take up a lot of space.
john
-------------
Definitions Version -02
o All back pointers to section numbers in Rationale have
been removed.
o Some definitions clarified. Added one about string
order.
Protocol Version -07
o Multiple small textual and editorial changes and
clarifications.
o Requirement for normalization clarified to apply to all
cases and conditions for preprocessing further clarified.
Rationale Version -05
o Many small editorial changes, including changes to
eliminate the last vestiges of what appeared to be 2119
language (upper-case MUST, SHOULD, or MAY).
-------------- next part --------------
*** Rationale 04 to 05 ***
Internationalized Domain Names for Applications (IDNA): Background,
Explanation, and Rationale
Section 1.6., paragraph 5:
OLD:
Operations for converting between local character sets and normalized
Unicode are part of this general set of user interface issues. The
conversion is obviously not required at all in a Unicode-native
system that maintains all strings in Normalization Form C (NFC). It
may, however, involve some complexity in a system that is not
Unicode-native, especially if the elements of the local character set
do not map exactly and unambiguously into Unicode characters or do so
in a way that is not completely stable over time. Perhaps more
important, if a label being converted to a local character set
contains Unicode characters that have no correspondence in that
character set, the application may have to apply special, locally-
appropriate, methods to avoid or reduce loss of information.
NEW:
Operations for converting between local character sets and normalized
Unicode are part of this general set of user interface issues. The
conversion is obviously not required at all in a Unicode-native
system that maintains all strings in Normalization Form C (NFC).
(See [Unicode-UAX15] for precise definitions of NFC and NFKC if
needed.) It may, however, involve some complexity in a system that
is not Unicode-native, especially if the elements of the local
character set do not map exactly and unambiguously into Unicode
characters or do so in a way that is not completely stable over time.
Perhaps more important, if a label being converted to a local
character set contains Unicode characters that have no correspondence
in that character set, the application may have to apply special,
locally-appropriate, methods to avoid or reduce loss of information.
Section 3.1.3., paragraph 1:
OLD:
For convenience in processing and table-building, code points that do
not have assigned values in a given version of Unicode are treated as
belonging to a special UNASSIGNED category. Such code points MUST
NOT appear in labels to be registered or looked up. The category
differs from DISALLOWED in that code points are moved out of it by
the simple expedient of being assigned in a later version of Unicode
(at which point, they are classified into one of the other categories
as appropriate).
NEW:
For convenience in processing and table-building, code points that do
not have assigned values in a given version of Unicode are treated as
belonging to a special UNASSIGNED category. Such code points are
prohibited in labels to be registered or looked up. The category
differs from DISALLOWED in that code points are moved out of it by
the simple expedient of being assigned in a later version of Unicode
(at which point, they are classified into one of the other categories
as appropriate).
Section 3.2., paragraph 1:
OLD:
While these recommendations cannot and should not define registry
policies, registries SHOULD develop and apply additional restrictions
to reduce confusion and other problems. For example, it is generally
believed that labels containing characters from more than one script
are a bad practice although there may be some important exceptions to
that principle. Some registries may choose to restrict registrations
to characters drawn from a very small number of scripts. For many
scripts, the use of variant techniques such as those as described in
RFC 3843 [RFC3743] and RFC 4290 [RFC4290], and illustrated for
Chinese by the tables described in RFC 4713 [RFC4713] may be helpful
in reducing problems that might be perceived by users.
NEW:
While these recommendations cannot and should not define registry
policies, registries should develop and apply additional restrictions
to reduce confusion and other problems. For example, it is generally
believed that labels containing characters from more than one script
are a bad practice although there may be some important exceptions to
that principle. Some registries may choose to restrict registrations
to characters drawn from a very small number of scripts. For many
scripts, the use of variant techniques such as those as described in
RFC 3843 [RFC3743] and RFC 4290 [RFC4290], and illustrated for
Chinese by the tables described in RFC 4713 [RFC4713] may be helpful
in reducing problems that might be perceived by users.
Section 4.2., paragraph 2:
OLD:
An IDNA-aware application can accept and display internationalized
domain names in two formats: the internationalized character set(s)
supported by the application (i.e., an appropriate local
representation of a U-label), and as an A-label. Applications MAY
allow the display of A-labels, but are encouraged to not do so except
as an interface for special purposes, possibly for debugging, or to
cope with display limitations. In general, they SHOULD allow, but
not encourage, user input of that label form. A-labels are opaque
and ugly and malicious variations on them are not easily detected by
users. Where possible, they should thus only be exposed to users and
in contexts in which they are absolutely needed. Because IDN labels
can be rendered either as A-labels or U-labels, the application may
reasonably have an option for the user to select the preferred method
of display; if it does, rendering the U-label should normally be the
default.
NEW:
An IDNA-aware application can accept and display internationalized
domain names in two formats: the internationalized character set(s)
supported by the application (i.e., an appropriate local
representation of a U-label), and as an A-label. Applications may
allow the display of A-labels, but are encouraged to not do so except
as an interface for special purposes, possibly for debugging, or to
cope with display limitations. In general, they should allow, but
not encourage, user input of that label form. A-labels are opaque
and ugly and malicious variations on them are not easily detected by
users. Where possible, they should thus only be exposed to users and
in contexts in which they are absolutely needed. Because IDN labels
can be rendered either as A-labels or U-labels, the application may
reasonably have an option for the user to select the preferred method
of display; if it does, rendering the U-label should normally be the
default.
Section 4.2., paragraph 4:
OLD:
In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in
any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, the labels MUST
be given in that charset. Of course, not all charsets can properly
represent all labels. If a U-label cannot be displayed in its
entirety, the only choice (without loss of information) may be to
display the A-label.
NEW:
In protocols and document formats that define how to handle
specification or negotiation of charsets, labels can be encoded in
any charset allowed by the protocol or document format. If a
protocol or document format only allows one charset, the labels must
be given in that charset. Of course, not all charsets can properly
represent all labels. If a U-label cannot be displayed in its
entirety, the only choice (without loss of information) may be to
display the A-label.
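As an illustration of the fallback described in this paragraph, here is a minimal Python sketch. It assumes the application knows the single charset it may use; the function name display_form is hypothetical. It relies on Python's built-in "punycode" codec, which implements only the raw RFC 3492 encoding, so the "xn--" prefix is added by hand and no IDNA validity checks are performed.

```python
def display_form(u_label: str, charset: str) -> str:
    """Return the U-label if `charset` can represent it, else an A-label form."""
    try:
        # Encoding is attempted only to find out whether every character fits.
        u_label.encode(charset)
        return u_label
    except UnicodeEncodeError:
        # Raw Punycode plus the ACE prefix; this is not a validated A-label.
        return "xn--" + u_label.encode("punycode").decode("ascii")
```

For example, a label containing U+00FC is returned unchanged for "latin-1" but falls back to the ASCII form when the only permitted charset is "ascii".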
Section 4.2., paragraph 5:
OLD:
In any place where a protocol or document format allows transmission
of the characters in internationalized labels, labels SHOULD be
transmitted using whatever character encoding and escape mechanism
the protocol or document format uses at that place. This provision
is intended to prevent situations in which, e.g., UTF-8 domain names
appear embedded in text that is otherwise in some other character
coding.
NEW:
In any place where a protocol or document format allows transmission
of the characters in internationalized labels, labels should be
transmitted using whatever character encoding and escape mechanism
the protocol or document format uses at that place. This provision
is intended to prevent situations in which, e.g., UTF-8 domain names
appear embedded in text that is otherwise in some other character
coding.
Section 6., paragraph 4:
OLD:
As discussed elsewhere in this document, the IDNA2008 model removes
all of these mappings and interpretations, including the equivalence
of different forms of dots, from the protocol, discouraging such
mappings and leaving them, when necessary, to local processing. This
should not be taken to imply that local processing is optional or can
be avoided entirely, even if doing so might have been desirable in a
world without IDNA2003 IDNs in files and archives. Instead, unless
the program context is such that it is known that any IDNs that
appear will be either U-labels or A-labels, or that other forms can
safely be rejected, some local processing of apparent domain name
strings will be required, both to maintain compatibility with
IDNA2003 and to prevent user astonishment. Such local processing,
while not specified in this document or the associated ones, will
generally take one of two forms:
NEW:
As discussed elsewhere in this document, the IDNA2008 model removes
all of these mappings and interpretations, including the equivalence
of different forms of dots, from the protocol, discouraging such
mappings and leaving them, when necessary, to local processing. This
should not be taken to imply that local processing is optional or can
be avoided entirely, even if doing so might have been desirable in a
world without IDNA2003 IDNs in files and archives. Instead, unless
the program context is such that it is known that any IDNs that
appear will contain either U-label or A-label forms, or that other
forms can safely be rejected, some local processing of apparent
domain name strings will be required, both to maintain compatibility
with IDNA2003 and to prevent user astonishment. Such local
processing, while not specified in this document or the associated
ones, will generally take one of two forms:
Section 7.1.2., paragraph 2:
OLD:
o Any label that appears to be an A-label, i.e., any label that
starts in "xn--", MUST be IDNA-valid, i.e., they MUST be valid
A-labels, as discussed in Section 2 above.
NEW:
o Any label that appears to be an A-label, i.e., any label that
starts in "xn--", must be IDNA-valid, i.e., they must be valid
A-labels, as discussed in Section 2 above.
Section 7.1.2., paragraph 3:
OLD:
o The Unicode tables (i.e., tables of code points, character
classes, and properties) and IDNA tables (i.e., tables of
contextual rules such as those that appear in the Tables
document), MUST be consistent on the systems performing or
validating labels to be registered. Note that this does not
require that tables reflect the latest version of Unicode, only
that all tables used on a given system are consistent with each
other.
NEW:
o The Unicode tables (i.e., tables of code points, character
classes, and properties) and IDNA tables (i.e., tables of
contextual rules such as those that appear in the Tables
document), must be consistent on the systems performing or
validating labels to be registered. Note that this does not
require that tables reflect the latest version of Unicode, only
that all tables used on a given system are consistent with each
other.
Section 7.1.2., paragraph 5:
OLD:
Systems looking up or resolving DNS labels, especially IDN DNS
labels, MUST be able to assume that applicable registration rules
were followed for names entered into the DNS.
NEW:
Systems looking up or resolving DNS labels, especially IDN DNS
labels, must be able to assume that applicable registration rules
were followed for names entered into the DNS.
Section 7.4.3., paragraph 1:
OLD:
While it might be possible to make a prefix change, the costs of such
a change are considerable. Even if they wanted to do so, all
registries could not convert all IDNA2003 ("xn--") registrations to a
new form at the same time and synchronize that change with
applications supporting lookup. Unless all existing registrations
were simply to be declared invalid (and perhaps even then) systems
that needed to support both labels with old prefixes and labels with
new ones would first process a putative label under the IDNA2008
rules and try to look it up and then, if it were not found, would
process the label under IDNA2003 rules and look it up again. That
process could significantly slow down all processing that involved
IDNs in the DNS especially since, in principle, a fully-qualified
name could contain a mixture of labels that were registered with the
old and new prefixes, a situation that would make the use of DNS
caching very difficult. In addition, looking up the same input
string as two separate A-labels would create some potential for
confusion and attacks, since they could, in principle, map to
different targets and then resolve to different DNS label nodes.
NEW:
While it might be possible to make a prefix change, the costs of such
a change are considerable. Even if they wanted to do so, all
registries could not convert all IDNA2003 ("xn--") registrations to a
new form at the same time and synchronize that change with
applications supporting lookup. Unless all existing registrations
were simply to be declared invalid (and perhaps even then) systems
that needed to support both labels with old prefixes and labels with
new ones would first process a putative label under the IDNA2008
rules and try to look it up and then, if it were not found, would
process the label under IDNA2003 rules and look it up again. That
process could significantly slow down all processing that involved
IDNs in the DNS especially since, in principle, a fully-qualified
name could contain a mixture of labels that were registered with the
old and new prefixes, a situation that would make the use of DNS
caching very difficult. In addition, looking up the same input
string as two separate A-labels would create some potential for
confusion and attacks, since they could, in principle, map to
different targets and then resolve to different entries in the DNS.
Section 7.7., paragraph 2:
OLD:
In IDNA2008, strings containing unassigned code points MUST NOT be
either looked up or registered. There are several reasons for this,
with the most important ones being:
NEW:
In IDNA2008, strings containing unassigned code points must not be
either looked up or registered. There are several reasons for this,
with the most important ones being:
Appendix A., paragraph 21:
OLD:
o Material on differences between IDNA2003 and IDNA2003 moved to an
appendix in Protocol.
NEW:
o Material on differences between IDNA2003 and IDNA2008 moved to an
appendix in Protocol.
Appendix A., paragraph 27:
OLD:
Author's Address
NEW:
A.5. Version -05
o Many small editorial changes, including changes to eliminate the
last vestiges of what appeared to be 2119 language (upper-case
MUST, SHOULD, or MAY).
Author's Address
-------------- next part --------------
*** Protocol 06 to 07 ***
Internationalized Domain Names in Applications (IDNA): Protocol
Section 1., paragraph 0:
OLD:
1. Whenever a domain name is put into an IDN-unaware domain name
slot (see Section 2 and [IDNA2008-Rationale]), it MUST contain
only ASCII characters (i.e., must be either an A-label or an LDH-
label), or must be a label associated with a DNS application that
is not subject to either IDNA or the historical recommendations
for "hostname"-style names [RFC1034].
NEW:
1. Whenever a domain name is put into an IDN-unaware domain name
slot (see Section 2 and [IDNA2008-Defs]), it MUST contain only
ASCII characters (i.e., must be either an A-label or an LDH-
label), or must be a label associated with a DNS application that
is not subject to either IDNA or the historical recommendations
for "hostname"-style names [RFC1034].
Section 4.2., paragraph 0:
OLD:
The registry MAY permit submission of labels in A-label form. If it
does so, it SHOULD perform a conversion to a U-label, perform the
steps and tests described below, and verify that the A-label produced
by the step in Section 4.5 matches the one provided as input. If,
for some reason, it does not, the registration MUST be rejected. If
the conversion to a U-label is not performed, the registry MUST
verify that the A-label is superficially valid, i.e., that it does
not violate any of the rules of Punycode [RFC3492] encoding such as
the prohibition on trailing hyphen-minus, appearance of non-basic
characters before the delimiter, and so on. Invalid strings that
appear to be A-labels MUST NOT be placed in DNS zones.
[[anchor9: Editorial: Should the sentences starting with "The
registry" be moved to 4.3? I.e., would they be more in sequence
there? Note that A-labels are, by definition, in ASCII, so section
4.2 does not apply to them. The tone of this recommendation also
seems slightly at odds with the statements at the end of 4.2.
Suggested text for cleaning this up, harmonizing it, and reducing
redundancy would be appreciated.]]
4.2. Conversion to Unicode and Normalization
NEW:
4.2. Conversion to Unicode and Normalization
Section 4.2., paragraph 1:
OLD:
Some system routine, or a localized front-end to the IDNA process,
ensures that the proposed label is a Unicode string or converts it to
one as appropriate. That string MUST be in Unicode Normalization
Form C (NFC [Unicode-UAX15]).
NEW:
Some system routine, or a localized front-end to the IDNA process,
ensures that the proposed label is a Unicode string or converts it to
one as appropriate. Independent of its source form, the string MUST
be in Unicode Normalization Form C (NFC [Unicode-UAX15]) before
further processing in this protocol.
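The NFC requirement in the new text can be sketched with Python's standard unicodedata module. This is only an illustration of "normalize before further processing"; the helper names are hypothetical, and the result depends on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata


def to_nfc(label: str) -> str:
    """Return the label in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", label)


def is_nfc(label: str) -> bool:
    """True if the label is already in NFC (it equals its own NFC form)."""
    return unicodedata.normalize("NFC", label) == label
```

A front end could call to_nfc on whatever its charset conversion produced, while a stricter component could call is_nfc and reject input that is not already normalized.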
Section 4.2., paragraph 2:
OLD:
As a local implementation choice, the implementation MAY choose to
map some forbidden characters to permitted characters (for instance
mapping uppercase characters to lowercase ones), displaying the
result to the user, and allowing processing to continue. However, it
is strongly recommended that, to avoid any possible ambiguity,
entities responsible for zone files ("registries") accept
registrations only for A-labels (to be converted to U-labels by the
registry as discussed above) or U-labels actually produced from
A-labels, not forms expected to be converted by some other process.
NEW:
As a local implementation choice, the implementation MAY choose to
map some forbidden characters to permitted characters (for instance
mapping uppercase characters to lowercase ones), displaying the
result to the user, and allowing processing to continue. This should
be done very conservatively to prevent interoperability problems with
lookup applications that do not follow exactly the same rules. In
particular, it is strongly recommended that, to avoid any possible
ambiguity, entities responsible for zone files ("registries") accept
registrations only for A-labels (to be converted to U-labels by the
registry as discussed above) or U-labels actually produced from
A-labels, not forms expected to be converted by some other process.
Section 4.3., paragraph 1:
OLD:
4.3.1. Rejection of Characters that are not Permitted
NEW:
4.3.1. Input Format
[[anchor10: Note in -07 -- this section was formerly the second
paragraph of Section 4.1. It may need additional work; suggestions
welcome.]]
The registry MAY permit submission of labels in A-label form. If it
does so, it SHOULD perform a conversion to a U-label, perform the
steps and tests described below, and verify that the A-label produced
by the step in Section 4.5 matches the one provided as input. If,
for some reason, it does not, the registration MUST be rejected. If
the conversion to a U-label is not performed, the registry MUST
verify that the A-label is superficially valid, i.e., that it does
not violate any of the rules of Punycode [RFC3492] encoding such as
the prohibition on trailing hyphen-minus, appearance of non-basic
characters before the delimiter, and so on. Invalid strings that
appear to be A-labels MUST NOT be placed in DNS zones.
4.3.2. Rejection of Characters that are not Permitted
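The round-trip comparison described in the Input Format text above (convert the submitted A-label to a U-label, regenerate the A-label, and compare) might look like the following Python sketch. It is an assumption-laden illustration: the function name is hypothetical, Python's "punycode" codec handles only raw RFC 3492 encoding, and the intermediate "steps and tests described below" that the draft requires on the U-label are omitted.

```python
ACE_PREFIX = "xn--"


def a_label_round_trips(a_label: str) -> bool:
    """Decode an apparent A-label and check that re-encoding reproduces it."""
    lowered = a_label.lower()
    if not lowered.startswith(ACE_PREFIX):
        return False
    try:
        # Strip the prefix and decode the Punycode portion to Unicode.
        u_label = lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except ValueError:
        # UnicodeError is a subclass of ValueError; malformed input lands here.
        return False
    regenerated = ACE_PREFIX + u_label.encode("punycode").decode("ascii")
    return regenerated == lowered
```

Passing this comparison is necessary but not sufficient: a string can round-trip and still fail the character and contextual tests applied to the U-label.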
Section 4.3., paragraph 3:
OLD:
4.3.2. Label Validation
NEW:
4.3.3. Label Validation
Section 4.3., paragraph 5:
OLD:
4.3.2.1. Rejection of Confusing or Hostile Sequences in U-labels
NEW:
4.3.3.1. Rejection of Confusing or Hostile Sequences in U-labels
Section 4.3., paragraph 6:
OLD:
The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
the third and fourth character positions.
NEW:
The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
the third and fourth character positions when the label is considered
in "on the wire" order.
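The hyphen test is a simple positional check. A minimal Python sketch, assuming the label is already held as a string in "on the wire" (logical) order and using a hypothetical function name:

```python
def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    """True if the third and fourth characters are both hyphen-minus."""
    # Slicing is safe for short labels: it just yields a shorter string.
    return label[2:4] == "--"
```

A registration front end would reject a proposed Unicode label for which this returns True.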
Section 4.3., paragraph 7:
OLD:
4.3.2.2. Leading Combining Marks
NEW:
4.3.3.2. Leading Combining Marks
Section 4.3., paragraph 8:
OLD:
The first character of the string is examined to verify that it is
not a combining mark. If it is a combining mark, the string MUST NOT
be registered.
NEW:
The first character of the string (when the label is considered in
"on the wire" order) is examined to verify that it is not a combining
mark. If it is a combining mark, the string MUST NOT be registered.
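The leading-combining-mark test can be sketched in Python as follows. The sketch assumes that "combining mark" means a character whose Unicode General_Category begins with "M" (Mn, Mc, or Me); the function name is hypothetical and the answer depends on the Unicode tables shipped with the interpreter.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first character (in wire order) has General_Category M*."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```

A string for which this returns True would not be registered.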
Section 4.3., paragraph 9:
OLD:
4.3.2.3. Contextual Rules
NEW:
4.3.3.3. Contextual Rules
Section 4.3., paragraph 10:
OLD:
Each code point is checked for its identification as a character
requiring contextual processing for registration (the list of
characters appears as the combination of CONTEXTJ and CONTEXTO in
[IDNA2008-Tables] as do the contextual rules themselves). If that
indication appears, the table of contextual rules is checked for a
rule for that character. If no rule is found, the proposed label is
rejected and MUST NOT be installed in a zone file. If one is found,
it is applied (typically as a test on the entire label or on adjacent
characters within the label). If the application of the rule does
not conclude that the character is valid in context, the proposed
label MUST BE rejected. (See the IANA Considerations: IDNA Context
Registry section of [IDNA2008-Tables].)
These contextual rules are required to permit the use of characters
that would otherwise risk causing considerable harm. For example,
labels containing invisible ("zero-width") characters may be
permitted in context with characters whose presentation forms are
significantly changed by the presence or absence of the zero-width
characters, while other labels in which zero-width characters appear
may be rejected.
[[anchor14: Should this paragraph be removed? Note that I've been
strongly encouraged to supply specific examples to reduce abstraction
and questions about the appropriateness of the text. -JcK]]
NEW:
Each code point is checked for its identification as a character
requiring contextual processing for registration (the list of
characters appears as the combination of CONTEXTJ and CONTEXTO in
[IDNA2008-Tables] as do the contextual rules themselves). If that
indication appears, the table of contextual rules is checked for a
rule for that character. If no rule is found, the proposed label is
rejected and MUST NOT be installed in a zone file. If one is found,
it is applied (typically as a test on the entire label or on adjacent
characters within the label). If the application of the rule does
not conclude that the character is valid in context, the proposed
label MUST BE rejected. (See the IANA Considerations: IDNA Context
Registry section of [IDNA2008-Tables].)
These contextual rules are required to permit the use of characters
that would otherwise risk causing unacceptable ambiguity in label
matching and interpretation. For example, labels containing
invisible ("zero-width") characters may be permitted in context with
characters whose presentation forms are significantly changed by the
presence or absence of the zero-width characters, while other labels
in which zero-width characters appear may be rejected.
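The procedure in this hunk (find characters that need a contextual rule, reject if no rule exists, otherwise apply the rule) can be sketched in Python as below. Everything in the tables here is a stand-in chosen for illustration, not the content of [IDNA2008-Tables]: the set NEEDS_CONTEXT, the dictionary CONTEXT_RULES, and the simplified ZERO WIDTH JOINER rule (allowed only directly after a character with canonical combining class 9, i.e. a virama) are all assumptions.

```python
import unicodedata

ZWJ = "\u200d"         # ZERO WIDTH JOINER
MIDDLE_DOT = "\u00b7"  # used here as a character that has no rule
VIRAMA_CCC = 9         # canonical combining class of virama characters


def zwj_rule(label: str, index: int) -> bool:
    """Simplified rule: ZWJ is valid only immediately after a virama."""
    return index > 0 and unicodedata.combining(label[index - 1]) == VIRAMA_CCC


# Hypothetical tables; the real ones are defined in the Tables document.
NEEDS_CONTEXT = {ZWJ, MIDDLE_DOT}
CONTEXT_RULES = {ZWJ: zwj_rule}  # MIDDLE_DOT deliberately has no rule


def passes_contextual_rules(label: str) -> bool:
    """Apply the contextual-rule procedure to every code point in the label."""
    for index, char in enumerate(label):
        if char not in NEEDS_CONTEXT:
            continue
        rule = CONTEXT_RULES.get(char)
        if rule is None:
            return False  # a rule is required but none is defined: reject
        if not rule(label, index):
            return False  # the rule exists but the context is not valid
    return True
```

With these stand-in tables, a ZWJ following a Devanagari virama is accepted, a ZWJ between two ASCII letters is rejected, and any label containing the rule-less character is rejected outright.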
Section 4.3., paragraph 11:
OLD:
4.3.2.4. Labels Containing Characters Written Right to Left
NEW:
4.3.3.4. Labels Containing Characters Written Right to Left
Section 4.3., paragraph 13:
OLD:
4.3.3. Registration Validation Summary
NEW:
4.3.4. Registration Validation Summary
Section 5.5., paragraph 6:
OLD:
o Labels containing other code points that are shown in the
permitted character table as requiring a contextual rule
("CONTEXTO" in the tables), but for which no such rule appears in
the table of rules. With the exception in the rule immediately
above, applications resolving DNS names or carrying out equivalent
operations are not required to test contextual rules, only to
verify that a rule exists.
NEW:
o Labels containing other code points that are shown in the
permitted character table as requiring a contextual rule
("CONTEXTO" in the tables), but for which no such rule appears in
the table of rules. Applications resolving DNS names or carrying
out equivalent operations are not required to test contextual
rules for "CONTEXTO" characters, only to verify that a rule exists
(although they MAY make such tests to give better information to
the user).
Section 5.5., paragraph 8:
OLD:
In addition, the application SHOULD apply the following test. The
test may be omitted in special circumstances, such as when the lookup
application knows that the conditions are enforced elsewhere, because
an attempt to look up and resolve such strings will almost certainly
lead to a DNS lookup failure except when wildcards are present in the
zone. However, applying the test is likely to give much better
information about the reason for a lookup failure -- information that
may be usefully passed to the user when that is feasible -- then DNS
resolution failure information alone. In any event, lookup
applications should avoid attempting to resolve labels that are
invalid under that test.
NEW:
In addition, the application SHOULD apply the following test. The
test may be omitted in special circumstances, such as when the lookup
application knows that the conditions are enforced elsewhere, because
an attempt to look up and resolve such strings will almost certainly
lead to a DNS lookup failure except when wildcards are present in the
zone. However, applying the test is likely to give much better
information about the reason for a lookup failure -- information that
may be usefully passed to the user when that is feasible -- than DNS
resolution failure information alone. In any event, lookup
applications should avoid attempting to resolve labels that are
invalid under that test.
Section 10., paragraph 2:
OLD:
Specific textual changes were incorporated into this document after
suggestions from the other contributors, Stephane Bortzmeyer, Mark
Davis, Paul Hoffman, Kent Karlsson, Erik van der Poel, Marcos Sanz,
Andrew Sullivan, Ken Whistler, and other WG participants. Special
thanks are due to Paul Hoffman for permission to extract material
from his Internet-Draft to form the basis for Appendix A
NEW:
Specific textual changes were incorporated into this document after
suggestions from the other contributors, Stephane Bortzmeyer, Vint
Cerf, Mark Davis, Paul Hoffman, Kent Karlsson, Erik van der Poel,
Marcos Sanz, Andrew Sullivan, Ken Whistler, and other WG
participants. Special thanks are due to Paul Hoffman for permission
to extract material from his Internet-Draft to form the basis for
Appendix A
Appendix B., paragraph 28:
OLD:
Author's Address
NEW:
B.7. Version -07
o Multiple small textual and editorial changes and clarifications.
o Requirement for normalization clarified to apply to all cases and
conditions for preprocessing further clarified.
Author's Address
-------------- next part --------------
*** Defs 01 to 02 ***
Internationalized Domain Names for Applications (IDNA): Definitions and
Document Framework
Section 1.1.1., paragraph 1:
OLD:
While many IETF specifications are directed exclusively to protocol
implementers, the character of IDNA requires that it be understood
and properly used by those whose responsibilities include making
decisions about what names are permitted in DNS zone files and about
policies related to names and naming. This document and those
concerned with the protocol definition, rules for rules for handling
strings that include characters written right-to-left, and the actual
list of characters and categories will be of primary interest to
protocol implementers. This document and the one containing
explanatory material will be of primary interest to others, although
they may have to fill in details of interest by reference to other
documents in the set.
NEW:
While many IETF specifications are directed exclusively to protocol
implementers, the character of IDNA requires that it be understood
and properly used by those whose responsibilities include making
decisions about what names are permitted in DNS zone files and about
policies related to names and naming. This document and those
concerned with the protocol definition, rules for handling strings
that include characters written right-to-left, and the actual list of
characters and categories will be of primary interest to protocol
implementers. This document and the one containing explanatory
material will be of primary interest to others, although they may
have to fill in details of interest by reference to other documents
in the set.
Section 2.1., paragraph 1:
OLD:
[[anchor8: Formerly Section 1.5.2 of Rationale-03]]
A code point is an integer value associated with a character in a
coded character set.
NEW:
A code point is an integer value associated with a character in a
coded character set.
Section 2.1., paragraph 3:
OLD:
ASCII means US-ASCII [ASCII], a coded character set containing 128
characters associated with code points in the range 0000..007F.
Unicode may be thought of as an extension of ASCII; it includes all
the ASCII characters and associates them with equivalent code points.
NEW:
ASCII means US-ASCII [ASCII], a coded character set containing 128
characters associated with code points in the range 0000..007F.
Unicode may be thought of as a generalization of ASCII; it includes
all the ASCII characters and associates them with equivalent code
points.
Section 2.2., paragraph 1:
OLD:
[[anchor10: Formerly Section 1.5.3 of Rationale-03.]]
When discussing the DNS, this document generally assumes the
terminology used in the DNS specifications [RFC1034] [RFC1035]. The
term "lookup" is used to describe the combination of operations
performed by the IDNA2008 protocol and those actually performed by a
DNS resolver. The process of placing an entry into the DNS is
referred to as "registration", similar to common contemporary usage
in other contexts. Consequently, any DNS zone administration is
described as a "registry", regardless of the actual administrative
arrangements or level in the DNS tree. More detail about that
relationship is included in the "Rationale" document.
NEW:
When discussing the DNS, this document generally assumes the
terminology used in the DNS specifications [RFC1034] [RFC1035]. The
term "lookup" is used to describe the combination of operations
performed by the IDNA2008 protocol and those actually performed by a
DNS resolver. The process of placing an entry into the DNS is
referred to as "registration", similar to common contemporary usage
in other contexts. Consequently, any DNS zone administration is
described as a "registry", regardless of the actual administrative
arrangements or level in the DNS tree. More detail about that
relationship is included in the "Rationale" document.
Section 2.3., paragraph 1:
OLD:
[[anchor11: Formerly Section 1.5.4 of Rationale-03 with some material
removed and left in that document.]]
This section defines some terminology to reduce dependence on terms
and definitions that have been problematic in the past.
NEW:
This section defines some terminology to reduce dependence on terms
and definitions that have been problematic in the past.
Section 2.3.1.6., paragraph 2:
OLD:
An "IDN-aware domain name slot" is defined in this document to be a
domain name slot explicitly designated for carrying an
internationalized domain name as defined in this document. The
designation may be static (for example, in the specification of the
protocol or interface) or dynamic (for example, as a result of
negotiation in an interactive session).
NEW:
An "IDN-aware domain name slot" is defined for this set of documents
to be a domain name slot explicitly designated for carrying an
internationalized domain name as defined in this document. The
designation may be static (for example, in the specification of the
protocol or interface) or dynamic (for example, as a result of
negotiation in an interactive session).
Section 2.3.1.6., paragraph 3:
OLD:
An "IDN-unaware domain name slot" is defined in this document to be
any domain name slot that is not an IDN-aware domain name slot.
Obviously, this includes any domain name slot whose specification
predates IDNA.
NEW:
An "IDN-unaware domain name slot" is defined for this set of
documents to be any domain name slot that is not an IDN-aware domain
name slot. Obviously, this includes any domain name slot whose
specification predates IDNA.
Section 2.3.1.6., paragraph 4:
OLD:
2.3.2. Punycode is an Algorithm, not a Name or Adjective
NEW:
2.3.2. Order of Characters in Labels
Section 2.3.1.6., paragraph 5:
OLD:
[[anchor17: Formerly Section 1.5.5 of Rationale-03.]]
NEW:
Because IDN labels may contain characters that are read, and
preferentially displayed, from right to left, there is a potential
ambiguity about which character in a label is "first". For the
purposes of these specifications, labels are considered, and
characters numbered, strictly in the order in which they appear "on
the wire". That order is equivalent to the leftmost character being
treated as first in a label that is read left-to-right and to the
rightmost character being first in a label that is read right-to-left.
The "Bidi" specification contains additional discussion of the
conditions that influence reading order.
2.3.3. Punycode is an Algorithm, not a Name or Adjective
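The character-ordering rule in the new Section 2.3.2 text above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, and the sketch assumes the label is held in a `str` in the same order in which its characters are transmitted, so that index 0 is the "first" character regardless of how the label is displayed. The standard-library call `unicodedata.bidirectional` is used only to show that the first character may be a right-to-left one.

```python
import unicodedata
from typing import List, Tuple


def numbered_characters(label: str) -> List[Tuple[int, str]]:
    """Number the characters of a label from 1, in stored (wire) order."""
    return list(enumerate(label, start=1))


def first_character(label: str) -> str:
    """Return the first character in wire order, whatever the display order."""
    return label[0]


def is_right_to_left(ch: str) -> bool:
    """True if the Unicode bidirectional class of ch is R or AL."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


# Hypothetical example label: Hebrew alef, bet, gimel in wire order.
hebrew_label = "\u05d0\u05d1\u05d2"
first = first_character(hebrew_label)
```

In this example `first` is alef (U+05D0). When the label is displayed right-to-left, that character appears rightmost, yet it is still numbered 1.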
Section 5., paragraph 2:
OLD:
Specific textual suggestions after the extraction process came from
Bill McQuillan.
NEW:
Specific textual suggestions after the extraction process came from
Vint Cerf and Bill McQuillan.
Appendix A., paragraph 8:
OLD:
Author's Address
NEW:
A.3. Version -02
o All back pointers to section numbers in Rationale have been
removed.
o Some definitions clarified. Added one about string order.
Author's Address