To be published: draft-idnabis-issues-00.txt
John C Klensin
klensin at jck.com
Mon Oct 16 14:27:33 CEST 2006
I've attached the version of idnabis-issues that was just
submitted for Internet-Draft posting.
Thanks to everyone for your comments. I think most of them have
been addressed at least partially. We owe responses to points
raised to a few of you, which will be out, I hope, today. This
document is still, from my perspective, in a fairly early stage
-- there are a number of loose ends and placeholders both
implicit and explicit.
As Harald wrote in his note, while we recognize that not all
readers will agree with everything written here, I hope we have
done a good job of reflecting the concerns that have been voiced
to us.
Please comment at will - the next official version will
undoubtedly be after the IETF meeting in San Diego, but quick
comments are very welcome!
for the team,
john
-------------- next part --------------
Network Working Group J. Klensin, Ed.
Internet-Draft October 16, 2006
Intended status: Informational
Expires: April 19, 2007
Proposed Issues and Changes for IDNA - An Overview
draft-idnabis-issues-00.txt
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 19, 2007.
Copyright Notice
Copyright (C) The Internet Society (2006).
Abstract
A recent IAB report identified issues that have been raised with
Internationalized Domain Names (IDNs), some of which require tuning of
the existing protocols and the tables on which they depend. Based on
intensive discussion by an informal design team, this document
further explains some of the issues that have been encountered and
provides explanatory material for some of the proposals that are
being made.
Klensin Expires April 19, 2007 [Page 1]
Internet-Draft IDNAbis Issues October 2006
Table of Contents
1. Introduction
1.1. Context and Overview
1.2. Discussion Forum
1.3. Terminology
2. The IDNA Model
2.1. Registration of IDNs
2.1.1. Proposed label
2.1.2. Conversion to Unicode
2.1.3. Permitted Character Identification
2.1.4. Stringprep Mappings
2.1.5. Post-Stringprep Character String Checking and Processing
2.1.6. Registry Restrictions
2.1.7. Punycode Conversion
2.1.8. Insertion in the Zone
2.2. Domain Name Resolution (Lookup)
2.2.1. User input
2.2.2. Conversion to Unicode
2.2.3. Pre-Nameprep Validation and Character List Testing
2.2.4. Stringprep Processing
2.2.5. Post-Nameprep Processing
2.2.6. Punycode Conversion
2.2.7. Name Resolution
3. IDNA200x Document List
4. Permitted Characters: An inclusion list
5. The Question of Prefix Changes
5.1. Conditions requiring a prefix change
5.2. Conditions not requiring a prefix change
6. Stringprep Changes and Compatibility
7. Display and Network order
8. The Ligature and Digraph Problem
9. Right-to-left text
10. Acknowledgements
11. Contributors
12. IANA Considerations
13. Security Considerations
14. References
14.1. Normative References
14.2. Informative References
Author's Address
Intellectual Property and Copyright Statements
1. Introduction
1.1. Context and Overview
A recent IAB report identified issues that have been raised with
Internationalized Domain Names (IDNs) and the associated standards.
Those standards are known as Internationalized Domain Names in
Applications (IDNA), taken from the name of the highest level
standard within that group (see Section 1.3). Based on discussion of
those issues and their impact, some of the existing protocols and
the tables on which they depend now require tuning.
This document further explains, based on the results of some
intensive discussions by an informal design team, some of the issues
that have been encountered. It also provides explanatory material
for some of the proposals that are being made. Explanatory material
for other proposals will appear with the associated documents.
This document begins with a discussion of the IDNA model and the
general differences in strategy between the original version of IDNA
and the proposed new version, then continues with a description of
specific changes that are needed.
[[anchor3: This initial draft is very preliminary and contains
significant omissions. Some, but not all, of these are identified by
explicit placeholders similar to this one.]]
1.2. Discussion Forum
This work is being discussed on the mailing list
idn-update at alvestrand.no
1.3. Terminology
This document uses the term "IDNA2003" to refer to the set of
standards that make up and support the version of IDNA published in
2003, i.e., [RFC3490], [RFC3491], [RFC3492], and [RFC3454]. The term
"IDNA200x" is used to refer to a possible new version of IDNA without
specifying which particular documents would be impacted. While more
common IETF usage might refer to the successor document(s) as
"IDNAbis", this document uses that term, and similar ones, to refer
to successors to the individual documents, e.g., "IDNAbis" is a
synonym for the specific successor to RFC3490, or "RFC3490bis". See
also Section 3.
Protocols in the IDNA group such as RFC 3454, RFC 3491 and RFC 3492
are referred to by their popular names of "Stringprep", "Nameprep",
and "Punycode", respectively.
The term "Unicode" in this document refers to Unicode 3.2 [Unicode32]
when it is used in the context of IDNA2003 and to Unicode 5.0
[Unicode50] in the context of IDNA200x. For the purposes of this
document -- i.e., general explanation and issues that do not address
specific code points or blocks -- Unicode 3.2, Unicode 4.0
[Unicode40], and Unicode 5.0 are essentially equivalent.
2. The IDNA Model
IDNA is a client-side protocol, i.e., almost all of the processing is
performed by the client. The strings that appear in, and are
resolved by, the DNS consist entirely of ASCII characters, conforming
to the traditional rules for the naming of hosts, and consisting of
only ASCII letters, digits, and hyphens. This approach permits IDNA
to be deployed without modifications to the DNS itself which, in
turn, avoids having to upgrade the entire Internet at once to support
IDNs and the unknown risks of DNS changes to deployed systems.
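As an illustration of the hostname (LDH) form that every label must
have by the time it reaches the DNS, the following is a minimal sketch
in Python. The helper name is_ldh_label is hypothetical, and the
length and hyphen-placement limits are the traditional hostname
conventions rather than anything specified in this document.

```python
import re

# Hypothetical helper, for illustration only. One label under the
# traditional hostname (LDH) convention: ASCII letters, digits, and
# hyphens; 1 to 63 characters; no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if the label is acceptable as-is in the DNS."""
    return _LDH_LABEL.fullmatch(label) is not None
```

A Punycode-encoded label such as "xn--bcher-kva" passes this test,
while the Unicode string it stands for does not, which is why the
client must do the conversion before any query is sent.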
IDNA has the following logical flow in domain name registration and
resolution. The IDNA2003 specification explicitly includes the
equivalents of the steps in Section 2.1.3, Section 2.1.4,
Section 2.1.5, and Section 2.1.7. The omission of an explicit
discussion of the other steps has been one source of confusion.
Another source has been definition of IDNA2003 as an explicit
algorithm, expressed partially in prose and partially in pseudocode.
The steps below conform to more traditional IETF practice; the
functions are specified, rather than an algorithm. The breakdown into
steps is for clarity of explanation; any implementation that produces
the same result with the same inputs is conforming.
2.1. Registration of IDNs
2.1.1. Proposed label
The registrant submits a request for an IDN, representing it in the
local character coding used by the operating system. This string is
typically produced by keyboard entry and converted to the local
character set by the keyboard driver software. [[anchor7: JcK: are we
sure 'keyboard driver' is going to make sense to the audience.
Certainly it is ok for the IETF part.]]
2.1.2. Conversion to Unicode
Some system routine, or a localized front-end to the IDNA process,
converts the proposed label to a Unicode string. This conversion is
obviously trivial in a Unicode-native system but may involve some
complexity in one that is not, especially if the characters of the
local character set do not map exactly and unambiguously onto Unicode
characters. Depending on the system involved, the major difficulty
may not lie in the mapping but in accurately identifying the incoming
character set and then applying the correct conversion routine.
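The point about charset identification can be sketched as follows.
This is only an illustration, assuming the local system hands over
octets plus a charset label; the function name to_unicode and the
choice of Latin-1 and CP437 as example charsets are assumptions, not
part of any IDNA specification.

```python
def to_unicode(raw: bytes, charset: str) -> str:
    """Decode octets from a declared local charset, refusing to guess."""
    # errors="strict" raises UnicodeDecodeError instead of silently
    # substituting characters when the octets do not fit the charset.
    return raw.decode(charset, errors="strict")


# The same octets, as a Latin-1 system might hold the label "buecher"
# written with U+00FC LATIN SMALL LETTER U WITH DIAERESIS.
raw = "b\u00fccher".encode("latin-1")

right = to_unicode(raw, "latin-1")  # charset identified correctly
wrong = to_unicode(raw, "cp437")    # charset misidentified: decodes
                                    # without error, to other characters
```

The misidentified case is the dangerous one: no error is raised, and
a different Unicode string enters the rest of the process.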
2.1.3. Permitted Character Identification
The Unicode string is examined to prohibit characters that IDNA does
not permit in input. IDNA200x uses an inclusion-based approach,
i.e., a list of characters that are permitted, rather than the
exclusion-based approach of IDNA2003 (see Section 4). Under
IDNA2003, the list of excluded characters is quite limited because
the model was to permit almost all Unicode characters to be used as
input with many of them mapped into others. There is now general
consensus that this exclusion-based model was a mistake and should be
replaced, in IDNA200x, by a system that lists only those characters
that are permitted and does much less mapping.
Under the proposed IDNA200x, the string in Unicode form will be
rejected if it contains characters that are not on the list of
characters acceptable as IDNA input.
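The inclusion-based check described above might be sketched as
follows. The PERMITTED set here is a tiny, invented stand-in for the
real inclusion table, which is still under development; only the
shape of the test (membership in a list of permitted characters) is
taken from the text.

```python
from typing import List

# Hypothetical stand-in for the IDNA200x inclusion table: ASCII
# lowercase letters, digits, hyphen, and three non-ASCII letters
# chosen purely for illustration.
PERMITTED = set("abcdefghijklmnopqrstuvwxyz0123456789-") | {
    "\u00e4",  # LATIN SMALL LETTER A WITH DIAERESIS
    "\u00f6",  # LATIN SMALL LETTER O WITH DIAERESIS
    "\u00e6",  # LATIN SMALL LETTER AE
}


def rejected_characters(label: str) -> List[str]:
    """List the characters that are not on the inclusion list."""
    return [ch for ch in label if ch not in PERMITTED]


def is_permitted_input(label: str) -> bool:
    """Inclusion model: reject unless every character is listed."""
    return not rejected_characters(label)
```

Under this model a character that nobody has thought about is
rejected by default, which is the opposite of the exclusion model's
behavior.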
[[anchor8: Examples of impacted characters needed.]]
2.1.4. Stringprep Mappings
In the model of IDNA200x, Nameprep and Stringprep will be respecified
to depend on Unicode properties, rather than on explicit character
lists that are dependent on Unicode version. This change in
definition does not change the functional model of IDNA processing
(or of Stringprep-based processing more generally), but conceptually
turns it into the clear set of steps described here and localizes
dependencies on Unicode definitions and properties.
2.1.4.1. Normalization
The filtered string is then normalized (a Unicode concept, see any
version of the Unicode Standard) to make string comparison possible
even though some strings can be represented in several different ways
in Unicode. In IDNA2003, the normalization method specified in
Stringprep and invoked by Nameprep is based on Unicode method NFKC
[Unicode-USX15]. The FC_NFKC_Closure property [FC-NFKC] is applied
to facilitate subsequent case-folding. For IDNA200x, the new Stable
NFKC method is used as a base to facilitate migration to future
versions of Unicode but, because many of the characters permitted and
then mapped to others in IDNA2003 are not permitted by IDNA200x
(since most characters that would be mapped to others by
compatibility equivalences are prohibited), the normalization
operation is less extensive.
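To illustrate why normalization is needed for comparison, here is a
short Python sketch. It uses the plain NFKC form from the standard
unicodedata module as a stand-in; the "Stable NFKC" method mentioned
above is not what this code implements, and the helper name nfkc is
an assumption.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Apply plain Unicode NFKC (a stand-in for Stable NFKC)."""
    return unicodedata.normalize("NFKC", s)


# Two encodings of the same visible text.
composed = "caf\u00e9"     # ends in LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # ends in "e" + COMBINING ACUTE ACCENT

# A compatibility character of the kind IDNA2003 mapped and IDNA200x
# would simply not permit as input.
ligature = "\ufb01"        # LATIN SMALL LIGATURE FI

same_before = composed == decomposed
same_after = nfkc(composed) == nfkc(decomposed)
ligature_mapped = nfkc(ligature)
```

Here same_before is false and same_after is true, and the ligature
is mapped to the two letters "fi". If such compatibility characters
are rejected before this step, normalization has much less to do.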
2.1.4.2. Case-folding
The normalized string is then case-mapped for scripts that make case
distinctions similar to those of Greek to permit approximating the
ASCII-case matching applied on name resolution in the DNS. Strictly
speaking, case-folding starts with the normalization process above,
then strings are case-folded, then they are normalized again. The
application of the "FC_NFKC_Closure" property above simplifies this
process in practice.
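The normalize, fold, normalize sequence described above can be
sketched as follows. This is an approximation only: it uses Python's
str.casefold() and plain NFKC from the Unicode version bundled with
the interpreter, not the Stringprep tables or the FC_NFKC_Closure
property, and the name fold_label is hypothetical.

```python
import unicodedata


def fold_label(label: str) -> str:
    """Normalize, case-fold, then normalize again."""
    normalized = unicodedata.normalize("NFKC", label)
    folded = normalized.casefold()
    # Folding can produce sequences that are no longer normalized,
    # hence the second pass.
    return unicodedata.normalize("NFKC", folded)


greek_upper = "\u03a3\u039f\u03a6\u0399\u0391"  # GREEK CAPITAL SIGMA, OMICRON, PHI, IOTA, ALPHA
greek_lower = "\u03c3\u03bf\u03c6\u03b9\u03b1"  # the corresponding small letters
```

With these definitions, fold_label(greek_upper) and
fold_label(greek_lower) compare equal, approximating for Greek what
the DNS already does for ASCII.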
[[anchor11: Examples of impacted characters needed.]]
2.1.5. Post-Stringprep Character String Checking and Processing
All characters output from the step above are then verified for
permissibility in IDNA, i.e., presence in the table of included
characters (see Section 4). Additional transformations that do not
occur as the result of the steps above may be specified at this point
by IDNA200x.
[[anchor12: Examples of impacted characters needed.]]
2.1.6. Registry Restrictions
Registries at all levels of the DNS, not just the top level, are
expected to establish policies about the labels that may be
registered, and for the processes associated with that action. Such
restrictions have always existed in the DNS and have always been
applied at registration time, with the most notable example being
enforcement of the hostname (LDH) convention itself. For IDNs, the
restrictions to be applied are not an IETF matter except insofar as
they derive from restrictions imposed by application protocols (e.g.,
email has always required a more restricted syntax for domain names
than the restrictions of the DNS itself). Because these are
restrictions on what can be registered, it is not generally necessary
that they be global. If a name is not found on resolution, it is not
relevant whether it could have been registered; only that it was not
registered. Registry restrictions might include prohibition of
mixed-script labels, or restrictions on labels permitted in a zone if
certain other labels are already present (See [RFC3743] and [RFC4290]
for discussion of some of the methods that have been applied by some
registries). The various sets of ICANN IDN Guidelines
[ICANN-Guidelines] also suggest restrictions that might sensibly be
imposed.
The string produced by the above steps is checked and processed as
appropriate to local registry restrictions. This may result in the
rejection of some labels or the application of special restrictions
to others.
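As one example of a registry restriction, a prohibition of
mixed-script labels might be approximated as below. This is a crude
sketch: Python's standard library does not expose the Unicode Script
property, so the first word of each character name is used as a rough
proxy, which misclassifies combining marks and other cases. The
function names are hypothetical, and a real registry would use proper
script data and its own policy.

```python
import unicodedata
from typing import Set


def crude_scripts(label: str) -> Set[str]:
    """Guess the scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        # ASCII digits and the hyphen are treated as script-neutral.
        if ch.isascii() and (ch.isdigit() or ch == "-"):
            continue
        name = unicodedata.name(ch, "")
        # e.g. "LATIN SMALL LETTER A" -> "LATIN"
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label appears to draw on more than one script."""
    return len(crude_scripts(label)) > 1
```

A registry applying such a rule would accept an all-Latin or
all-Cyrillic label but refuse one that substitutes a single Cyrillic
letter into an otherwise Latin string.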
[[anchor13: Examples of impacted characters needed.]]
2.1.7. Punycode Conversion
The domain name label resulting from the processes above is converted
to its Punycode encoding (i.e., the "xn--..." form). Punycode is not
changed in IDNA200x.
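For illustration, the conversion to and from the "xn--" form can be
sketched with the "punycode" codec in Python's standard library. The
helper names to_ace and from_ace are hypothetical, and the sketch
assumes its input has already passed the checking and mapping steps
above; it performs none of them itself.

```python
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Encode one already-prepared label into its "xn--" form."""
    if label.isascii():
        return label  # plain LDH labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Decode an "xn--" label back to Unicode; pass others through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


ace = to_ace("b\u00fccher")  # "xn--bcher-kva"
```

The round trip from_ace(to_ace(label)) returns the original label,
which is the property that the first condition in Section 5.1
depends on.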
2.1.8. Insertion in the Zone
The Punycode-encoded string is then registered in the DNS by
insertion into a zone.
2.2. Domain Name Resolution (Lookup)
2.2.1. User input
The user supplies a string in the local character set, typically by
typing it or clicking on a URI or IRI.
2.2.2. Conversion to Unicode
The local character set, character coding conventions, and, as
necessary, display and presentation conventions, are converted to
Unicode, paralleling the process above.
2.2.3. Pre-Nameprep Validation and Character List Testing
Again in parallel to the above, the Unicode string is checked to
verify that all characters that appear in it are valid for IDNA
input. As discussed in Section 4, this check should probably be more
liberal than that of Section 2.1.3: characters that fall into
"pending", "possibly later", or "unassigned codepoint" categories in
the inclusion tables should probably not lead to label rejection at
this point. Instead, the resolver should (MUST?) rely on the
presence or absence of labels containing such characters in the DNS
to determine their validity.
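The difference between the registration-time and lookup-time checks
might be sketched as follows. The category names follow the text
above, but the table contents, the default of "unassigned" for
unlisted characters, and all function names are invented for the
example; the real inclusion tables are still being drafted.

```python
from enum import Enum


class Status(Enum):
    PERMITTED = "permitted"
    PENDING = "pending"
    POSSIBLY_LATER = "possibly later"
    UNASSIGNED = "unassigned codepoint"
    PROHIBITED = "prohibited"


# Hypothetical table fragment, for illustration only.
TABLE = {ch: Status.PERMITTED for ch in "abcdefghijklmnopqrstuvwxyz0123456789-"}
TABLE["\u0180"] = Status.PENDING      # placeholder "pending" entry
TABLE["\u2460"] = Status.PROHIBITED   # CIRCLED DIGIT ONE, a symbol


def classify(ch: str) -> Status:
    """Look a character up; unlisted characters count as unassigned."""
    return TABLE.get(ch, Status.UNASSIGNED)


def ok_for_registration(label: str) -> bool:
    """Strict: every character must be explicitly permitted."""
    return all(classify(ch) is Status.PERMITTED for ch in label)


def ok_for_lookup(label: str) -> bool:
    """Liberal: only explicitly prohibited characters cause rejection."""
    return all(classify(ch) is not Status.PROHIBITED for ch in label)
```

A label that passes ok_for_lookup but not ok_for_registration is
simply sent to the DNS, and a "not found" answer settles the matter.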
2.2.4. Stringprep Processing
As above, the validated Unicode string is normalized (using Stable
NFKC) and case-mapped. IDNA2003 uses explicit codepoint tables in
Stringprep to accomplish both of these operations.
2.2.5. Post-Nameprep Processing
Any necessary processing is applied to the normalized and case-mapped
output string from the above.
2.2.6. Punycode Conversion
The validated string is converted to Punycode.
2.2.7. Name Resolution
The Punycode-encoded form of the label is looked up in the DNS, using
normal DNS procedures.
3. IDNA200x Document List
[[anchor15: This section will need to be extensively revised or
removed before publication.]]
The following documents are expected to be produced as part of the
IDNA200x effort.
o This document, containing an overview and rationale.
o A document describing the "BIDI problem" with Stringprep and
proposing a solution [IDNA200X-BIDI].
o A list of initially permitted code points, based on Unicode 5.0
code blocks. See Section 4.
o [[anchor16: ...More ??? ...]]
4. Permitted Characters: An inclusion list
Moving to an inclusion model requires a new list of characters that
are permitted in IDNs. An initial version of such a list has been
developed by the contributors to this document [IDNA200X-Blocks].
This was accomplished by going through Unicode 5.0 one block and one
character class at a time and determining which characters, classes,
and blocks were clearly acceptable for IDNs, which ones were clearly
unacceptable (e.g., all blocks consisting entirely of compatibility
characters and non-language symbols were excluded as were a number of
character classes), and which blocks and classes were in need of
further study or input from the relevant language communities. The
discussion in [IDNA200X-BIDI] illustrates areas in which more work
and input are needed. It is expected that such problems will be
resolved quickly and the questioned scripts added to the list of
permitted characters.
A procedure for adding additional characters to the inclusion list,
either from blocks that are associated with notes in
[IDNA200X-Blocks] or from future versions of Unicode, will be
developed as part of this work. A key part of that procedure will be
specifications that, in fact, make it possible to add new characters
and blocks without long delays in implementation. For example, it
may be desirable to more strongly distinguish between use of the
protocols for "registration" (i.e., entering names in the DNS) and
"lookup" (queries to the DNS), with most character inclusion rules
applied at registration time only and clients generating queries
relying on the lookup process to return "not found" errors if
characters were invalid.
[[anchor17: That procedure is an important issue and this is a
placeholder.]]
5. The Question of Prefix Changes
The conditions that would require a change in the IDNA "prefix"
("xn--" for the version of IDNA specified in [RFC3490]) have been a
great concern to the community. A prefix change would clearly be
necessary if the algorithms were modified in a manner that would
create serious ambiguities during subsequent transition in
registrations. This section summarizes our conclusions about the
conditions under which changes in prefix would be necessary.
5.1. Conditions requiring a prefix change
An IDN prefix change is needed if a given string would resolve or
otherwise be interpreted differently depending on the version of the
protocol or tables being used. Consequently, work to update IDNs
would require a prefix change if, and only if, one of the following
four conditions were met:
1. The conversion of a Punycode string to Unicode yields one string
under IDNA2003 (RFC3490) and a different string under IDNA200x.
2. An input string that is valid under IDNA2003 and also valid under
IDNA200x yields two different Punycode strings with the different
versions. This condition is believed to be essentially
equivalent to the one above.
Note, however, that if the input string is valid under one
version and not valid under the other, this condition does not
apply. See the first item in Section 5.2, below.
3. A fundamental change is made to the semantics of the string that
is inserted in the DNS, e.g., if a decision were made to try to
include language or specific script information in that string,
rather than having it be just a string of characters.
4. Sufficient characters are added to Unicode that the Punycode
mechanism for offsets to blocks does not have enough capacity to
reference the higher-numbered planes and blocks. This condition
is unlikely even in the long term and certain to not arise in the
next few years.
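The second condition above lends itself to a mechanical test, which
might be sketched as follows. The sketch assumes each protocol
version is modeled as a function that returns the encoded string, or
None when the input is invalid under that version; the toy encoders
are invented for the example and do not represent IDNA2003 or
IDNA200x.

```python
from typing import Callable, Iterable, List, Optional

# An encoder returns the encoded label, or None if the input is
# invalid under that version of the protocol.
Encoder = Callable[[str], Optional[str]]


def prefix_conflicts(old: Encoder, new: Encoder,
                     samples: Iterable[str]) -> List[str]:
    """Inputs valid under both versions that encode differently."""
    conflicts = []
    for s in samples:
        a, b = old(s), new(s)
        # Invalid under either version: the condition does not apply.
        if a is not None and b is not None and a != b:
            conflicts.append(s)
    return conflicts


# Toy encoders, purely hypothetical.
def toy_old(s: str) -> Optional[str]:
    return s.lower()


def toy_new_stricter(s: str) -> Optional[str]:
    # Prohibits some inputs but encodes the rest identically.
    return None if any(c.isdigit() for c in s) else s.lower()


def toy_new_remapped(s: str) -> Optional[str]:
    # Encodes some still-valid inputs differently.
    return s.lower().replace("a", "b")
```

A non-empty result from prefix_conflicts would indicate that a prefix
change is needed; a version that only prohibits more inputs, like
toy_new_stricter, produces none.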
5.2. Conditions not requiring a prefix change
In particular, as a result of the principles described above, none of
the following changes require a new prefix:
1. Prohibition of some characters as input to IDNA. This may make
names that are now registered inaccessible, but does not require
a prefix change.
2. Adjustments in Stringprep tables or IDNA actions, including
normalization definitions, that do not impact characters that
have already been invalid under IDNA2003.
3. Changes in the style of definitions of Stringprep or Nameprep
that do not alter the actions performed by them.
6. Stringprep Changes and Compatibility
Concerns have been expressed that, in attempting to improve the
handling of IDNs, changes will be made to Stringprep that will cause
problems for other uses of that specification, notably protocols used
for identification or authentication. The section above (Section 5)
essentially applies in this context as well: the proposed new
inclusion tables [IDNA200X-Blocks], the reduction in the number of
characters permitted as input to Stringprep (Section 4), and even the
proposed changes in handling of right-to-left strings [IDNA200X-BIDI]
either give interpretations to strings prohibited under IDNA2003 or
prohibit strings that IDNA2003 permitted. Strings that are valid
under both IDNA2003 and IDNA200x, and the corresponding versions of
Stringprep, are not changed in interpretation.
Perhaps even more important in practice, since the other known uses
of Stringprep encode or process characters that are already in
normalized form and expect the use of only those characters that can
be used in writing words of languages, the changes proposed here and
in [IDNA200X-Blocks] are unlikely to have any impact at all.
7. Display and Network order
For correct treatment of domain names one must distinguish between
Network Order (the order in which the codepoints are sent in
protocols) and Display Order (the order in which the codepoints are
displayed on a screen or paper). The order of one label in a domain
name is discussed in [IDNA200X-BIDI]. But there are also questions
about the order in which labels are to be displayed if left-to-right
and right-to-left labels are adjacent to each other, especially after
more than one appearance of one of the types. That decision is
ultimately under the control of user agents --including web browsers,
mail clients, and the like-- which may be highly localized. Even
when formats are specified by protocols, the full composition of an
Internationalized Resource Identifier (IRI) [RFC3987] or
Internationalized Email address contains elements other than the
domain name. For example, IRIs contain protocol identifiers and
field delimiter syntax such as "http://" or "mailto:" while email
addresses contain the "@" to separate local parts from domain names.
User agents are not required to use those protocol-based forms
directly but often do so. Do the protocol constraints imply that the
overall direction of these strings will always be left-to-right (or
right-to-left) for an IRI or email address? Should they?
These questions could have several possible answers. If one has a
domain name abc.def in which both labels are represented in scripts
that are written right-to-left, should it be displayed as fed.cba or
cba.fed? One can notice that, in network order, an IRI for clear-
text web access would begin with "http://" and the characters will
appear as "http://abc.def". But what does this suggest about the
display order? When entering a URI to many browsers, one may
possibly enter only the domain name (leaving the "http://" to be
filled in by default and assuming no tail -- an approach that does
not work for other protocols). The natural display order for the
typed domain name on a right-to-left system is fed.cba. Does this
change if a protocol identifier, tail, and the corresponding
delimiters are specified?
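The two candidate display orders for the abc.def example can be
produced mechanically, as in the sketch below. ASCII letters stand in
for right-to-left characters, as they do in the text; real rendering
is governed by the Unicode Bidirectional Algorithm and the user
agent, neither of which this code models. Both function names are
hypothetical.

```python
def reverse_whole_name(name: str) -> str:
    """Display the entire name right-to-left, labels and dots alike."""
    return name[::-1]


def reverse_within_labels(name: str) -> str:
    """Display each label right-to-left but keep label order as sent."""
    return ".".join(label[::-1] for label in name.split("."))


network_order = "abc.def"
whole = reverse_whole_name(network_order)         # "fed.cba"
per_label = reverse_within_labels(network_order)  # "cba.fed"
```

Both results are plausible to a reader of a right-to-left script,
which is why leaving the choice to each implementation invites
confusion.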
While logic, precedent, and reality suggest that these are questions
for user interface design, not IETF protocol specifications,
experience in the 1980s and 1990s of mixing systems in which domain
name labels were read in network order (left-to-right) and those in
which those labels were read right-to-left would predict a great deal
of confusion, and heuristics that sometimes fail, if each
implementation of each application makes its own decisions on these
issues.
It should be obvious that any revision of IDNA must be clearer than
the original specification was about the distinction between network
and display order, for complete (fully-qualified) domain names as
well as for individual labels. It is likely that some strong
suggestions should be made about display order as well.
[[anchor21: Some specific examples probably needed, although they
will need to be spelled out to permit rendering in ASCII.]]
8. The Ligature and Digraph Problem
There are a number of languages written with alphabetic scripts in
which single phonemes are written using two characters, termed a
"digraph", for example, the "ph" in "pharmacy" and "telephone".
(Note that characters paired in this manner can also appear
consecutively without forming a digraph, as in "tophat".) Certain
digraphs are normally indicated typographically by setting the two
characters closer together than they would be if used consecutively
to represent different phonemes. Some digraphs are fully joined as
ligatures (strictly, the term designates characters set with no
intervening white space at all, although it is sometimes applied to
closely set pairs). An example of this may be seen when the word
"encyclopaedia" is set with U+00E6 LATIN SMALL LETTER AE.
Difficulties arise from the fact that a given ligature may be a
completely optional typographic convenience for representing a
digraph in one language (as in the preceding example), while in
another language it is a single character that may not always be
correctly representable by a two-letter sequence. This can be
illustrated by many words in the Norwegian language, where the "ae"
ligature is the 27th letter of a 29-letter extended Latin alphabet.
It is equivalent to the 28th letter of the Swedish alphabet (also
containing 29 letters), U+00E4 LATIN SMALL LETTER A WITH DIAERESIS,
for which an "ae" cannot be substituted according to current
orthographic standards.
This character (U+00E4) is also part of the German alphabet where,
unlike in the Nordic languages, the two-character sequence "ae" is a
fully acceptable alternate orthography. The inverse is however not
true, and those two characters cannot necessarily be combined into an
"umlauted a". This also applies to another German character, the
"umlauted o" (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for
example, cannot be used for writing the name of the author "Goethe".
It is also a letter in the Swedish alphabet where, in parallel to the
"umlauted a", it cannot be correctly represented as "oe".
Additional situations with alphabets written right-to-left are
described in [IDNA200X-BIDI]. This constitutes a problem that cannot
be resolved solely by operating on scripts. It is, however, a key
concern in the IDN context. Its satisfactory resolution will require
support in policies set by registries, which therefore need to be
particularly mindful not just of this specific issue, but of all
other related matters that cannot be dealt with on an exclusively
algorithmic basis.
Just as with the examples of different-looking characters that may be
assumed to be the same, as discussed in Section 2.2.6 of [RFC4690],
it is in general impossible to deal with these situations in a system
such as IDNA -- or Unicode normalization generally -- since
determining what to do requires information about the language being
used, context, or both. Consequently, IDNAbis makes no attempt to
treat these combined characters in any special way. However, this is
a prime example of a situation where a registry that is aware of the
language context in which labels are to be registered, and where that
language sometimes (or always) treats the two-character sequences as
equivalent to the combined form, should give serious consideration to
applying a "variant" model [RFC3743] [RFC4290] to reduce the
opportunities for user confusion and fraud that would result from the
related strings being registered to different parties.
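The following sketch illustrates both halves of the argument above.
First, plain NFKC (from Python's unicodedata module, used here as a
stand-in for the normalization in Stringprep) leaves the combined
characters distinct from their two-letter spellings. Second, a
registry that knows the language context could apply its own
equivalence. The GERMAN_VARIANTS table and the variant_key function
are hypothetical and greatly simplified relative to the variant
models of [RFC3743] and [RFC4290].

```python
import unicodedata


def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


# Normalization alone does not equate these pairs.
ae_unchanged = nfkc("\u00e6") == "\u00e6"            # LATIN SMALL LETTER AE stays as is
goethe_distinct = nfkc("goethe") != nfkc("g\u00f6the")

# Hypothetical registry policy for a German-language zone: treat the
# umlauted letters and their two-letter spellings as variants.
GERMAN_VARIANTS = {"\u00e4": "ae", "\u00f6": "oe"}


def variant_key(label: str) -> str:
    """Collapse a label to a key shared by all of its variants."""
    for combined, spelled in GERMAN_VARIANTS.items():
        label = label.replace(combined, spelled)
    return label
```

A registry could then refuse, or bundle under one registrant, any new
label whose variant_key matches that of an existing registration. The
same table would be wrong for a Swedish-language zone, which is the
reason the decision cannot be made in the protocol.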
9. Right-to-left text
In order to be sure that the directionality of text is unambiguous,
Stringprep requires that any label in which right-to-left characters
appear both starts and ends with characters that are unambiguously
directional, and rejects any other string that contains a right-to-
left character. This is one of the few places where the IDNA
algorithms essentially look at an entire label, not just at
individual characters. Unfortunately, the algorithmic model, as
defined in Stringprep, fails when the final character in a right-to-
left string is "decorated", i.e., requires a combining character to
be correctly represented. The combining character is not identified
with the right-to-left character attribute, so Stringprep rejects the
string.
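The failure described above can be reproduced with a short sketch of
the Stringprep bidirectional check. This is an approximation under
stated assumptions: it reads bidirectional classes from the Unicode
version bundled with Python rather than from the Unicode 3.2 tables
that Stringprep specifies, and the function name stringprep_bidi_ok
is hypothetical.

```python
import unicodedata

RTL_CLASSES = ("R", "AL")  # right-to-left and Arabic-letter classes


def stringprep_bidi_ok(label: str) -> bool:
    """Approximate the Stringprep rule for right-to-left labels."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not any(c in RTL_CLASSES for c in classes):
        return True  # no right-to-left characters: rule does not apply
    if "L" in classes:
        return False  # left-to-right letters may not be mixed in
    # First and last characters must both be right-to-left.
    return classes[0] in RTL_CLASSES and classes[-1] in RTL_CLASSES


plain = "\u05d0\u05d1"      # HEBREW LETTER ALEF, HEBREW LETTER BET
decorated = "\u05d0\u05b7"  # HEBREW LETTER ALEF + HEBREW POINT PATAH
```

The label in plain passes, while decorated is rejected because its
final character, a combining vowel point, has the class "NSM" rather
than a right-to-left class.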
This problem manifests itself in languages written with consonantal
alphabets in which vowels are indicated as combining marks, and where
they are an essential component of the orthography. Examples of this
are Yiddish, written with an extended Hebrew script, and Dhivehi (the
official language of Maldives) which is written in the Thaana script
(which is, in turn, derived from the Arabic script). Other languages
are still being investigated, but Stringprep definitely needs to be
adjusted.
10. Acknowledgements
The editor and contributors would like to express their thanks to
those who contributed significant early review comments, sometimes
accompanied by text, especially Mark Davis, Paul Hoffman, Simon
Josefsson, and Sam Weiler.
... More to be supplied...
11. Contributors
While the listed editor held the pen, this document represents the
joint work and conclusions of an ad hoc design team consisting of the
editor and, in alphabetic order, Harald Alvestrand, Tina Dam, Patrik
Faltstrom, and Cary Karp. In addition, there were many specific
contributions and helpful comments from those listed in the
Acknowledgements section and others who have contributed to the
development and use of the IDNA protocols.
12. IANA Considerations
While this document does not contain specific actions for IANA, it
anticipates the creation of a registry of Unicode blocks and
characters permitted in IDNs and a mechanism for expanding that
registry. See Section 4.
13. Security Considerations
Any change to Stringprep or, more broadly, the IETF's model of the
use of internationalized character strings in different protocols,
creates some risk of inadvertent changes to those protocols,
invalidating deployed applications or databases, and so on. Our
current hypothesis is that the same considerations that would require
changing the IDN prefix (see Section 5.2) are the ones that would,
e.g., invalidate certificates or hashes that depend on Stringprep,
but those cases require careful consideration and evaluation.
... More to be supplied...
14. References
14.1. Normative References
[FC-NFKC] The Unicode Consortium, "Derived Property:
FC_NFKC_Closure", June 2006, <http://www.unicode.org/
Public/UNIDATA/DerivedNormalizationProps.txt>.
[IDNA200X-BIDI]
Alvestrand, H. and C. Karp, "An IDNA problem in right-to-
left scripts", October 2006, <http://www.ietf.org/
internet-drafts/draft-alvestrand-idna-bidi-00.txt>.
[IDNA200X-Blocks]
Faltstrom, P., "??? Permitted Character List for IDNA
(placeholder)", October 2006,
<draft-faltstrom-idnabis-tables-00.txt>.
A version of this document, with color coding to make the
categories more clear, and supplemental materials, are
available at http://stupid.domain.name/idnabis/00.html
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454,
December 2002.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)",
RFC 3491, March 2003.
[RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
for Internationalized Domain Names in Applications
(IDNA)", RFC 3492, March 2003.
[RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
Engineering Team (JET) Guidelines for Internationalized
Domain Names (IDN) Registration and Administration for
Chinese, Japanese, and Korean", RFC 3743, April 2004.
[RFC4290] Klensin, J., "Suggested Practices for Registration of
Internationalized Domain Names (IDN)", RFC 4290,
December 2005.
[Unicode-USX15]
The Unicode Consortium, "Unicode Standard Annex #15:
Unicode Normalization Forms", 2006,
<http://www.unicode.org/reports/tr15/>.
[Unicode32]
The Unicode Consortium, "The Unicode Standard, Version
3.0", 2000.
(Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5).
Version 3.2 consists of the definition in that book as
amended by the Unicode Standard Annex #27: Unicode 3.1
(http://www.unicode.org/reports/tr27/) and by the Unicode
Standard Annex #28: Unicode 3.2
(http://www.unicode.org/reports/tr28/).
[Unicode40]
The Unicode Consortium, "The Unicode Standard, Version
4.0", 2003.
[Unicode50]
The Unicode Consortium, "The Unicode Standard, Version
5.0", 2006.
Forthcoming fourth quarter 2006. Available online at
http://www.unicode.org/versions/Unicode5.0.0/
14.2. Informative References
[ICANN-Guidelines]
ICANN, "IDN Implementation Guidelines", 2006,
<http://www.icann.org/topics/idn/>.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005.
[RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
Recommendations for Internationalized Domain Names
(IDNs)", RFC 4690, September 2006.
Author's Address
John C Klensin (editor)
1770 Massachusetts Ave, Ste 322
Cambridge, MA 02140
USA
Phone: +1 617 245 1457
Fax:
Email: john+ietf at jck.com
URI:
Full Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr at ietf.org.
Acknowledgment
Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA).