To be published: draft-idnabis-issues-00.txt

Mon Oct 16 14:27:33 CEST 2006

I've attached the version of idnabis-issues that was just
submitted for Internet-Draft posting.

Thanks to everyone for your comments.  I think most of them have
been addressed at least partially.  We owe a few of you responses
to points you raised; those will be out, I hope, today.  This
document is still, from my perspective, in a fairly early stage
-- there are a number of loose ends and placeholders both
implicit and explicit.

As Harald wrote in his note, while we recognize that not all
readers will agree with everything written here, I hope we have
done a good job of reflecting the concerns that have been voiced
to us.

Please comment at will - the next official version will
undoubtedly be after the IETF meeting in San Diego, but quick
comments are very welcome!

  for the team,
     john
-------------- next part --------------

Network Working Group                                    J. Klensin, Ed.
Internet-Draft                                          October 16, 2006
Intended status: Informational
Expires: April 19, 2007

           Proposed Issues and Changes for IDNA - An Overview
                      draft-idnabis-issues-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 19, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   A recent IAB report identified issues that have been raised with
   Internationalized Domain Names (IDNs) some of which require tuning of
   the existing protocols and the tables on which they depend.  Based on
   intensive discussion by an informal design team, this document
   further explains some of the issues that have been encountered and
   provides explanatory material for some of the proposals that are
   being made.

Klensin                  Expires April 19, 2007                 [Page 1]

Internet-Draft               IDNAbis Issues                 October 2006

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Context and Overview . . . . . . . . . . . . . . . . . . .  3
     1.2.  Discussion Forum . . . . . . . . . . . . . . . . . . . . .  3
     1.3.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  The IDNA Model . . . . . . . . . . . . . . . . . . . . . . . .  4
     2.1.  Registration of IDNs . . . . . . . . . . . . . . . . . . .  4
       2.1.1.  Proposed label . . . . . . . . . . . . . . . . . . . .  4
       2.1.2.  Conversion to Unicode  . . . . . . . . . . . . . . . .  4
       2.1.3.  Permitted Character Identification . . . . . . . . . .  5
       2.1.4.  Stringprep Mappings  . . . . . . . . . . . . . . . . .  5
       2.1.5.  Post-Stringprep Character String Checking and
               Processing . . . . . . . . . . . . . . . . . . . . . .  6
       2.1.6.  Registry Restrictions  . . . . . . . . . . . . . . . .  6
       2.1.7.  Punycode Conversion  . . . . . . . . . . . . . . . . .  7
       2.1.8.  Insertion in the Zone  . . . . . . . . . . . . . . . .  7
     2.2.  Domain Name Resolution (Lookup)  . . . . . . . . . . . . .  7
       2.2.1.  User input . . . . . . . . . . . . . . . . . . . . . .  7
       2.2.2.  Conversion to Unicode  . . . . . . . . . . . . . . . .  7
       2.2.3.  Pre-Nameprep Validation and Character List Testing . .  7
       2.2.4.  Stringprep Processing  . . . . . . . . . . . . . . . .  7
       2.2.5.  Post-Nameprep Processing . . . . . . . . . . . . . . .  8
       2.2.6.  Punycode Conversion  . . . . . . . . . . . . . . . . .  8
       2.2.7.  Name Resolution  . . . . . . . . . . . . . . . . . . .  8
   3.  IDNA200x Document List . . . . . . . . . . . . . . . . . . . .  8
   4.  Permitted Characters: An inclusion list  . . . . . . . . . . .  8
   5.  The Question of Prefix Changes . . . . . . . . . . . . . . . .  9
     5.1.  Conditions requiring a prefix change . . . . . . . . . . .  9
     5.2.  Conditions not requiring a prefix change . . . . . . . . . 10
   6.  Stringprep Changes and Compatibility . . . . . . . . . . . . . 10
   7.  Display and Network order  . . . . . . . . . . . . . . . . . . 11
   8.  The Ligature and Digraph Problem . . . . . . . . . . . . . . . 12
   9.  Right-to-left text . . . . . . . . . . . . . . . . . . . . . . 13
   10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14
   11. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 14
   12. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 14
   13. Security Considerations  . . . . . . . . . . . . . . . . . . . 14
   14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15
     14.1. Normative References . . . . . . . . . . . . . . . . . . . 15
     14.2. Informative References . . . . . . . . . . . . . . . . . . 16
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16
   Intellectual Property and Copyright Statements . . . . . . . . . . 18

1.  Introduction

1.1.  Context and Overview

   A recent IAB report identified issues that have been raised with
   Internationalized Domain Names (IDNs) and the associated standards.
   Those standards are known as Internationalized Domain Names in
   Applications (IDNA), taken from the name of the highest level
   standard within that group (see Section 1.3).  Based on discussion of
   those issues and their impact, some of these standards now require
   tuning the existing protocols and the tables on which they depend.
   This document further explains, based on the results of some
   intensive discussions by an informal design team, some of the issues
   that have been encountered.  It also provides explanatory material
   for some of the proposals that are being made.  Explanatory material
   for other proposals will appear with the associated documents.

   This document begins with a discussion of the IDNA model and the
   general differences in strategy between the original version of IDNA
   and the proposed new version, then continues with a description of
   specific changes that are needed.

   [[anchor3: This initial draft is very preliminary and contains
   significant omissions.  Some, but not all, of these are identified by
   explicit placeholders similar to this one.]]

1.2.  Discussion Forum

   This work is being discussed on the mailing list
   idn-update@alvestrand.no.

1.3.  Terminology

   This document uses the term "IDNA2003" to refer to the set of
   standards that make up and support the version of IDNA published in
   2003, i.e., [RFC3490], [RFC3491], [RFC3492], and [RFC3454].  The term
   "IDNA200x" is used to refer to a possible new version of IDNA without
   specifying which particular documents would be impacted.  While more
   common IETF usage might refer to the successor document(s) as
   "IDNAbis", this document uses that term, and similar ones, to refer
   to successors to the individual documents, e.g., "IDNAbis" is a
   synonym for the specific successor to RFC3490, or "RFC3490bis".  See
   also Section 3.

   Protocols in the IDNA group such as RFC 3454, RFC 3491 and RFC 3492
   are referred to by their popular names of "Stringprep", "Nameprep",
   and "Punycode", respectively.

   The term "Unicode" in this document refers to Unicode 3.2 [Unicode32]
   when it is used in the context of IDNA2003 and to Unicode 5.0
   [Unicode50] in the context of IDNA200x.  For the purposes of this
   document -- i.e., general explanation and issues that do not address
   specific code points or blocks -- Unicode 3.2, Unicode 4.0
   [Unicode40], and Unicode 5.0 are essentially equivalent.

2.  The IDNA Model

   IDNA is a client-side protocol, i.e., almost all of the processing is
   performed by the client.  The strings that appear in, and are
   resolved by, the DNS consist entirely of ASCII characters, conforming
   to the traditional rules for the naming of hosts, and consisting of
   only ASCII letters, digits, and hyphens.  This approach permits IDNA
   to be deployed without modifications to the DNS itself, which in turn
   avoids both the need to upgrade the entire Internet at once to
   support IDNs and the unknown risks that DNS changes would pose to
   deployed systems.

   IDNA has the following logical flow in domain name registration and
   resolution.  The IDNA2003 specification explicitly includes the
   equivalents of the steps in Section 2.1.3, Section 2.1.4,
   Section 2.1.5, and Section 2.1.7.  The omission of an explicit
   discussion of the other steps has been one source of confusion.
   Another source has been the definition of IDNA2003 as an explicit
   algorithm, expressed partially in prose and partially in pseudocode.
   The steps below conform to more traditional IETF practice: the
   functions are specified, rather than the algorithm.  The breakdown
   into steps is for clarity of explanation; any implementation that
   produces the same result with the same inputs is conforming.

2.1.  Registration of IDNs

2.1.1.  Proposed label

   The registrant submits a request for an IDN, representing it in the
   local character coding used by the operating system.  This string is
   typically produced by keyboard entry and converted to the local
   character set by the keyboard driver software. [[anchor7: JcK: are we
   sure 'keyboard driver' is going to make sense to the audience.
   Certainly it is ok for the IETF part.]]

2.1.2.  Conversion to Unicode

   Some system routine, or a localized front-end to the IDNA process,
   converts the proposed label to a Unicode string.  This conversion is
   obviously trivial in a Unicode-native system but may involve some
   complexity in one that is not, especially if the characters of the
   local character set do not map exactly and unambiguously onto Unicode
   characters.  Depending on the system involved, the major difficulty
   may not lie in the mapping but in accurately identifying the incoming
   character set and then applying the correct conversion routine.

2.1.3.  Permitted Character Identification

   The Unicode string is examined to prohibit characters that IDNA does
   not permit in input.  IDNA200x uses an inclusion-based approach,
   i.e., a list of characters that are permitted, rather than the
   exclusion-based approach of IDNA2003 (see Section 4).  Under
   IDNA2003, the list of excluded characters is quite limited because
   the model was to permit almost all Unicode characters to be used as
   input with many of them mapped into others.  There is now general
   consensus that this exclusion-based model was a mistake and should be
   replaced, in IDNA200x, by a system that lists only those characters
   that are permitted and does much less mapping.

   Under the proposed IDNA200x, the string in Unicode form will be
   rejected if it contains characters that are not on the list of
   characters acceptable as IDNA input.

   [[anchor8: Examples of impacted characters needed.]]
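
   As a non-normative illustration of the inclusion-based check, the
   following Python sketch rejects any character that is not on a
   permitted list.  The rule used here (letters, combining marks,
   decimal digits, and the hyphen, excluding anything with a
   compatibility decomposition) is an assumption made for the example;
   it is not the IDNA200x table, and the function names are
   hypothetical.

```python
import unicodedata

def is_permitted(ch: str) -> bool:
    """Toy inclusion rule for illustration; NOT the real IDNA200x table."""
    if ch == "-":
        return True
    # Characters with a compatibility decomposition (tagged "<...>")
    # are treated as not permitted in this sketch.
    if unicodedata.decomposition(ch).startswith("<"):
        return False
    category = unicodedata.category(ch)
    return category[0] == "L" or category in {"Mn", "Nd"}

def first_rejected(label: str):
    """Return the first character not on the inclusion list, or None."""
    for ch in label:
        if not is_permitted(ch):
            return ch
    return None
```

   Under this toy rule a label containing U+FB01 LATIN SMALL LIGATURE FI
   is rejected outright, where an exclusion-plus-mapping model would
   instead have mapped it to "fi".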

2.1.4.  Stringprep Mappings

   In the model of IDNA200x, Nameprep and Stringprep will be respecified
   to depend on Unicode properties, rather than on explicit character
   lists that are dependent on Unicode version.  This change in
   definition does not change the functional model of IDNA processing
   (or of Stringprep-based processing more generally), but conceptually
   turns it into the clear set of steps described here and localizes
   dependencies on Unicode definitions and properties.

2.1.4.1.  Normalization

   The filtered string is then normalized (a Unicode concept, see any
   version of the Unicode Standard) to make string comparison possible
   even though some strings can be represented in several different ways
   in Unicode.  In IDNA2003, the normalization method specified in
   Stringprep and invoked by Nameprep is based on Unicode method NFKC
   [Unicode-USX15].  The FC_NFKC_Closure property [FC-NFKC] is applied
   to facilitate subsequent case-folding.  For IDNA200x, the new Stable
   NFKC method is used as a base to facilitate migration to future
   versions of Unicode but, because many of the characters permitted and
   then mapped to others in IDNA2003 are not permitted by IDNA200x
   (since most characters that would be mapped to others by
   compatibility equivalences are prohibited), the normalization
   operation is less extensive.
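
   The effect of normalization can be sketched with the Python standard
   library.  The sketch uses plain NFKC as an approximation, since
   Stable NFKC is not assumed to be available there; the function name
   is hypothetical.

```python
import unicodedata

def normalize_label(label: str) -> str:
    # Plain NFKC from the standard library, used here only as an
    # approximation of the normalization step described above.
    return unicodedata.normalize("NFKC", label)

decomposed = "u\u0308ber"   # "u" followed by COMBINING DIAERESIS
composed = "\u00fcber"      # precomposed U+00FC

# Both spellings compare equal once normalized.
same = normalize_label(decomposed) == normalize_label(composed)
```

   NFKC also applies compatibility mappings (for example, U+FB01
   becomes "fi"), which is the part of the operation that matters less
   once such characters are rejected before normalization.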

2.1.4.2.  Case-folding

   The normalized string is then case-mapped for scripts that make case
   distinctions similar to those of Greek to permit approximating the
   ASCII-case matching applied on name resolution in the DNS.  Strictly
   speaking, case-folding starts with the normalization process above,
   then strings are case-folded, then they are normalized again.  The
   application of the "FC_NFKC_Closure" property above simplifies this
   process in practice.

   [[anchor11: Examples of impacted characters needed.]]
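
   The normalize, fold, normalize sequence described above can be
   sketched as follows.  Python's str.casefold() implements Unicode
   full case folding, which is assumed here to be a reasonable stand-in
   for the Stringprep case-mapping table; it is not the same table, and
   the function name is hypothetical.

```python
import unicodedata

def fold_label(label: str) -> str:
    """Normalize, case-fold, then normalize again."""
    normalized = unicodedata.normalize("NFKC", label)
    folded = normalized.casefold()   # approximation of the case mapping
    return unicodedata.normalize("NFKC", folded)
```

   For example, fold_label applied to the Greek capitals U+0391 U+0392
   U+0393 returns the corresponding small letters U+03B1 U+03B2 U+03B3.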

2.1.5.  Post-Stringprep Character String Checking and Processing

   All characters output from the step above are then checked for
   permissibility in IDNA, i.e., for presence in the table of included
   characters (see Section 4).  Additional transformations that do not
   occur as the result of the steps above may be specified at this point
   by IDNA200x.

   [[anchor12: Examples of impacted characters needed.]]

2.1.6.  Registry Restrictions

   Registries at all levels of the DNS, not just the top level, are
   expected to establish policies about the labels that may be
   registered, and for the processes associated with that action.  Such
   restrictions have always existed in the DNS and have always been
   applied at registration time, with the most notable example being
   enforcement of the hostname (LDH) convention itself.  For IDNs, the
   restrictions to be applied are not an IETF matter except insofar as
   they derive from restrictions imposed by application protocols (e.g.,
   email has always required a more restricted syntax for domain names
   than the restrictions of the DNS itself).  Because these are
   restrictions on what can be registered, it is not generally necessary
   that they be global.  If a name is not found on resolution, it is not
   relevant whether it could have been registered; only that it was not
   registered.  Registry restrictions might include prohibition of
   mixed-script labels, or restrictions on labels permitted in a zone if
   certain other labels are already present (See [RFC3743] and [RFC4290]
   for discussion of some of the methods that have been applied by some
   registries).  The various sets of ICANN IDN Guidelines
   [ICANN-Guidelines] also suggest restrictions that might sensibly be
   imposed.

   The string produced by the above steps is checked and processed as
   appropriate to local registry restrictions.  This may result in the
   rejection of some labels or the application of special restrictions
   to others.

   [[anchor13: Examples of impacted characters needed.]]

2.1.7.  Punycode Conversion

   The domain name label resulting from the processes above is converted
   to its Punycode encoding (i.e., the "xn--..." form).  Punycode is not
   changed in IDNA200x.
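
   A minimal sketch of this step, using the "punycode" codec that ships
   with Python.  The helper names are hypothetical, and the sketch
   assumes the label has already passed the earlier steps.

```python
ACE_PREFIX = "xn--"

def to_ace(label: str) -> str:
    """Encode one already-prepared label; ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(label: str) -> str:
    """Decode an "xn--" label back to Unicode; others pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label
```

   With these helpers, the label "b" U+00FC "cher" encodes to
   "xn--bcher-kva" and decodes back to the original string.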

2.1.8.  Insertion in the Zone

   The Punycode-encoded string is then registered in the DNS by
   insertion into a zone.

2.2.  Domain Name Resolution (Lookup)

2.2.1.  User input

   The user supplies a string in the local character set, typically by
   typing it or clicking on a URI or IRI.

2.2.2.  Conversion to Unicode

   The local character set, character coding conventions, and, as
   necessary, display and presentation conventions, are converted to
   Unicode, paralleling the process above.

2.2.3.  Pre-Nameprep Validation and Character List Testing

   Again in parallel to the above, the Unicode string is checked to
   verify that all characters that appear in it are valid for IDNA
   input.  As discussed in Section 4, this check should probably be more
   liberal than that of Section 2.1.3: characters that fall into
   "pending", "possibly later", or "unassigned codepoint" categories in
   the inclusion tables should probably not lead to label rejection at
   this point.  Instead, the resolver should (MUST?) rely on the
   presence or absence of labels containing such characters in the DNS
   to determine their validity.
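
   The difference between the strict registration check and the more
   liberal lookup check might be sketched as below.  The three-way
   classification and its rules are assumptions for illustration only
   (the real categories would come from the inclusion tables), and the
   function names are hypothetical.

```python
import unicodedata

def classify(ch: str) -> str:
    """Toy classification: 'permitted', 'unassigned', or 'prohibited'."""
    category = unicodedata.category(ch)
    if category == "Cn":                      # no assigned character
        return "unassigned"
    if ch == "-" or category[0] == "L" or category in {"Mn", "Nd"}:
        return "permitted"
    return "prohibited"

def ok_for_registration(label: str) -> bool:
    # Registration accepts only characters that are positively permitted.
    return all(classify(ch) == "permitted" for ch in label)

def ok_for_lookup(label: str) -> bool:
    # Lookup rejects only what is known to be prohibited and leaves
    # everything else to the DNS, which answers "not found" if needed.
    return all(classify(ch) != "prohibited" for ch in label)
```

   A label containing a code point that is unassigned in the Unicode
   version known to the client would fail ok_for_registration but pass
   ok_for_lookup.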

2.2.4.  Stringprep Processing

   As above, the validated Unicode string is normalized (using Stable
   NFKC) and case-mapped.  IDNA2003 uses explicit codepoint tables in
   Stringprep to accomplish both of these operations.

2.2.5.  Post-Nameprep Processing

   Any necessary processing is applied to the normalized and case-mapped
   output string from the above.

2.2.6.  Punycode Conversion

   The validated string is converted to Punycode.

2.2.7.  Name Resolution

   The Punycode-encoded form of the label is looked up in the DNS, using
   normal DNS procedures.
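
   Putting the lookup steps together, a client-side preparation routine
   could look roughly like the following sketch.  It omits the
   character validation of Section 2.2.3, approximates the Stringprep
   processing with NFKC and str.casefold(), and stops short of the DNS
   query itself; the function name is hypothetical.

```python
import unicodedata

def prepare_for_lookup(name: str) -> str:
    """Return the ASCII form of a domain name, label by label."""
    prepared = []
    for label in name.split("."):
        # Section 2.2.4: normalize and case-map (approximation).
        label = unicodedata.normalize("NFKC", label)
        label = unicodedata.normalize("NFKC", label.casefold())
        # Section 2.2.6: Punycode conversion for non-ASCII labels.
        if not label.isascii():
            label = "xn--" + label.encode("punycode").decode("ascii")
        prepared.append(label)
    return ".".join(prepared)
```

   The returned string is what would be handed to the ordinary resolver
   in Section 2.2.7.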

3.  IDNA200x Document List

   [[anchor15: This section will need to be extensively revised or
   removed before publication.]]

   The following documents are expected to be produced as part of the
   IDNA200x effort.

   o  This document, containing an overview and rationale.

   o  A document describing the "BIDI problem" with Stringprep and
      proposing a solution [IDNA200X-BIDI].

   o  A list of initially permitted code points, based on Unicode 5.0
      code blocks.  See Section 4.

   o  [[anchor16: ...More ??? ...]]

4.  Permitted Characters: An inclusion list

   Moving to an inclusion model requires a new list of characters that
   are permitted in IDNs.  An initial version of such a list has been
   developed by the contributors to this document [IDNA200X-Blocks].
   This was accomplished by going through Unicode 5.0 one block and one
   character class at a time and determining which characters, classes,
   and blocks were clearly acceptable for IDNs, which ones were clearly
   unacceptable (e.g., all blocks consisting entirely of compatibility
   characters and non-language symbols were excluded as were a number of
   character classes), and which blocks and classes were in need of
   further study or input from the relevant language communities.  The
   discussion in [IDNA200X-BIDI] illustrates areas in which more work
   and input are needed.  It is expected that such problems will be
   resolved quickly and the questioned scripts added to the list of
   permitted characters.

   A procedure for adding additional characters to the inclusion list,
   either from blocks that are associated with notes in
   [IDNA200X-Blocks] or from future versions of Unicode, will be
   developed as part of this work.  A key part of that procedure will be
   specifications that, in fact, make it possible to add new characters
   and blocks without long delays in implementation.  For example, it
   may be desirable to more strongly distinguish between use of the
   protocols for "registration" (i.e., entering names in the DNS) and
   "lookup" (queries to the DNS), with most character inclusion rules
   applied at registration time only and clients generating queries
   relying on the lookup process to return "not found" errors if
   characters were invalid.

   [[anchor17: That procedure is an important issue and this is a
   placeholder.]]

5.  The Question of Prefix Changes

   The conditions that would require a change in the IDNA "prefix"
   ("xn--" for the version of IDNA specified in [RFC3490]) have been a
   great concern to the community.  A prefix change would clearly be
   necessary if the algorithms were modified in a manner that would
   create serious ambiguities during subsequent transition in
   registrations.  This section summarizes our conclusions about the
   conditions under which changes in prefix would be necessary.

5.1.  Conditions requiring a prefix change

   An IDN prefix change is needed if a given string would resolve or
   otherwise be interpreted differently depending on the version of the
   protocol or tables being used.  Consequently, work to update IDNs
   would require a prefix change if, and only if, one of the following
   four conditions were met:

   1.  The conversion of a Punycode string to Unicode yields one string
       under IDNA2003 (RFC3490) and a different string under IDNA200x.

   2.  An input string that is valid under IDNA2003 and also valid under
       IDNA200x yields two different Punycode strings under the two
       versions.  This condition is believed to be essentially
       equivalent to the one above.

       Note, however, that if the input string is valid under one
       version and not valid under the other, this condition does not
       apply.  See the first item in Section 5.2, below.

   3.  A fundamental change is made to the semantics of the string that
       is inserted in the DNS, e.g., if a decision were made to try to
       include language or specific script information in that string,
       rather than having it be just a string of characters.

   4.  Sufficient characters are added to Unicode that the Punycode
       mechanism for offsets to blocks does not have enough capacity to
       reference the higher-numbered planes and blocks.  This condition
       is unlikely even in the long term and certain to not arise in the
       next few years.

5.2.  Conditions not requiring a prefix change

   In particular, as a result of the principles described above, none of
   the following changes require a new prefix:

   1.  Prohibition of some characters as input to IDNA.  This may make
       names that are now registered inaccessible, but does not require
       a prefix change.

   2.  Adjustments in Stringprep tables or IDNA actions, including
       normalization definitions, that affect only characters that were
       already invalid under IDNA2003.

   3.  Changes in the style of definitions of Stringprep or Nameprep
       that do not alter the actions performed by them.
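
   The second condition of Section 5.1 and the first item above can be
   expressed as a simple comparison: a conflict exists only when both
   versions accept a label and produce different encoded forms.  In the
   sketch below, old_rules and new_rules are toy stand-ins invented for
   the example, not implementations of IDNA2003 or IDNA200x; each
   returns None for a label it rejects.

```python
import unicodedata
from typing import Callable, Iterable, List, Optional

ToAce = Callable[[str], Optional[str]]

def _ace(label: str) -> str:
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

def old_rules(label: str) -> Optional[str]:
    """Toy stand-in: map generously, then encode."""
    return _ace(unicodedata.normalize("NFKC", label).casefold())

def new_rules(label: str) -> Optional[str]:
    """Toy stand-in: reject compatibility characters, else as above."""
    if any(unicodedata.decomposition(ch).startswith("<") for ch in label):
        return None
    return _ace(unicodedata.normalize("NFKC", label).casefold())

def prefix_conflicts(labels: Iterable[str], old: ToAce,
                     new: ToAce) -> List[str]:
    """Labels valid under both rule sets that encode differently."""
    conflicts = []
    for label in labels:
        a, b = old(label), new(label)
        if a is not None and b is not None and a != b:
            conflicts.append(label)
    return conflicts
```

   A label that the newer rules simply reject (here, one containing
   U+FB01) does not appear in the result, which mirrors the statement
   that prohibiting characters does not by itself require a new prefix.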

6.  Stringprep Changes and Compatibility

   Concerns have been expressed that, in attempting to improve the
   handling of IDNs, changes will be made to Stringprep that will cause
   problems for other uses of that specification, notably protocols used
   for identification or authentication.  The section above (Section 5)
   essentially applies in this context as well: the proposed new
   inclusion tables [IDNA200X-Blocks], the reduction in the number of
   characters permitted as input to Stringprep (Section 4), and even the
   proposed changes in handling of right-to-left strings [IDNA200X-BIDI]
   either give interpretations to strings prohibited under IDNA2003 or
   prohibit strings that IDNA2003 permitted.  Strings that are valid
   under both IDNA2003 and IDNA200x, and the corresponding versions of
   Stringprep, are not changed in interpretation.

   Perhaps even more important in practice, since the other known uses
   of Stringprep encode or process characters that are already in
   normalized form and expect the use of only those characters that can
   be used in writing words of languages, the changes proposed here and
   in [IDNA200X-Blocks] are unlikely to have any impact at all.

7.  Display and Network order

   For correct treatment of domain names one must distinguish between
   Network Order (the order in which the codepoints are sent in
   protocols) and Display Order (the order in which the codepoints are
   displayed on a screen or paper).  The order of one label in a domain
   name is discussed in [IDNA200X-BIDI].  But there are also questions
   about the order in which labels are to be displayed if left-to-right
   and right-to-left labels are adjacent to each other, especially after
   more than one appearance of one of the types.  That decision is
   ultimately under the control of user agents -- including web
   browsers, mail clients, and the like -- which may be highly
   localized.  Even when formats are specified by protocols, the full
   composition of an Internationalized Resource Identifier (IRI)
   [RFC3987] or internationalized email address contains elements other
   than the domain name.  For example, IRIs contain protocol identifiers
   and field delimiter syntax such as "http://" or "mailto:" while email
   addresses contain the "@" to separate local parts from domain names.
   User agents are not required to use those protocol-based forms
   directly but often do so.  Do the protocol constraints imply that the
   overall direction of these strings will always be left-to-right (or
   right-to-left) for an IRI or email address?  Should they?

   These questions could have several possible answers.  If one has a
   domain name abc.def in which both labels are represented in scripts
   that are written right-to-left, should it be displayed as fed.cba or
   cba.fed?  One can notice that, in network order, an IRI for clear-
   text web access would begin with "http://" and the characters will
   appear as "http://abc.def".  But what does this suggest about the
   display order?  When entering a URI into many browsers, one may
   enter only the domain name (leaving the "http://" to be
   filled in by default and assuming no tail -- an approach that does
   not work for other protocols).  The natural display order for the
   typed domain name on a right-to-left system is fed.cba.  Does this
   change if a protocol identifier, tail, and the corresponding
   delimiters are specified?
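
   The two candidate renderings of the "abc.def" example can be
   produced mechanically.  In this sketch the ASCII letters stand in
   for right-to-left characters, reversal stands in for right-to-left
   display, and the function name is hypothetical.

```python
def display_candidates(name: str):
    """Return (whole-name reversal, per-label reversal) of a name
    given in network order."""
    labels = name.split(".")
    whole_name = ".".join(labels)[::-1]                    # label order flips
    per_label = ".".join(label[::-1] for label in labels)  # label order kept
    return whole_name, per_label
```

   For "abc.def" the first form is "fed.cba" and the second is
   "cba.fed"; nothing in the network-order string says which one a
   user agent should choose.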

   While logic, precedent, and reality suggest that these are questions
   for user interface design, not IETF protocol specifications,
   experience in the 1980s and 1990s of mixing systems in which domain
   name labels were read in network order (left-to-right) and those in
   which those labels were read right-to-left would predict a great deal
   of confusion, and heuristics that sometimes fail, if each
   implementation of each application makes its own decisions on these
   issues.

   It should be obvious that any revision of IDNA must be clearer than
   the original specification was about the distinction between network
   and display order, both for complete (fully-qualified) domain names
   and for individual labels.  It is likely that some strong
   suggestions should be made about display order as well.

   [[anchor21: Some specific examples probably needed, although they
   will need to be spelled out to permit rendering in ASCII.]]

8.  The Ligature and Digraph Problem

   There are a number of languages written with alphabetic scripts in
   which single phonemes are written using two characters, termed a
   "digraph", for example, the "ph" in "pharmacy" and "telephone".
   (Note that characters paired in this manner can also appear
   consecutively without forming a digraph, as in "tophat".)  Certain
   digraphs are normally indicated typographically by setting the two
   characters closer together than they would be if used consecutively
   to represent different phonemes.  Some digraphs are fully joined as
   ligatures (strictly designating setting totally without intervening
   white space, although the term is sometimes applied to close set
   pairs).  An example of this may be seen when the word "encyclopaedia"
   is set with the ligature U+00E6 LATIN SMALL LETTER AE.

   Difficulties arise from the fact that a given ligature may be a
   completely optional typographic convenience for representing a
   digraph in one language (as in the preceding example), while in
   another language it is a single character that may not always be
   correctly representable by a two-letter sequence.  This can be
   illustrated by many words in the Norwegian language, where the "ae"
   ligature is the 27th letter of a 29-letter extended Latin alphabet.
   It is equivalent to the 28th letter of the Swedish alphabet (also
   containing 29 letters), U+00E4 LATIN SMALL LETTER A WITH DIAERESIS,
   for which an "ae" cannot be substituted according to current
   orthographic standards.

   This character (U+00E4) is also part of the German alphabet where,
   unlike in the Nordic languages, the two-character sequence "ae" is a
   fully acceptable alternate orthography.  The inverse is however not
   true, and those two characters cannot necessarily be combined into an
   "umlauted a".  This also applies to another German character, the
   "umlauted o" (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) which, for
   example, cannot be used for writing the name of the author "Goethe".
   It is also a letter in the Swedish alphabet where, in parallel to the
   "umlauted a", it cannot be correctly represented as "oe".

   Additional situations with alphabets written right-to-left are
   described in [IDNA200X-BIDI].  This constitutes a problem that cannot
   be resolved solely by operating on scripts.  It is, however, a key
   concern in the IDN context.  Its satisfactory resolution will require
   support in policies set by registries, which therefore need to be
   particularly mindful not just of this specific issue, but of all
   other related matters that cannot be dealt with on an exclusively
   algorithmic basis.

   Just as with the examples of different-looking characters that may be
   assumed to be the same, as discussed in Section 2.2.6 of [RFC4690],
   it is in general impossible to deal with these situations in a system
   such as IDNA -- or Unicode normalization generally -- since
   determining what to do requires information about the language being
   used, context, or both.  Consequently, IDNAbis makes no attempt to
   treat these combined characters in any special way.  However, this is
   a prime example of a situation where a registry that is aware of the
   language context in which labels are to be registered, and where that
   language sometimes (or always) treats the two-character sequences as
   equivalent to the combined form, should give serious consideration to
   applying a "variant" model [RFC3743] [RFC4290] to reduce the
   opportunities for user confusion and fraud that would result from the
   related strings being registered to different parties.
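
   The registry-side grouping described above can be sketched roughly
   as follows.  This is only an illustration under stated assumptions:
   the table GERMAN_FOLD and the functions bundle_key() and conflicts()
   are hypothetical names invented for this example, and the table is
   not the variant-table format of [RFC3743] or [RFC4290].  The sketch
   first shows that NFKC normalization leaves "ae" and U+00E4 distinct,
   then groups labels that a German-language registry might choose to
   treat as related.

```python
import unicodedata

# Hypothetical equivalence table for a registry operating in a German
# language context.  A Swedish or Norwegian registry would not use it,
# since "ae" and "oe" are not acceptable substitutes there.
GERMAN_FOLD = {"\u00e4": "ae", "\u00f6": "oe"}


def bundle_key(label, fold=GERMAN_FOLD):
    """Return the key under which related labels are grouped."""
    return "".join(fold.get(ch, ch) for ch in label.lower())


def conflicts(candidate, registered, fold=GERMAN_FOLD):
    """List already-registered labels in the same bundle as candidate."""
    key = bundle_key(candidate, fold)
    return [r for r in registered
            if r != candidate and bundle_key(r, fold) == key]


# Normalization alone does not relate the two spellings.
nfkc = unicodedata.normalize("NFKC", "b\u00e4r")
print(nfkc == "baer")          # False: U+00E4 is left unchanged

# The registry policy does relate them.
print(conflicts("g\u00f6the", ["goethe", "example"]))   # ['goethe']
```

   Note that the key is used only to detect related registrations.  It
   does not rewrite "Goethe" into a form with an "umlauted o", which
   the text above identifies as incorrect.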

9.  Right-to-left text

   In order to be sure that the directionality of text is unambiguous,
   Stringprep requires that any label in which right-to-left characters
   appear both starts and ends with characters that are unambiguously
   directional, and rejects any other string that contains a right-to-
   left character.  This is one of the few places where the IDNA
   algorithms essentially look at an entire label, not just at
   individual characters.  Unfortunately, the algorithmic model, as
   defined in Stringprep, fails when the final character in a right-to-
   left string is "decorated", i.e., requires a combining character to
   be correctly represented.  The combining character is not identified
   with the right-to-left character attribute, so Stringprep rejects the
   string.

   This problem manifests itself in languages written with consonantal
   alphabets in which vowels are indicated as combining marks, and where
   they are an essential component of the orthography.  Examples of this
   are Yiddish, written with an extended Hebrew script, and Dhivehi (the
   official language of the Maldives), which is written in the Thaana
   script (which is, in turn, derived from the Arabic script).  Other
   languages are still being investigated, but Stringprep definitely
   needs to be adjusted.
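
   The failure can be reproduced with a minimal sketch of the
   first-and-last-character test.  The function names below are
   hypothetical, the check covers only the bidirectional rules of
   Section 6 of [RFC3454] (not the prohibited-character tables), and it
   uses the bidirectional classes of whatever Unicode version the
   Python runtime ships rather than the Unicode 3.2 tables that
   Stringprep specifies.  The sample label is the word "Dhivehi" in
   Thaana, which ends in a combining vowel sign.

```python
import unicodedata


def is_randal(ch):
    """True for characters with bidirectional class R or AL."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def passes_stringprep_bidi(label):
    """Approximate the Stringprep bidirectional check for one label."""
    if not any(is_randal(ch) for ch in label):
        return True                      # no right-to-left characters
    if any(unicodedata.bidirectional(ch) == "L" for ch in label):
        return False                     # mixes in left-to-right letters
    # A right-to-left character must be both first and last.
    return is_randal(label[0]) and is_randal(label[-1])


# DHAALU, IBIFILI, VAAVU, EBEFILI, HAA, IBIFILI
dhivehi = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"

print(unicodedata.bidirectional(dhivehi[-1]))   # NSM (combining mark)
print(passes_stringprep_bidi(dhivehi))          # False: label rejected
print(passes_stringprep_bidi(dhivehi[:-1]))     # True: ends in a letter
```

   The label is rejected only because its final code point is a
   nonspacing mark; dropping the vowel sign makes the check pass but
   yields a string that is not a correct spelling of the word.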

10.  Acknowledgements

   The editor and contributors would like to express their thanks to
   those who contributed significant early review comments, sometimes
   accompanied by text, especially Mark Davis, Paul Hoffman, Simon
   Josefsson, and Sam Weiler.

   ...  More to be supplied...

11.  Contributors

   While the listed editor held the pen, this document represents the
   joint work and conclusions of an ad hoc design team consisting of the
   editor and, in alphabetic order, Harald Alvestrand, Tina Dam, Patrik
   Faltstrom, and Cary Karp.  In addition, there were many specific
   contributions and helpful comments from those listed in the
   Acknowledgements section and others who have contributed to the
   development and use of the IDNA protocols.

12.  IANA Considerations

   While this document does not contain specific actions for IANA, it
   anticipates the creation of a registry of Unicode blocks and
   characters permitted in IDNs and a mechanism for expanding that
   registry.  See Section 4.

13.  Security Considerations

   Any change to Stringprep or, more broadly, the IETF's model of the
   use of internationalized character strings in different protocols,
   creates some risk of inadvertent changes to those protocols,
   invalidating deployed applications or databases, and so on.  Our
   current hypothesis is that the same considerations that would require
   changing the IDN prefix (see Section 5.2) are the ones that would,
   e.g., invalidate certificates or hashes that depend on Stringprep,
   but those cases require careful consideration and evaluation.

   ...???more to be supplied...

14.  References

14.1.  Normative References

   [FC-NFKC]  The Unicode Consortium, "Derived Property:
              FC_NFKC_Closure", June 2006, <http://www.unicode.org/
              Public/UNIDATA/DerivedNormalizationProps.txt>.

   [IDNA200X-BIDI]
              Alvestrand, H. and C. Karp, "An IDNA problem in right-to-
              left scripts", October 2006, <http://www.ietf.org/
              internet-drafts/draft-alvestrand-idna-bidi-00.txt>.

   [IDNA200X-Blocks]
              Faltstrom, P., "??? Permitted Character List for IDNA
              (placeholder)", October 2006,
              <draft-faltstrom-idnabis-tables-00.txt>.

              A version of this document, with color coding to make the
              categories clearer, and supplemental materials are
              available at http://stupid.domain.name/idnabis/00.html

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3491]  Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

   [RFC3743]  Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
              Engineering Team (JET) Guidelines for Internationalized
              Domain Names (IDN) Registration and Administration for
              Chinese, Japanese, and Korean", RFC 3743, April 2004.

   [RFC4290]  Klensin, J., "Suggested Practices for Registration of
              Internationalized Domain Names (IDN)", RFC 4290,
              December 2005.

   [Unicode-USX15]
              The Unicode Consortium, "Unicode Standard Annex #15:
              Unicode Normalization Forms", 2006,
              <http://www.unicode.org/reports/tr15/>.

   [Unicode32]
              The Unicode Consortium, "The Unicode Standard, Version
              3.0", 2000.

              (Reading, MA, Addison-Wesley, 2000.  ISBN 0-201-61633-5).
              Version 3.2 consists of the definition in that book as
              amended by the Unicode Standard Annex #27: Unicode 3.1
              (http://www.unicode.org/reports/tr27/) and by the Unicode
              Standard Annex #28: Unicode 3.2
              (http://www.unicode.org/reports/tr28/).

   [Unicode40]
              The Unicode Consortium, "The Unicode Standard, Version
              4.0", 2003.

   [Unicode50]
              The Unicode Consortium, "The Unicode Standard, Version
              5.0", 2006.

              Forthcoming fourth quarter 2006.  Available online at
              http://www.unicode.org/versions/Unicode5.0.0/

14.2.  Informative References

   [ICANN-Guidelines]
              ICANN, "IDN Implementation Guidelines", 2006,
              <http://www.icann.org/topics/idn/>.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [RFC4690]  Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and
              Recommendations for Internationalized Domain Names
              (IDNs)", RFC 4690, September 2006.

Author's Address

   John C Klensin (editor)
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA  02140
   USA

   Phone: +1 617 245 1457
   Email: john+ietf@jck.com

Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).
