Issues lists and the "preprocessing" topic

John C Klensin klensin at jck.com
Tue Aug 19 02:48:04 CEST 2008


Hi.

I'm finishing cross-checking and updating the Protocol and
Rationale documents, but, in the interim and in the hope of
keeping things moving forward, 

(1) I'm attaching updated versions of the "outstanding issues"
documents circulated before the IETF meetings.  These reflect
decisions made there.  Those who disagree (especially if you
have something new to say) are _strongly_ encouraged to make
your case on the mailing list.  If we have silence, I'm going to
ask Vint to declare many of these issues closed (which ones
should be obvious from the lists).   I've also reorganized the
lists by status, i.e., separating those things that I believe
are still in need of discussion from those that I believe are
finished or nearly so.  Opinions may differ about those
categories but, if so, I hope that those who disagree will speak
up very soon.


(2) I've been working on the "preprocessing" and mapping issue
in an attempt to reflect where we stand in the documents.   It
is unlikely that the next versions of the drafts will have this
completely right, but I want to try to return to principles and
see if we agree (or not) about them.   If we do agree, we can
then have discussion about tuning the text to best reflect those
principles.  If we do not, then I believe that it would be more
effective to discuss those disagreements about principles rather
than quibbling about specific text.

I believe that:

	(a) Our target is to have any IDN that moves across the
	network contain non-ASCII labels in either U-label or
	A-label form (i.e., that no mapping should be required).
	In addition, IDNs in protocol contexts, including HTML
	"href"s, should be in A-label form (i.e., be URIs, not
	IRIs). We aren't going to completely accomplish either
	of those goals for the reasons below, but they are still
	desirable targets.
	
	(b) Both long-term and short-term, systems that actually
	read and manipulate strings typed by users are going to
	need more flexibility than ones that process files.
	Such flexibility may include some operations that we
	talked about during the IDNA2003 development period but
	that, as far as I know, have not been implemented on any
	significant scale.  For example, my current preferred
	email client distinguishes between "copy link location"
	and "copy email address", so that, given
	    <a href="mailto:foo@example.com">Joe Blow</a>
	Copy link location would yield "mailto:foo@example.com"
	Copy email address would yield "foo@example.com"
	and a possible "copy" would yield "Joe Blow", 
	each in the relevant copy buffer ("clipboard").
	
	One can imagine similar copy operations applied to IDNs
	or IRIs that would yield domain names containing
	A-labels or URIs, respectively.
	
	Obviously, the typical user would have no clue about the
	differences among these operations initially.  But,
	faced with situations in which "copy and paste" just
	doesn't work, strings that cannot be passed to a
	colleague or even a different application on the same
	system without display problems (e.g., rows of question
	marks or boxes or characters drawn from some other CCS),
	and given decent user interfaces, they would learn
	quickly.
	
	(c) Compatibility with IDNA2003 will require mapping of
	stored strings in some contexts.  Ideally, those
	mappings should be strictly confined to characters
	mapped by IDNA2003 and interfaces to them should be
	designed to encourage migration to no-mapping forms.
	Some types of applications, such as indexing ones, might
	need to preserve these types of mapping much longer than
	others.  At the other extreme, web browsers might be
	configured to warn before mapping, or even to reject
	domain names that require it, unless the user was
	clearly referencing older pages.

The implications of the above are that we not only aren't
encouraging extensive local-option mapping, we are encouraging
no mapping at all except for backward compatibility when
necessary and as a user interface convenience.   For the latter,
the expectation is that one will make the mappings as early as
possible and use only the mapped (U-label or A-label) form in
files; storing anything else in a file or sending it across the
network is strongly discouraged.   Also, even when mappings are
done, the rule that is now present in the documents still
stands, i.e., one must not map a PVALID or CONTEXT character
into anything else -- mapping is permitted only for DISALLOWED
characters.
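
A minimal sketch of the "map early, store only the mapped form" idea,
assuming Python and its built-in "punycode" codec.  The category()
function, LOCAL_MAP, and apply_local_mapping() are hypothetical names
invented for illustration; the real category values come from the
Tables document, not from this toy lookup.

```python
def category(ch: str) -> str:
    """Hypothetical, drastically reduced stand-in for the Tables lookup."""
    if "A" <= ch <= "Z":
        return "DISALLOWED"
    return "PVALID"  # assumption for this toy only, not true in general


# A local user-interface mapping (here: ASCII case folding only).
LOCAL_MAP = {chr(c): chr(c).lower() for c in range(ord("A"), ord("Z") + 1)}


def apply_local_mapping(text: str, mapping: dict) -> str:
    """Apply a local mapping, refusing to map anything but DISALLOWED."""
    out = []
    for ch in text:
        if ch in mapping:
            if category(ch) != "DISALLOWED":
                raise ValueError(f"refusing to map U+{ord(ch):04X}")
            out.append(mapping[ch])
        else:
            out.append(ch)
    return "".join(out)


def to_a_label(u_label: str) -> str:
    """Encode a non-ASCII label with the ACE prefix; leave ASCII alone."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


mapped = apply_local_mapping("Bücher", LOCAL_MAP)  # done once, at input time
stored = to_a_label(mapped)                         # only this form is stored
```

A mapping table that tried to turn a PVALID character into something
else (for example dotless "i" into "i") would raise ValueError here
rather than silently changing the label.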

So, do others agree with that and, if not, where are the
disagreements and why?

    john
-------------- next part --------------
Issues list, IDNABIS Protocol  (as of 20080807)

This revision has been reorganized to group issues into three
groups: those that need additional discussion to resolve, those
on which comments have not been received from other than the
originators and that should be closed if no further comments
are received, and those that are believed to be settled after
IETF 72 in Dublin and that will be closed unless there are
protests on the mailing list against the meeting decisions.

Note that comments from those who originated the issue and that
repeat their initial remarks are not helpful for any of these
categories -- additional perspectives would be much more
useful.

Section numbers referenced are the same in
draft-ietf-idnabis-protocol-01 through -03 except as noted.
"Critical Path" slide numbers refer to the slides used during
IETF 72 and are followed by the titles of those slides.

There are several issues that, because of text moving back and
forth and suggestions for additional moves, are in the issues
list and summary for Rationale.  They are, in general, not
repeated here. 

Note that, unlike Rationale and some independent issues, no
specific "outstanding issues" list was posted for Protocol in
May.  That was due, in part, to the underwhelming response to
the other postings.  So this document incorporates what would
have been that list.


  -------------
Discussion and Resolution Still Required
  -------------



** P.11 **  Placeholder: Description of steps in both lookup and
registration

	These two sections still need work.

	Status: Discussion anchor in text (anchor2 in -02).
	Not worth doing much with until some related issues are
	sorted out (e.g., whether these should be recombined, see
	above). Specific suggestions welcome.

    Status: No change at IETF 72


** P.12 **  Placeholder: Preprocessing

	There is a lengthy placeholder in the text about
	preprocessing issues.  Even if the WG doesn't take on the
	task of standardizing preprocessing, there is a great deal
	of controversy as to whether it is necessary, should be
	required, and should be standardized down to the last
	mapping or if our goal is actually to change the processing
	model, not just the descriptive one.

    Comment: this needs to be resolved in the WG before the
	text in Protocol can be made internally consistent and
	consistent with Rationale.  Aspects of this topic have been
	discussed extensively in postings in recent days, including
	the Issues/Status report for Rationale-01.

    Critical Path slide 3, "Rationale 2".

    Status: Text being rewritten after discussions at IETF 72.
	Please watch for it and comment.


 --------------------
Comments needed or issues will be considered settled.  Some
items have been placed in this category because new text in
protocol-03 or protocol-04, suggested by on-list or in-Dublin
discussions, is believed to have resolved the issue. 
 --------------------

** P.6 ** Requirement for Policy

   Mark writes about Section 4.4: "While exact policies are not
   specified as part of IDNA2008 and it is expected that
   different registries may specify different policies, there
   SHOULD be policies." This SHOULD is pointless, unless some
   constraints or guidance are given. Otherwise my policy could
   be "any valid IDNA label", which would be precisely the same
   as no policy at all.

   See also R.9 and R.27.

   Comment: I fully expect that some zone administrators will
   adopt exactly that sort of policy.  We've said that policy
   decisions, and consistent application of those decisions, by
   zone administrators are an important part of the
   registration model.  We have said that specifying those
   policies and determining their adequacy is not an IETF
   matter, but rather a matter for governments, enterprise
   management, ICANN, and so on.   We have seen application
   implementers evaluate per-zone policies and respond with
   decisions about what to display.  So, I don't think this is
   pointless.  The problem is whether different language would
   better describe the handoff.

   Of course, were we to decide that our audience is purely
   protocol implementers, all of this would go away (i.e., this
   is linked to the "reorganize documents and remove rationale"
   issue of R.1).

   Critical Path slide 3, "Rationale 2".

   Status: New text coming for the next versions of the
   documents after more discussion in Dublin (IETF 72).
   Please watch for it and comment.


** P.7 **  Universality of Unicode

   Section 5.2 says: "The local character set, character
   coding conventions, and, as necessary, display and
   presentation conventions, are converted to Unicode (without
   surrogates), paralleling the process described above in
   Section 4.2." 

   Mark writes: In the vast majority of cases in modern
   software, the local charset IS Unicode, so this may be
   confusing. Also, UTF-16 does and must use surrogate code
   units, so this needs to be more precise. And excluding
   surrogate code points isn't necessary since gc=Cs are
   forbidden anyway. Suggest:
   "The string is converted from the local character set into
   Unicode, if it is not already Unicode. The exact nature of
   this conversion is beyond the scope of this document, but
   may involve normalization, as described in Section 4.2."

   Comment: I don't know how to evaluate "vast majority...",
   but I keep running across examples and discussions that
   suggest that the presumed minority is fairly large.  The
   most recent example is a lengthy discussion about
   interoperability problems with email text body parts;
   problems that would presumably be infrequent or trivial if
   most systems primarily supported Unicode.

   Status: The specific textual change suggested has been
   made and the virtual timer has run out.   Anyone who doesn't
   like this should object immediately, otherwise I'll ask Vint
   to declare the change final.
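
   A minimal sketch of the conversion step in the suggested text,
   assuming Python.  The function name to_unicode() is hypothetical,
   and the choice of NFC is an assumption made for illustration; the
   suggested text only says that conversion "may involve
   normalization".

```python
import unicodedata


def to_unicode(raw: bytes, local_charset: str) -> str:
    """Decode bytes from a local character set and normalize the result.

    NFC is an assumption for this sketch, not something the draft fixes.
    """
    text = raw.decode(local_charset)
    return unicodedata.normalize("NFC", text)


# ISO 8859-1 input and decomposed UTF-8 input end up as the same string.
from_latin1 = to_unicode(b"caf\xe9", "iso-8859-1")
from_utf8 = to_unicode("cafe\u0301".encode("utf-8"), "utf-8")
```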


** P.8 **  Validation of A-labels

   Section 5.4 says: "In general, that conversion and testing
   should be performed if the domain name will later be
   presented to the user in native character form (this
   requires that the lookup application be IDNA-aware)."

   Mark writes: Suppose that program X creates an A-Label from
   a U-Label, then sends that A-Label to program Y, which sends
   it to program Z, which sends it to program W, which displays
   it.  It sounds like each of Y, Z, W need to validate. Is
   that the intent of this text? If it is only W that needs to
   validate, then it gets a bit murky in today's world, where
   the boundaries between cooperating processes and programs
   are very fuzzy.

   Comment: That "murky" situation is exactly why the text
   leaves a lot of judgment in the hands of the application.
   Note that the sentence after the one you quoted now says, in
   part, "others may treat the string as opaque to avoid the
   additional processing at the expense of providing less
   protection and information to users", which is intended to
   be a very clear statement that there is a trade off
   involved.  I believe that, if you read the surrounding text,
   you will find that the specific answer to your question is
   that it is important that W should (note lower case)
   validate unless it has some reason not to (note that it has
   to go to most of the work to validate in order to display
   and that knowing, as a consequence of system design and
   knowledge of adequate checking and auditing, that it
   wouldn't have received the string unless it was valid).
   Validation is optional, and less likely, for Y and Z, but
   their implementers may certainly do so if they are being
   cautious.

   That text is consistent, I think, with on-list discussions
   that seem to have concluded that we must be very careful
   about not imposing IDNA requirements on programs that are
   not IDNA-aware but that there are some serious spoofing, 
   abuse, and malware opportunities if programs simply assume
   the validity of strings that appear to be A-labels.

   Critical Path slide 4, "Protocol 1".

   Status: Suggestions for clearer text would be welcome; no
   substantive change between -01 and -02.  Status unchanged
   during IETF 72.
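
   As a rough illustration of what program W might do before
   display, here is a sketch assuming Python and its built-in
   "punycode" codec.  The function name is hypothetical, and this
   is only a round-trip check: it does not test code point
   categories, contextual rules, or bidi, all of which full
   validation would need.

```python
def round_trips_as_a_label(label: str) -> bool:
    """Return True if `label` decodes to a non-ASCII string that
    re-encodes to exactly the same "xn--" form."""
    lower = label.lower()
    if not lower.startswith("xn--"):
        return False
    try:
        decoded = lower[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        # Not ASCII, or not decodable as Punycode at all.
        return False
    if decoded.isascii():
        # An A-label must correspond to a label with non-ASCII characters.
        return False
    return "xn--" + decoded.encode("punycode").decode("ascii") == lower
```

   Programs Y and Z, which treat the string as opaque, would simply
   skip a check like this and accept the trade-off described above.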


** P.10 **  Similarity of registration and resolution procedures

   Mark points out that the steps in 5.5 are all the same as
   in 4.3 -- except for bidi. This fact should be very clear
   in the text. 

   See also P.15

   Comment: Of course, they were more different until the MAYBE
   categories were removed.  There still seems to be merit in
   describing them separately because implementers of
   registration procedures and actions and implementers of
   lookup ones tend to be different even if there is some
   shared code.    A comment could be inserted noting the
   parallelism (sic), but I'm not quite sure how or where to do
   that without encouraging people to get sloppy about the
   differences that do exist.

   Specific suggestions and discussion would be welcome.
   Unchanged at IETF 72.



** P.14 ** (Editorial)  Text about A-labels on registration

    Should this text be moved from 4.1 to 4.3?

    Status: See placeholder in draft.  No discussion through
	IETF 72.  People with strong opinions should express them
	soon.



** P.15 ** Reducing duplication in "registration" and "lookup"

    The descriptions of steps in Sections 4 and 5 are very
	similar.  There are suggestions that the two be recombined
	(see Mark's notes from some time ago) and another one that
	we create a new section with the common material and point
	to it from the two existing (but shortened) sections (see
	Marcos's notes on Protocol).

    Comment: Keeping the two sections separate contributes,
	IMO, to the goal of making it easier for people to find the
	information they need to implement things correctly, so I'm
	nervous about recombining the sections.   Marcos's
	suggestion seems like a middle ground, although it will
	result in more page-flipping.   

    Status: We need some discussion, of which none has occurred
	through IETF 72	(other than comments from those who
	originally made the suggestions).  At least rough consensus
	on this is needed if it is going to be changed.
	Otherwise, the default is more or less status quo. 


** P.16 **  Versions and the Conceptual Rules

    Marcos suggested that the Conceptual Rules Registry and the
	Derived Properties one have a formal version structure.  I
	may or may not understand the suggestion the way he
	intends, but, if I do, this is an Issue.

	Comment: The IETF has had poor success with version numbers
	in tables and the like, especially those that are intended
	for future compatibility, with the notorious "MIME-Version:
	1.0" header standing out as an example.  So I'm reluctant
	to do this unless we have a clear understanding of how we (and
	implementations) would use those versions, what error
	conditions we would expect and to whom they would be
	reported, etc.   Much more discussion and/or text needed.

    That said, asking IANA to keep a "last modified" date in
	the registry in some easily-processed form would seem both
	reasonable and, at worst, harmless.

    Critical Path slide 4, "Protocol 1".

    Status: No discussion during IETF 72.  If others don't
	express opinions, this will be dropped or turned into a
	suggestion for a "last modified date".  In any event, it
	is now an issue for "tables".


  -------------------

Resolved in Dublin (IETF 72) or otherwise settled

These items are final and the issues closed unless arguments
are raised on the list that reverse the apparent Dublin
consensus.

  -------------------



** P.1 ** Location of the Contextual Rules table

    This table is in the Appendices to Protocol at present; in
    previous versions it was in Rationale.  Based on list
    discussions, it should probably be moved to Tables,
	probably along with some additional material that is
	still in Rationale. 

    Status: Resolved in Dublin (IETF 72); text will be moved.
	The relevant text has been sent to Patrik. 


** P.2 **   Format of the Contextual Rules table.

    At present (Protocol-02), that table consists of a condensed
	format that more or less matches the format used for
	standard entries in Tables.  With it, important fields are
	separated by semicolons, potentially followed by comments
	that start with "#".  As an example, the first entry in the
	table looks like the following in Protocol-01:

	   002D; HYPHEN-MINUS; F;
		  Must not appear at the beginning or end of a label;
		  Regular expression:
		  [^^]\u002D|\u002D[^$] ;
		  # Note that a prohibition on having two hyphens as
		  the third and fourth characters of anything but a
	      valid A-label appears in the specification.

    Mark Davis suggested a different format.  He wrote (I have not
	preserved his formatting):

		I suggest that the table be formatted for clarity to not
		depend on whitespace -- using names for each field -- and be
		broken into a list of condition/result pairs.

		Code point: 200C
		Name:       ZERO WIDTH NON-JOINER
		Lookup:     True

		# Allow ZWNJ for breaking cursive connection, as needed in
		Farsi. 
		Before:     [[:Joining_Type=Dual_Joining:]
              [:Joining_Type=Left_Joining:]]
              [:Joining_Type=Transparent:]*
		After:      [:Joining_Type=Transparent:]*
		      [[:Joining_Type=Dual_Joining:]
              [:Joining_Type=Right_Joining:]]
		Value:      PVALID

	Comment: There is no dependency on whitespace, at least in
	Protocol-01.  Mark's comments may have reflected an earlier
	version.   I've illustrated a version of this change with
	the alternate appendix (see elsewhere), but I'm not sure
	about it for two reasons.  The first is whether people
	prefer a more compact format or one that uses more vertical white
	space.  We hope that the Contextual Rules registry will remain
	small, but ending up with a significantly longer Tables
	document (remember that this section is normative) as the
	result of format may not be in our best interest.   The
	second point is related to the next issue.

    Critical Path slide 4, "Protocol 1".

    Status: Resolved in Dublin (IETF 72), in favor of a
	"rules" approach rather than a regex one.  Both appendices
	have been removed from Protocol and text has been sent to
	Patrik for inclusion in Tables and modification, as
	needed, into the style of that document.
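
    To make the condensed format concrete, here is a small parsing
    sketch, assuming Python.  The field names ("code_point", "name",
    "flag", "rest", "comment") and the function parse_entry() are
    hypothetical; the text above only says that fields are separated
    by semicolons and that comments start with "#".

```python
# The sample entry from above, kept as a raw string so that the
# backslashes in the regular expression stay literal.
ENTRY = r"""002D; HYPHEN-MINUS; F;
    Must not appear at the beginning or end of a label;
    Regular expression:
    [^^]\u002D|\u002D[^$] ;
    # Note that a prohibition on having two hyphens as
    the third and fourth characters of anything but a
    valid A-label appears in the specification.
"""


def parse_entry(text: str) -> dict:
    """Split one condensed entry into fields plus a trailing comment."""
    body, _, comment = text.partition("#")
    # Collapse internal whitespace so the result does not depend on layout.
    fields = [" ".join(part.split()) for part in body.split(";")]
    fields = [f for f in fields if f]
    return {
        "code_point": int(fields[0], 16),
        "name": fields[1],
        "flag": fields[2],
        "rest": fields[3:],
        "comment": " ".join(comment.split()),
    }


entry = parse_entry(ENTRY)
```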

 
** P.3 **  Definition of the Contextual Rules - Regex or otherwise

    Based on my understanding of what was being asked for, the formal
	definitions of the Contextual Rules were written to use a strict
	regular expression syntax with one regular expression per rule.
	That syntax, illustrated by the example above for U+002D,
	is not easy to look at or understand, but would lend itself
	to an automatic rule interpreter.  Mark's example is not
	one of a single rule, or even formal use of a regular
	expression.  If we are going to go that route, there may be
	even more simple ways to express the rules, leaving
	applications implementers on their own for the
	formalizations to be used in their code.  A second appendix
	has been supplied in -02 as the beginning of a suggestion
	for discussion.

    Please examine the two examples above, the discussion in
	the text, and the forms used in both appendices and advise
	on what you would like to see and why. 

    Note that they are provided in these versions of the
	document for comparison purposes only.  Assuming we can
	agree on which one we want, only one will survive into the
	post-IETF versions of the documents. 

    Critical Path slide 4, "Protocol 1".

    Status: Resolved in Dublin (IETF 72), in favor of a "rules"
    approach rather than a regex one.  Both appendices have
	been removed from Protocol and text has been sent to
	Patrik for inclusion in Tables.
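
    For comparison, here is the hyphen rule ("must not appear at
    the beginning or end of a label") written both ways, assuming
    Python.  Both function names are hypothetical, and the regular
    expression is my own formulation for this sketch, not the one
    from the Protocol appendix.

```python
import re

HYPHEN = "\u002D"


def hyphen_rule_ok(label: str) -> bool:
    """'Rules' style: state the condition directly."""
    return not (label.startswith(HYPHEN) or label.endswith(HYPHEN))


# Regex style: no hyphen at the start (lookahead) or the end (lookbehind).
HYPHEN_RE = re.compile(r"(?!-).*(?<!-)", re.DOTALL)


def hyphen_regex_ok(label: str) -> bool:
    return HYPHEN_RE.fullmatch(label) is not None


samples = ["a-b", "-ab", "ab-", "abc", "-"]
results = [(s, hyphen_rule_ok(s), hyphen_regex_ok(s)) for s in samples]
```

    The two forms agree on these samples; the difference is in how
    easy each is to read and review, which is the question this issue
    was about.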


** P.4 ** Protocol reference to Bidi Constraints 

   In 4.3.2.4: the bidi constraints apply to more than just
   single labels. 

   Comment: Noted.  The question of those Bidi constraints is
   probably one of the larger and more substantive open issues
   we face.

   Status: Resolved in Dublin (IETF 72) after some very strong
   assertions from the Security and DNS folks about the
   implausibility of cross-label checking.  Since this appears
   to be a showstopper for them that would certainly result in
   blocking DISCUSS positions, I think the topic is dead in
   the IDNABIS WG -- anyone who wants to argue it should
   probably take it up in another arena.   Text in Protocol
   and Rationale is being conformed. 


** P.5 ** Bidi-checking Requirement on Lookup

   Should this be a SHOULD or a MUST?  

   Comment: See the discussion anchor and text in Section 5.5
   (note that this anchor has been in the document for some
   time and there have been no comments on-list).

   Critical Path slide 5, "Protocol 2".

   Status:  Still no comments through IETF 72.  The anchor has
   been removed and this is considered done.


** P.9 ** Use of "in parallel"

   Section 5.5 and elsewhere use the term "in parallel" to
   describe the relationship between two (or more) sets of
   steps or procedures.  Mark expresses concern that this will
   create confusion with concurrent operations, which is not
   intended.  He suggests other wording. 

   Comment: Specific suggestions for alternate text would be
   welcome.

   Status: The term "parallel" has been removed from both
   Protocol and Rationale.   This is an editorial matter and we
   are therefore done with it unless someone objects RSN.


** P.13 **  Labels starting in combining marks

    In Section 5.5 (lookup validation and testing), the text
	contains a prohibition on labels starting with combining
	marks. I think we have consensus on the prohibition.  Is
	the statement of it adequate?

    Status: There is a discussion anchor in the text.  It will
    be removed if there is no discussion on this in the near
	future.  No discussion through IETF 72 and no comments on
	list.  Anchor removed.  Done.
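
    One possible reading of the prohibition, sketched in Python with
    the standard unicodedata module.  The function name is
    hypothetical, and treating "combining mark" as General Category
    M (Mn, Mc, Me) is an assumption about what the text intends.

```python
import unicodedata


def starts_with_combining_mark(label: str) -> bool:
    """True if the first code point has General Category Mn, Mc, or Me."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")
```

    A lookup application would reject any label for which this
    returns True.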
-------------- next part --------------
Issues list, IDNABIS Rationale  (as of 20080807)

This revision has been reorganized to group issues into three
groups: those that need additional discussion to resolve, those
on which comments have not been received from other than the
originators and that should be closed if no further comments
are received, and those that are believed to be settled after
IETF 72 in Dublin and that will be closed unless there are
protests on the mailing list against the meeting decisions.

Note that comments from those who originated the issue and that
repeat their initial remarks are not helpful for any of these
categories -- additional perspectives would be much more
useful.

Section numbers refer to draft-ietf-idnabis-rationale-00
through -02.  "Critical Path" slide numbers refer to the slides
used during IETF 72 and are followed by the titles of those
slides.

This is more or less a status summary and issue discussion for
that document in the hope of facilitating wider discussion.  I
will further update the list as needed and create a list
without comments for discussion and tentative decisions before
Dublin.  The numbers given are to facilitate referencing.

The issues/comments from R.10 through R.25 inclusive are
derived from messages from Mark Davis, primarily the note sent
Monday, 07 July, 2008 08:05 -0700.  Consequently and just as a
convenience, in places the text reads as a reply directed to
him.


-------------
Discussion and Resolution Still Required
-------------

** R.1 ** Should this document exist at all?

	There is an ongoing discussion about whether the documents
	should be reorganized in part to move material that is
	necessary to the protocol implementation entirely out of
	this document and then to drop this document.

    Critical Path slide 2, "Rationale 1"

	Status: Not really discussed during IETF 72; awaiting
	instructions from Vint and/or a meaningful discussion.
	The rest of this list assumes that the document is
	retained; otherwise it would be irrelevant. 


** R.2 ** Normative text in this document.

    If the document is retained, should we try to remove all
	normative references from the other documents to it and/or
	remove all normative text from it and put it in the other
	documents?  (Those two definitions of the issue probably
	amount to the same thing.)

    Comment: This issue is discussed at some length in my note
	on document organization.

    Critical Path slide 2, "Rationale 1"    

    Status: No substantive discussion during IETF 72.
	Discussion needed after R.1 is resolved.



** R.3 ** PROTOCOL-VALID Explanation

    Section 6.1.1 contains an overview of Protocol-Valid.  Is
    that explanation adequate?  If more is needed, what?

    In addition, there is an inconsistency between Rationale
	and Tables about the relationship between the two CONTEXT
	categories and PROTOCOL-VALID.  Tables treats the two
	categories as disjoint. The explanation in Rationale
	treats them as special subset cases of PROTOCOL-VALID,
	more or less as PROTOCOL-VALID with special flags set.

    Comment: The approach used in Tables is computationally
    easier and reflects the structure of both documents before
	a tentative decision in January to try the subset approach.
	My impression from discussions of these revisions in
	various meetings around the world is that the Tables
	(disjoint) approach is more generally understood.  The two
	documents obviously should be consistent; unless there are
	strong enough reasons to use the subset approach to justify
	changing Tables, Rationale will be changed to match it in
	the first post-IETF version.
	
    Status: The overview text about PROTOCOL-VALID has been
	changed to something that may be more clear.  Comments or
	suggestions for improvements on that issue or the more
	general ones discussed above are welcome.  No progress
	during IETF 72.  This section probably needs to be
	harmonized with "Tables".
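
    The two models can be put side by side in a short sketch,
    assuming Python.  The class and function names are hypothetical,
    and the names CONTEXTJ and CONTEXTO for the two CONTEXT
    categories are an assumption (only CONTEXTO appears by name in
    this list).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """Tables style: every code point is in exactly one category."""
    PVALID = "PVALID"
    CONTEXTJ = "CONTEXTJ"
    CONTEXTO = "CONTEXTO"
    DISALLOWED = "DISALLOWED"


@dataclass(frozen=True)
class SubsetCategory:
    """Rationale style: PROTOCOL-VALID plus an optional context flag."""
    protocol_valid: bool
    context_flag: Optional[str] = None  # "J", "O", or None


def to_subset(cat: Category) -> SubsetCategory:
    """Show that the disjoint form carries the same information."""
    if cat is Category.PVALID:
        return SubsetCategory(True)
    if cat is Category.CONTEXTJ:
        return SubsetCategory(True, "J")
    if cat is Category.CONTEXTO:
        return SubsetCategory(True, "O")
    return SubsetCategory(False)
```

    In the disjoint form a single comparison answers "which category
    is this?"; in the subset form two fields have to be inspected,
    which is the computational difference noted above.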


** R.7 ** Explanation of removal of symbols

    Section 10.5 discusses the reasons why symbols are not
    permitted in IDNA2008.   There has been controversy about
    some of the statements and examples, with some disagreement
    about whether some of them are even factually correct.   We
    either need to identify everything that is controversial in
    this section and trim it out (which might leave very
    little), fix specific examples on which we can agree, accept
    (and possibly note) the disagreements, or adopt some other
    strategy.    Of course, if we drop rationale and explanatory
    material in favor of a "this is just how it is" approach,
    the specific issues with this section vanish.

    See also R.25, which focuses on a specific part of the
	explanation.  There was no substantive discussion of the
	issue at IETF 72 and this issue is still open.


** R.5 ** Permanence of DISALLOWED

    There has been extensive on-list discussion about whether
    migration of characters from DISALLOWED to PROTOCOL-VALID
    (or CONTEXTO) should be easy, or at least easier than
    migration from PROTOCOL-VALID to DISALLOWED.   I do not
    believe we have reached consensus even though the material
    in 6.1.2 reflects what I believe to have been the general
    trend of the discussions when they ran down.  Does anyone
    have anything new to say about this and, if not, should I
    remove the placeholder?

    See also R.11

	Status:  Issue raised at IETF 72 in Dublin, but no
	conclusion reached.  Placeholder remains in -02.
	Discussion (not just repetition) needed.


** R.16 ** Description of major changes from IDNA2003, invalid
       labels 

  Should item 11 ("Make some currently-valid labels that are
  not actually IDNA labels invalid.") be dropped?

  Mark said: 'Why do we care that labels invalid under IDNA2003
  are also invalid under IDNA2008? Why wouldn't they be?
  Perhaps an example would help to clarify this.'

  Comment: That isn't what it says, actually.  I think the
  intent of the statement was to call out the fact that
  conforming IDNA2003 implementations can look up labels
  starting in "xn--" that are not valid A-labels and labels
  that contain "--" in the third and fourth positions that don't
  start in "xn".   In IDNA2008, both of those lookup operations
  are somewhat discouraged and conforming applications MAY
  decline to look them up.  (N.B., those two reasons for not
  looking something up are separate and part of the
  outstanding issues list for Protocol.)

  Status: No substantive discussion at IETF 72.  Awaiting
  comments.
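
  The two structural cases in the comment can be told apart with a
  very small check, sketched here in Python.  The function name and
  the three return strings are hypothetical; whether a label in the
  "xn-prefix" case is actually a valid A-label is a separate test
  that this sketch does not attempt.

```python
def reserved_hyphen_kind(label: str) -> str:
    """Classify a label by "--" in the third and fourth positions."""
    lower = label.lower()
    if lower[2:4] != "--":
        return "none"
    # "--" in positions 3-4: either the "xn" prefix or some other one.
    return "xn-prefix" if lower.startswith("xn") else "other-prefix"
```

  Under the text above, a conforming application MAY decline to look
  up an "other-prefix" label, and likewise an "xn-prefix" label that
  turns out not to be a valid A-label.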



** R.20 ** Display of A-labels (punycode-coded strings) to users

   The text says 'Applications MAY allow the display and
   user input of A-labels, but are encouraged to not do so
   except as an interface for special purposes, possibly for
   debugging, or to cope with display limitations.'

   Mark writes: There is widespread use of the A-Label to
   signal a possible spoof -- while you discuss that later, I
   think it's swimming against the tide not to mention it here.
   
   Comment: We definitely need to talk about this.  There is a
   difference between recognizing that something is done, even
   on a "widespread" basis, and encouraging it.  From a
   security practices and human factors standpoint, switching
   into A-labels for too many different reasons is a bad idea
   under both the principle that excessive warnings cause
   typical users to ignore all of them after a while and 
   because the user has no way to differentiate among the cases
   (at least without a handy A-label -> Unicode code point
   list mapper in addition to an A-label -> U-label mapper and
   knowledge as to what to do with both).  Consider the two
   most popular causes of A-label display today -- failure to
   have the relevant script installed (much more likely an
   indicator of "you are unlikely to be able to read the
   content at that destination" than of evil-doing unless one
   makes the classic assumption that anyone you don't know is a
   nasty barbarian) and failure to be part of a TLD that
   practices approved registration hygiene (extremely prone to
   false positives unless the TLD is actively recruiting
   evildoers) -- and then think about the effectiveness of
   A-labels as a spoof warning in environments in which we know
   that the typical user's response to being barraged by "an
   <incomprehensible> thing might happen if you continue, do
   you want to continue" messages is to click "yes" every time.
   Your original suggestion (a half-dozen years ago) to color
   these things was actually much better.

   I don't think there is a place for any of that discussion or
   associated recommendations in the document, but I also don't
   think we should be saying things that constitute
   recommending the practice.   The text as it stands is just a
   MAY with a fairly weak "encouraged" clause so people who
   believe that A-label display is the best solution are still
   conforming.

   Whatever is said about display of A-labels, it seems clear
   that users should be able to type them, in some way, on
   input.  Does that need to be said explicitly?

   Status: It is now (-02) said explicitly, although the text
   is easily removed if people conclude that it is clutter.
   This topic was not discussed at IETF 72 and I await further
   comments.
 


** R.24 ** Local (user interface) mappings

   The text says: "None of those local decisions are a threat
   to interoperability as long as (i) only U-labels and
   A-labels are used in interchange with systems outside the
   local environment,...".

   Mark writes: Doesn't really follow that there are no
   problems. The obvious example of interoperability problems
   are where a Turkish friend has a URL that works in his
   browser, copies the text in an email and sends to me. When
   I click on it, it either 404's or **much worse**, goes to a
   different website.

   Comment: First of all, Mark and others have suggested many
   times that the Turkish "i" is such a rare and unique case
   that there is no point worrying about it.  I'm not convinced
   of that, but I think we either need to take it seriously as
   a problem...  or not.

   More to the point, the key is in the text following
   that which you quoted, i.e., "(ii) no character that would
   be valid in a U-label as itself is mapped to something
   else,...".  As long as the dotless "i" (U+0131) remains
   PVALID (it is today), then the case I assume you are calling
   out above (since you specifically mentioned a Turkish
   friend) is prohibited (which doesn't mean it won't happen --
   see below).
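
   One way to read condition (ii) is as a check on a local
   mapping table.  The sketch below is only that: the PVALID set
   is a two-character illustrative subset (an assumption, not the
   real table), upper-case "I" is assumed not to be valid in a
   U-label as itself, and all names are hypothetical.

```python
# Illustrative subset only (an assumption); the real PVALID set comes from
# the Tables document.
PVALID_SAMPLE = {"i", "\u0131"}  # U+0069 and U+0131 (dotless i)


def violations(local_map, pvalid):
    """Source characters that are PVALID yet mapped to something else."""
    return sorted(src for src, dst in local_map.items()
                  if src in pvalid and dst != src)


def apply_map(text, local_map):
    """Apply a per-character local mapping; unmapped characters are lowercased."""
    return "".join(local_map.get(ch, ch.lower()) for ch in text)


# Hypothetical local tables for the upper-case "I" a user might type.
turkish_ui = {"I": "\u0131"}   # Turkish convention: I -> dotless i
default_ui = {"I": "i"}        # common default: I -> i
folding_ui = {"\u0131": "i"}   # folds dotless i away

print(violations(turkish_ui, PVALID_SAMPLE))   # no PVALID source is rewritten
print(violations(default_ui, PVALID_SAMPLE))   # no PVALID source is rewritten
print(violations(folding_ui, PVALID_SAMPLE))   # dotless i is rewritten
print(apply_map("ISIK", turkish_ui) == apply_map("ISIK", default_ui))
```

   The last line prints False: the same typed string yields
   different labels under the two local tables, which is why
   condition (i) restricts interchange to U-labels and A-labels.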

   But I think this case identifies the sources of our
   disagreement about required standardized mapping and a
   number of other issues (including the one about the [mis]use
   of A-label display discussed above). If I'm making the right
   inference from several of your comments, you tend to look at
   whatever is going on today and say "this is the pattern, it
   is in wide use, we need to accept it, standardize it, and
   promote it".  My view is that, in a world in which we are
   talking about planning for the next billion Internet users
   and the billion after that, we should be figuring out what
   is optimal, including learning from the consequences of
   previous behaviors and decisions, and then developing plans
   about how to get there (even if that implies a
   slightly-bumpy transition).

   In this particular case, I look at some of the possibilities
   that could lead one to end up with a different target host
   or site than intended and say "another reason why we need to
   restrict URLs to final (non-mapped) characters only and get
   people out of the habit of believing that they can use, or
   advertise, one character in a URL or email address and
   expect a different character in the DNS".   You look at
   figures about how often Google encounters characters that
   need mapping and say "we have to preserve the mappings
   forever" while I look at the same data and say "we have to
   think through the transition and legacy issues very
   carefully, but need to get that behavior under control
   before it gets even worse".

   Suggestions about better text would be welcome, but the
   group needs to figure out what it wants to do with the
   underlying difference in models between "make the Internet
   better by stabilizing current behavior patterns" and "make
   the Internet better by facilitating evolution to
   better-designed patterns".

   See note from Mark titled "Interoperability" and my
   response. 

   Critical Path slide 3, "Rationale 2"

   Status: Placeholder/discussion anchor inserted in text.  The
   text itself has been rewritten in -02 somewhat to reflect 
   IETF 72 discussions.   I believe we now have consensus on
   the principles although probably still not on the text.
   Suggestions welcome.


** R.25 ** Explanation of the symbol prohibition

   In Section 10.5, one of the bullet points starts "Most
   Unicode names for letters are, in most cases, fairly
   intuitive, unambiguous and recognizable to users of the
   relevant script....and there are far more squares of
   various flavors in Unicode than there are hearts or stars."

   Mark wrote: This just needs to be removed; the argumentation
   is faulty. For the same pronunciation, Chinese has hundreds
   of possible characters. If you want another reason (and
   someone to point a finger at), you could say: "The Unicode
   Standard recommends that these types of identifiers not
   contain symbols [UAX31]."

   Comment: This needs further discussion.  I believe that the
   comment about Chinese is totally irrelevant: after all, a
   large number of languages have multiple characters within
   the same script for the same phoneme (as well as using the
   same character to represent multiple phonemes) and none of
   that generally affects the name of the character.  We may
   need to agree to disagree and then let the WG decide whether
   the discussion is important enough to try to tune in the
   light of the disagreement.  I don't know if better phrasing
   would help things; I gather from the above that you are
   convinced that it would not.

   Status: A placeholder/discussion anchor has been inserted
   into the text.  This topic needs further discussion from
   additional parties.


 --------------------
Comments needed or issues will be considered settled.  Some
items have been placed in this category because new text in
rationale-02, suggested by on-list or in-Dublin discussions, is
believed to have resolved the issue.
 --------------------



** R.4 ** Contextual rules and their application

    Section 6.1.1.2 contains a discussion of contextual rules
    and a placeholder for an in-depth explanation of the syntax
    for those rules.   It may not be right and/or clear.

    See P.1, P.2, and, especially P.3.

    Status:  Contextual rule sections have been removed from
    Protocol for placement in Tables.  This section will be
    changed to conform.  Please watch for it and comment if
    needed.


** R.9 ** Scope of requirements on Registries

	The language of the first sentence of Section 10.1.2
	appears to make a requirement on all DNS names and servers.
	That would require an update to RFC 1123, among other
	things.  Such an update is out of scope and probably
	undesirable.

	Comment: That reading was never intended.  The language
	assumed more context than the reader might have had and was
	generally sloppy.  There is a more general issue about
	policy requirements; see R.27, which remains an open issue
	due to new text.

	Status: This text was changed in the -01 document to
	restrict its scope to zones supporting IDNA and 
	eliminate the restriction on other prefixes. 
	WG members should review both the change and the note in
	the text of -01 to verify that they are what is desired.
	There have been no such comments on-list or during IETF 72,
	so we will soon decide that we are done.



** R.15 ** Description of major changes from IDNA2003, Bidi

  Mark suggested removing item 9 ("Make bidirectional domain
  names in a paragraph display in a non-surprising fashion.")
  because it is just a special case of the previous item.

  Comment: that text wasn't mine and I'd like to hear from Paul
  and/or Harald and Cary before making any changes.  I believe
  the intent of the two separate statements was to distinguish
  between the case in which one knows that the string is a
  domain and the case in which one needs to deduce that the
  string is a domain name from running text.

  Status: A placeholder comment has been inserted in the -01
  document to identify this issue.  In the -02 document, item 9
  has been tentatively removed and item 8 rewritten a bit to
  make the relevant distinctions.  People should check that new
  text carefully to be sure it reflects their intent.



** R.17 ** Safe, but only in conjunction...

   Mark wrote:   '"that are safe for use only in
   conjunction". Since you never say why they are unsafe, this
   needs clarification. Do you mean this because of visible
   confusability?'

   Comment: The reference was in conjunction with combining
   characters that represent much the same situation as
   the joiners, i.e., that they don't add decoration to the
   previous character but, instead, just change its
   presentation form.  An example of this is Arabic Tatweel
   (U+0640), which has been discussed extensively in comments
   by individuals and from ASIWG.  I don't know how many other
   examples there are.  In the case of Tatweel, the ASIWG
   recommendation has been to ban it entirely (i.e., treat it as
   DISALLOWED).  If we follow that advice, it may not be a
   real example, but perhaps it illustrates the point.

   It is worth noting that, as long as the approval process for
   changes remains IETF Review, it is sufficient for this
   document to use a relaxed definition of "safe", to be
   evaluated on a case-by-case basis, rather than trying to
   narrow things down to definitions that would be automatic.

   Text that would further clarify this would be welcome.

   Status: No comments or suggested text received.  This will
   be dropped as an issue unless there is a discussion RSN.



** R.18 ** DISALLOWED in error.

   Mark commented on the statement 'If a character is
   classified as "DISALLOWED" in error and the error is
   sufficiently problematic, the only recourse would be either
   to introduce a new code point into Unicode and classify it as
   "PROTOCOL-VALID..."': 'Unless you have some evidence to
   think that this is a real possibility (I don't), it should
   be removed.' 

   Comment: I don't feel that this is worth fighting wars, or
   even writing long explanations (again), over. If there is
   consensus that it should come out, it will come out.
   However, I think we should all understand that most of
   these kinds of "unless you have evidence", "unless you
   can prove this could happen", and "unless you can show an
   example" arguments can produce different conclusions
   depending on how they are stated.   For example, the issue
   with the above could be stated (with apologies for being
   obnoxious) "Unless there is evidence that the Unicode
   Consortium has never made, and will never make, a serious
   mistake, then the text should stay in".  Again, I'm not
   arguing for keeping the text, but I do think we need to make
   decisions without getting trapped by the phrasing of
   questions.

   Status: No further discussion.  Unless some, from someone
   other than Mark or myself, appears soon, this will be
   considered settled.


** R.23 ** Detecting domain names in text

   Section 9, Paragraph 2 of the text contains the statement
   "If a domain name appears in an arbitrary context (such as
   running text), one may be faced with the requirement to know
   that a string is a domain name in order to adjust for the
   different forms of dots but also to have traditional dots to
   recognize that a string is a domain name -- an obvious
   contradiction."

   Mark wrote: "Not a contradiction, remove. Example, if one
   recognizes full-width dot in detecting URLs, then one can
   clearly use them in parsing within labels."

   Comment: Either I don't understand your example, or it
   supports my point.  If one "recognizes full-width dot in
   detecting URLs", then one has already made a decision that
   goes outside IDNA and the URL standards.  One could as
   easily "recognize" any other character, whether dot-like or
   not and whether on the IDNA2003 "treat as dot" list or not.
   That recognition would occur in one of two cases:

   (i) One treats the dot-like character as equivalent to an
   ASCII dot throughout the application.  The domain name is
   then recognized, in essence, by recognizing the ASCII dot as
   usual.  That might well work for a full-width character in
   contexts that are used to treating all full-width
   Latin-derived characters as identical to their ASCII
   equivalents, but certainly would not apply to various other
   characters that people have insisted are dot-like enough (or
   sentence-separator-like enough) to be treated as label
   separators.

   (ii) One treats the dot-like character as an ASCII dot
   because one knows that one is in a domain name or URL
   context.  But therein lies the contradiction to which I
   referred: if you need to know the context in order to
   determine whether the dot-oid is to be treated as an ASCII
   dot and/or label separator, then you cannot use the
   dot-oid to determine the context.
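
   The two cases can be made concrete with a small sketch.  The
   detector below is a deliberately naive, hypothetical one; the
   list of extra dots is the one from IDNA2003 (RFC 3490,
   Section 3.1).

```python
import re

# Non-ASCII code points that IDNA2003 (RFC 3490, section 3.1) treats as dots.
IDNA2003_EXTRA_DOTS = "\u3002\uff0e\uff61"

# Naive detector (hypothetical): label-like runs joined by ASCII dots only.
ASCII_DOT_NAME = re.compile(r"[^\s.]+(?:\.[^\s.]+)+")


def find_names(text):
    """Return substrings that look like dotted names, using ASCII dots."""
    return ASCII_DOT_NAME.findall(text)


def dots_to_ascii(text):
    """Case (i): treat every dot-like character as an ASCII dot everywhere."""
    return text.translate({ord(c): "." for c in IDNA2003_EXTRA_DOTS})


name_in_text = "see example\u3002com today"
sentence = "これは本です\u3002次の文"   # ordinary prose, no domain name

print(find_names(name_in_text))                  # nothing found
print(find_names(dots_to_ascii(name_in_text)))   # the name is found
print(find_names(dots_to_ascii(sentence)))       # prose is misdetected
```

   Without the conversion the name is missed; with it applied to
   all running text, an ordinary sentence separator produces a
   false "domain name".  Applying the conversion only inside
   domain names requires already knowing where they are.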

   Obviously the explanation in the text is not good enough.

   Status: New text was inserted in -01, along with a
   placeholder note.  Further rewriting was done in -02 to
   reflect the input from other areas during IETF 72.
   Comments and suggestions welcome. 


** R.27 ** The role of policy vis-a-vis this spec.

   This specification (and Protocol) stress that registry
   policies are an important element of a working, complete,
   IDN environment.   We don't specify any policies, even
   minimal ones, and "our policy is 'open season on users'" is
   a possible response.   Is that plausible or do we need to do
   something else?   And, if something else, what?

   See R.9 and P.6.

   Critical Path slide 3, "Rationale 2"

   Status: New text coming for the next versions of the
   documents after more discussion in Dublin (IETF 72).
   Please watch for it and comment.


** R.28 ** Transitions for applications taking advantage of
   IDNA2003 mappings

   There are a number of web pages in the wild in which
   characters mapped out by IDNA2003 (e.g., "Mathematical"
   forms, subscript and superscript digits) are used,
   presumably in the interest of a distinctive presentation.
   Rationale does not discuss that issue nor offer specific
   advice about transitions.   Should it?
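
   For those wanting to see what such characters look like to
   software, here is a minimal sketch using NFKC from Python's
   standard "unicodedata" module.  It covers only the
   compatibility-normalization part of the IDNA2003 mapping, and
   the helper name is hypothetical.

```python
import unicodedata

samples = {
    "superscript two": "\u00b2",
    "subscript three": "\u2083",
    "mathematical bold small a": "\U0001d41a",
}
for name, ch in samples.items():
    mapped = unicodedata.normalize("NFKC", ch)
    print(f"{name}: U+{ord(ch):04X} -> {mapped!r}")


def changed_by_nfkc(label: str) -> bool:
    """True if NFKC alters the string, i.e. it relies on mapping."""
    return unicodedata.normalize("NFKC", label) != label


print(changed_by_nfkc("example\u00b2"), changed_by_nfkc("example2"))
```

   A check of this kind could be run over the "href"s of existing
   pages to estimate how much content depends on the mappings.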

   Status: New text in -02.  Please review and comment.



  -------------------

Resolved in Dublin (IETF 72) or otherwise settled

These items are final and the issues closed unless arguments
are raised on the list that reverse the apparent Dublin
consensus.

  -------------------


** R.6 ** User agents and warnings

    The last paragraph in Section 6 (actually 6.3 on layered
    restrictions) explicitly points out the role of user agents,
    and then concludes with a warning against threats that
    cannot be completely prevented or blocked.   That sentence
    is redundant with a disclaimer in Security Considerations.
    Should it be removed?  (I believe that someone explicitly
    asked that something along these lines be said in 6.3, but
    it is redundant, so I'm checking.)  Silence will be
    interpreted as "leave the text, drop the placeholder".

    Status: No comments before or during IETF 72 in Dublin.
    Placeholder has been removed and this issue is believed to
    be settled.


** R.8 ** Mechanisms for updating the context registry

    Section 13.2 ("IDNA Context Registry") contains a discussion
    of the updating rules for the Contextual Rules registry.
    Are those rules the ones we want?

    Comment: Pursuant to discussion on-list, this section was
    rewritten in -01 to require IETF review and approval.  As
    discussed on the list, I (and at least some others) still
    hope that we can eventually get to a process that is more
    based on expert review, but it appears to be well in the
    future.

    Critical Path slide 2, "Rationale 1"

    Status: This is believed to have been settled at IETF 72 in
    Dublin and there have been no comments on the list about
    the -01 changes.  Most of the material in 13.2 (part of
    IANA Considerations) has been moved to the Tables document
    and a pointer inserted.


** R.10 ** Stability of Labels.

   Mark Davis wrote: I believe quite strongly that once a
   domain name is valid, it should not be invalidated by any
   later version of IDNA.  Now, while we cannot prevent a later
   RFC from doing that, we *can* prevent such invalidation by
   the normal process of updating tables under these RFCs for
   new versions, adding exceptions, and changing contextual
   rules.

   Comment: I've included this in the Rationale list to be sure
   that it doesn't get lost.  I believe that there is consensus
   on this point.  If that is the case, the only issue is
   identifying any places where the documents might be
   inconsistent with it.  If there are no additional comments
   on this subject through the end of IETF, I propose to
   interpret that as agreement.

   Status: No further comments; interpreted as agreement.
   Done.


** R.11 ** Instability of Nonlabels.

   Mark wrote:
   I also think that making nonlabels stable should *not* be a
   goal. It can't really be achieved anyway, since the presence
   of an UNALLOWED character can make a label be invalid in
   version X yet valid in version Y (where that character is
   defined).

   Comment: See R.10 and R.5 above.  I assume that "UNALLOWED"
   was a typo for "UNASSIGNED", not "DISALLOWED" or something
   else.  I've included this in the Rationale list to be sure
   that it doesn't get lost.  I believe that there is consensus
   on this point although perhaps not as clearly so as with
   R.10, but that it is worth making a distinction between
   UNASSIGNED and DISALLOWED in terms of stability (see R.5).
   If that is the case, the only issue is identifying any
   places where the documents might be inconsistent with it.

   On the other hand, we may just have a misunderstanding.  See
   the second "Description of major changes..." list, below.
   
   Status: Some hall discussions (not in the WG) during IETF
   72 lead me to believe that we have agreement that strings
   that do not qualify as labels cannot be guaranteed to be
   preserved in that state when the reason is because they
   contain UNASSIGNED characters.  Rationale explicitly says
   that strings that are invalid because they contain CONTEXT
   characters but fail the rule tests cannot be guaranteed to
   not change state.   However, the real issue is whether a
   character, once DISALLOWED, stays DISALLOWED (see R.5).
   Unless there is dissent from that analysis I propose to drop
   this issue and leave R.5.
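
   The distinction can be sketched with two table versions.
   Everything here is hypothetical: the code point names are
   placeholders, the statuses are not taken from any real table,
   CONTEXT characters are ignored, and freezing DISALLOWED is
   shown only as one possible rule (it is the R.5 question).

```python
from enum import Enum


class Status(Enum):
    PVALID = "PVALID"
    DISALLOWED = "DISALLOWED"
    UNASSIGNED = "UNASSIGNED"


def is_label(code_points, table):
    """A string qualifies only if every code point is PVALID in this table."""
    return all(table.get(cp, Status.UNASSIGNED) is Status.PVALID
               for cp in code_points)


def stable_update(old, new):
    """Check that an update never changes PVALID or DISALLOWED entries."""
    return all(new.get(cp) is status
               for cp, status in old.items()
               if status in (Status.PVALID, Status.DISALLOWED))


# Placeholder code points; "cp3" is absent from X, hence UNASSIGNED there.
table_x = {"cp1": Status.PVALID, "cp2": Status.DISALLOWED}
table_y = {**table_x, "cp3": Status.PVALID}

print(is_label(["cp1", "cp3"], table_x))   # False: not a label in version X
print(is_label(["cp1", "cp3"], table_y))   # True: a label in version Y
print(stable_update(table_x, table_y))     # True: valid labels stay valid
```

   The nonlabel changed state between versions without any valid
   label being invalidated, which is all that R.10 asks for.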


** R.12 ** Management.

   (Again from Mark's note and again in this list to prevent
   its getting lost) 
   The process of adding backwards compatibility characters,
   context conditions, and exceptions needs to be much more
   definitive. 

   Comment: I think the discussions during and since IETF 71
   have focused on "change the RFCs using normal IETF
   processes".  -01 has been changed to reflect that (see R.8
   for more discussion on the specific topic of Contextual
   Rules).  If we believe that, I don't know how much more can
   be said at this stage.

   Status: There has been extensive on-list discussion,
   confirmed by discussion in Dublin.  This is believed to
   have been settled at IETF 72 and, subject to the usual
   qualifications, is done.


** R.13 ** Statement about the reason why IDNA uses Unicode

  The statement in 00 was "strange" and problematic.

  Status: After discussion on the list, Ken Whistler's
  suggested text was substituted into the text of -01.  To the
  extent necessary (I think we reached agreement on the list),
  people should verify that it is what is wanted.

  Status: Silence after -02 is posted will be interpreted as
  consent and this Issue will be dropped.


** R.14 ** General editorial suggestions from Mark.

  Status: all of these that appear editorial and
  uncontroversial have been incorporated.  Those that were
  not considered editorial and uncontroversial are noted
  elsewhere in this list. 

  People should check diffs and/or Mark's list to be sure they
  are satisfied with the changes.

  Since no comments have been received on any of the changes,
  they are considered done.


** R.19 ** Slightly-redundant text.

   The last paragraph of section 6.3 is redundant with a
   similar comment in Security Considerations.  Should it be
   retained? 

   Comment: Mark suggests "yes" and I'm inclined to agree.  If
   others agree, I'm removing the "anchor" comment.

   Status:  No dissenting comments either on-list or during
   IETF 72.  This is considered Done and the anchor has been
   removed.




** R.21 ** The "ae" example and discussion text.

   In Section 7.3, Paragraph 4, the text reads 'the
   two-character sequence "ae" is usually treated as a fully 
   acceptable alternate orthography.'  Mark suggests adding
   'for the "umlauted a" character'. 

   Comment: This should have been clear from the previous
   paragraph and the fact that the sentence in question starts
   with "That character (U+00E4)", a very explicit
   back-reference to that previous paragraph.  However, the
   suggested change seems relatively harmless.

   Status: The suggested change has been made to the text.
   Anyone who is unhappy about it should say so.  Previous
   experience indicates that the RFC Editor may take exception
   to that many repetitions of "umlauted a", but we will deal
   with that if and when we get there.

   Status: No comments received; this is considered done.


** R.22 ** "cannot be represented directly in domain names"

   In the first paragraph of Section 9, the text "use
   characters that cannot be represented directly in domain
   names but for which interpretations are provided." appears.
   Mark asks: "What is meant by this, and how is it different
   in IDNA2008? In both IDNA2003 and 2008 they are illegal."

   Comment: This was intended to get at the mapping issues,
   both with characters that disappear under NFKC and hence can
   be interpreted to be part of domain names and, in
   particular, to odd cases like the Sharp S (Eszett) mapping
   to "ss".  The situations are definitely different between
   IDNA2003 and 2008.    I think what I was trying to do with
   that convoluted sentence was to identify the situation
   without getting into a discussion of mapping and mapping
   issues.   I obviously failed.
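
   For reference, the IDNA2003 behavior in question can be
   observed with Python's built-in "idna" codec, which implements
   RFC 3490 with Nameprep.  This is a sketch of the IDNA2003 side
   only; the wrapper name is hypothetical and nothing here
   reflects IDNA2008 behavior.

```python
def idna2003_to_ascii(name: str) -> bytes:
    """Run the IDNA2003 ToASCII operation via Python's "idna" codec."""
    return name.encode("idna")


# Sharp S (U+00DF) is mapped to "ss" by Nameprep.
print(idna2003_to_ascii("stra\u00dfe"))
# SOFT HYPHEN (U+00AD) is "mapped to nothing" and disappears.
print(idna2003_to_ascii("ex\u00adample"))
```

   In both cases the string the user typed is not the string that
   ends up in the DNS, which is what the sentence was trying to
   describe.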

   In -01, a placeholder was inserted in the text, along
   with preliminary suggested text from Patrik.  Please review,
   comment, and suggest improvements or alternatives if
   appropriate.

   Status: No discussion on list or at IETF 72.  This issue is
   considered done.


** R.26 ** Trimming additional text.

   An editorial note appears in the Change Log at the end of
   Section 15.8 about whether a list of additional text
   sections should be rewritten, trimmed, or dropped.  The
   group should review that list.

   Status: Change Log has been trimmed in -02, taking the
   questionable text and note with it.

