goal and structure of draft-klensin-idnabis-issues

Mark Davis mark.davis at icu-project.org
Sun Feb 4 00:38:14 CET 2007


I agree with Erik. The structure of the document makes it hard to
disentangle the perceived problems from the proposed solutions, and many of
the perceived problems lack enough justification to show how severe they
really are. A reader should be able to finish the document and say: "There
are 10 problems outlined in the document, and for each problem these are the
proposed solutions, or a conclusion that a solution is infeasible or best
handled with a different mechanism". That is difficult to do currently.

I'd strongly recommend a more systematic structure, something like:

   - Background: A brief overview of what IDNA is, and how it is used. No
   interwoven perceived problems.
   - Problems: What are the perceived problems with IDNA. No interwoven
   proposed solutions.
   - Solutions: Possible solutions for each of the perceived problems, or
   conclusions that the cost/benefit analysis does not support making a change
   for the problem.
      - eg, that the perceived problem is not that serious, or that there
      does not seem to be a feasible solution in IDNAbis, such as in "8.
      The Ligature and Digraph Problem"


Background
For the background, I'd recommend the following. Much of this is already
there, but spread around.

   1. Quick discussion of what is permitted and what isn't, with examples
   2. The registration scenario
   3. A usage scenario (eg typing or copying text in the address bar of a
   browser)

Perceived problems
For the perceived problems, at a top level I see at least the following, of
which the first two are core issues.

   1. IDNA disallows labels that some people want. Examples:
      1. Unicode 3.2 characters: Catalan "xarel·lo", Farsi <Noon, Alef,
      Meem, Heh, ZWNJ, Alef, Farsi Yeh>,..., trailing combining marks in BIDI
      fields.
      2. Unicode 4.0+ characters: words with Oriya WA or VA, Bengali
      KHANDA TA, Tamil SHA, Ethiopic HOA, Balinese characters (should get
      specific example),...
   2. IDNA allows labels that some people don't want. There are two
   main categories:
      1. Security issues: "amazon.com/bogus.com", where the "/" is a
      fraction slash, "paypal.com" with Cyrillic 'a',...
      2. Not security issues, but "not necessary": eg I♥NY.museum
   3. The accepted input set is much larger than the output set, and
   should be restricted (example: "fishing.com" with an "fi" ligature).*
   (Needs a discussion of why this is a problem.)
   4. The process of upgrading to new versions of Unicode is slow, thus
   disadvantaging minority scripts
   5. Users' expectations are not met when an IDN that seems perfectly
   reasonable, like bäcker.com, either doesn't display or doesn't work in
   some applications.*
   6. IDNA2003 is defined as an explicit algorithm, expressed partially in
   prose and partially in pseudocode, rather than following the more
   traditional IETF practice of specifying the functions rather than the
   algorithm.*
   7. Character conversion may cause a problem if the characters of the
   local character set do not map exactly and unambiguously onto Unicode
   characters.*
   8. Display ordering can be different than network (logical) order.*
   9. Some sequences are considered equivalent in one language, but not
   in another (æ vs ae vs ä)*
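
To make item 3 concrete, here is a minimal sketch of how several distinct inputs collapse onto one output label. It assumes Python's built-in "idna" codec, which implements the IDNA2003 ToASCII operation including the nameprep mapping step; the helper name to_ascii is just for illustration.

```python
# Sketch only: relies on Python's standard "idna" codec (IDNA2003 / RFC 3490).
# The helper name `to_ascii` is an illustrative choice, not part of any spec.

def to_ascii(domain: str) -> str:
    """Apply IDNA2003 ToASCII to each label of a domain name."""
    return domain.encode("idna").decode("ascii")

# Inputs that differ as typed but map to the same label after nameprep:
# U+FB01 is the "fi" ligature; U+00C4 / U+00E4 are upper/lower a-diaeresis.
inputs = [
    "\ufb01shing.com",   # "fishing.com" typed with the fi ligature
    "b\u00e4cker.com",   # lower-case form
    "B\u00c4CKER.com",   # upper-case form of the same label
]

for name in inputs:
    print(repr(name), "->", to_ascii(name))
```

The ligature form comes out as plain "fishing.com", and both spellings of bäcker come out as the same xn-- label, so the set of strings a user can type is larger than the set of labels that can be registered.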

For user expectations, we should have a table of examples for such cases,
such as:

URL                  IE 7                        Firefox 2.x                  Opera 9.1
www.þorn.is          www.þorn.is                 www.þorn.is                  www.þorn.is
bäcker.com           bäcker.com/                 xn--bcker-gra.com/           bäcker.com/
путин.museum         путин.museum                путин.museum                 путин.museum
I♥NY.museum          xn--iny-zx5a.museum/        i♥ny.museum/                 i♥ny.museum/
pаypal.museum        xn--pypal-4ve.museum/       pаypal.museum/               pаypal.museum/
ibm.com⁄foo.museum   ibm.xn--comfoo-rq0c.museum  ibm.xn--comfoo-rq0c.museum/  Illegal

I drew the * problems from
http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-00.txt. I
may have missed some, since they are interwoven in the text.

In all cases, I think they need motivating examples. For example, for the
claim that "Character conversion may cause a problem if the characters of
the local character set do not map exactly and unambiguously onto Unicode
characters." one should be able to cite at least one charset where at least
one character conversion causes a problem for IDNA2003. Either that, or
remove it as a problem.

Such examples are extremely helpful for having a concrete set of cases,
rather than abstractions that are difficult to assess -- and more
importantly, difficult to say whether a particular solution addresses the
particular examples cited.
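
As a rough illustration of checking a proposed solution against concrete cases, the sketch below runs one hypothetical rule ("reject labels whose letters mix scripts") over labels cited in this message. The rule, the function names, and the use of the first word of a character's Unicode name as a stand-in for its script are all assumptions of this sketch, not anything proposed in issues-00.

```python
import unicodedata

def letter_scripts(label: str) -> set[str]:
    """Approximate the scripts used by the letters of a label.

    Heuristic (an assumption of this sketch): the first word of a
    character's Unicode name, e.g. "LATIN" or "CYRILLIC", stands in
    for its script. Non-letters such as symbols are ignored.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def rejected_by_rule(label: str) -> bool:
    """Hypothetical candidate rule: reject labels whose letters mix scripts."""
    return len(letter_scripts(label)) > 1

# Labels taken from the examples cited in this message.
examples = {
    "b\u00e4cker": "reasonable label",
    "\u043f\u0443\u0442\u0438\u043d": "reasonable label (all Cyrillic)",
    "p\u0430ypal": "Cyrillic 'a' among Latin letters",
    "i\u2665ny": "heart symbol",
    "com\u2044foo": "fraction slash",
}

for label, note in examples.items():
    verdict = "rejected" if rejected_by_rule(label) else "accepted"
    print(f"{label!r:12} {verdict:9} {note}")
```

Under these assumptions the rule rejects only the mixed Latin/Cyrillic label; the heart symbol and the fraction slash pass untouched. With a fixed list of cases it is easy to see which problems a given solution covers and which it leaves open.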

In some cases there are already such extensive examples and discussion in
issues-00. For example, the display order being different than network
order, or the ligature ae issue. In other cases they are missing. I
understand that you can't address everything at this point, so indicating
where missing material is TBD is certainly reasonable.

Possible Solutions

Here we can have an examination of the priorities of the above problems, and
proposed solutions.

For each problem it should discuss the feasibility/cost of handling the
problem in IDNAbis, versus the feasibility/cost of dealing with it at a
different level (such as in a browser or in the registry). This is already
done in many cases in issues-00. For example, the discussion of "8. The
Ligature and Digraph Problem" does that, concluding that the problem cannot
be feasibly handled in IDNAbis, and that it is more appropriate to handle in
registries. But in many cases the discussion is lacking, and restructuring
would make it clear where there are holes.

Mark

On 2/2/07, John C Klensin <klensin at jck.com> wrote:
>
>
>
> --On Friday, 02 February, 2007 14:29 -0800 Erik van der Poel
> <erikv at google.com> wrote:
>
> > John,
> >
> > As I read your idnabis-issues Internet Draft, I found that it was
> > difficult to decide what types of comments to make, since I didn't
> > know what the goal of the draft was:
> >
> > http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-00.txt
>
>
> > Is this draft intended to be the 1st of a number of drafts that will
> > eventually lead to an RFC about the issues and some of the proposals?
> > Or will they lead to an "IDNA Overview" RFC, similar to RFC 3490?
>
> it is intended to see if we can agree on what we are talking
> about and what is to be done, then to be split up into at least
> two pieces, one of which replaces 3490 and the other of which
> provides background on the decisions.  Since some of what is in
> it has impact on nameprep and stringprep or their successors,
> and the closely-related draft-alvestrand-idnabis-bidi has a
> different set of impacts, "at least" is important and other
> organizations are possible.
>
> > Also, it might be easier to make concrete comments if the draft was
> > divided into sections corresponding to:
> >
> > (1) issues
> > (2) solutions (or principles for solutions)
>
> Watch for -01, probably next week.
>
> > There may be some need for a section (0) on background. Also, section
> > (2) could be put in a separate Internet Draft?
>
> That is the plan, but not until there is some greater consensus
> about where we are going to end up.  See above.
>
> > To sum up, I believe it would be easier to comment on the draft if
> > there were a clear distinction between perceived issues and
> > proposed solutions.
>
> You have company, i.e., you are not the first to make that
> comment.  I work --and think and analyze proposals--
> differently.   I can't make any claim that my way is better,
> just different.  If there are more people like you than there
> are people like me, the split will occur sooner rather than
> later (although probably not for -01).
>
> regards,
>    john
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>


