Standards and localization (was Dot-mapping)

YAO Jiankang yaojk at cnnic.cn
Sat Dec 8 20:48:35 CET 2007


A very good example.
Yes, "http://Bücher.com" and "http://Buecher.com" (where the dot is a full-width dot)
are not very different: both must be processed by IDNA before being sent to DNS for lookup.

Thanks a lot for your nice example and practice.

YAO Jiankang

  ----- Original Message ----- 
  From: Mark Davis 
  To: John C Klensin 
  Cc: YAO Jiankang ; Yangwoo Ko ; idna-update at alvestrand.no ; fujiwara at jprs.co.jp 
  Sent: Sunday, December 09, 2007 3:28 AM
  Subject: Re: Standards and localization (was Dot-mapping)


  I'm a bit puzzled. If I take a "raw" IDN, like

  http://Bücher.com

  and paste it into an IDNA-unaware browser, it won't work. We should expect that of browsers that don't handle IDNs. We'd need to paste in a Punycode version for it to work: xn--bcher-kva.com

  If I take a "raw" IDN, like

  http://Buecher.com               // that dot is a full-width dot

  and paste it into an IDNA-unaware browser, it also won't work. We should also expect that of browsers that don't handle IDNs. We'd need to paste in a normalized version for it to work: http://Buecher.com

  That is, it doesn't appear that the dot conversion is much different than the punycode conversion (and case/normalization folding) -- something that has to be done before passing off to DNS for it to work correctly. 
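
The point above can be illustrated with a minimal Python sketch. It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490); the helper name to_dns_form and the dictionary raw_names are hypothetical and are not part of the original message. Both raw forms go through the same preprocessing step before anything is handed to DNS.

```python
# Sketch only: Python's built-in "idna" codec implements IDNA2003 (RFC 3490).
# The names below (raw_names, to_dns_form) are illustrative assumptions.

raw_names = {
    "umlaut": "B\u00fccher.com",          # Bücher.com, needs Punycode
    "fullwidth_dot": "Buecher\uff0ecom",  # U+FF0E instead of U+002E
}


def to_dns_form(name):
    """Apply the IDNA2003 ToASCII step before the name goes to DNS."""
    return name.encode("idna").decode("ascii")


for kind, name in raw_names.items():
    print(kind, "->", to_dns_form(name))
# umlaut -> xn--bcher-kva.com
# fullwidth_dot -> Buecher.com
```

In this sketch the dot mapping and the Punycode conversion happen in the same call, which is the sense in which the two conversions are "not much different" for an IDNA-aware application.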

  Mark


  On Dec 8, 2007 5:15 AM, John C Klensin <klensin at jck.com> wrote:



    --On Saturday, 08 December, 2007 12:06 +0800 YAO Jiankang
    <yaojk at cnnic.cn> wrote:

    >> Without that mapping, the string cannot be parsed into labels 
    >> since conventional (legacy) FQDN parsers separate labels
    >> _only_ on ASCII period, 0x2E, aka U+002E.
    >
    > True. Non-IDNA-aware software cannot parse an IDN.
    >
    >>
    >> Not being able to parse the string into labels would result in 
    >> rather serious lookup failures,  but the problem is even worse
    >> because:
    >
    > If I understand correctly, you seem to be assuming the
    > following: a domain name containing U+3002 (ideographic full
    > stop), U+FF0E (fullwidth full stop), or U+FF61 (halfwidth
    > ideographic full stop) is not an IDN, so it will be sent to
    > the DNS lookup server without IDNA processing. Actually,
    > according to RFC 3490, it is an IDN.
    > Since it is an IDN, it must be processed by IDNA before being
    > sent to DNS lookup. If that happens, the problem you describe
    > does not arise.


    That is not my assumption.  Perhaps I can explain this better by 
    means of an example.   I can't do this exactly, so suppose that
    the character "?" is actually U+3002 (ideographic full stop).

    Someone sends me a URL in email.  The URL consists of

      http://www.xn--0xaat.example.com/

    where the A-label corresponds to the U-label φοο.

    That example uses standard dots.   Suppose I do not have an
    IDNA-aware browser.   But I can take the string from your mail, 
    paste it in, parse it into
     "www", "xn--0xaat", "example", and "com",
    look things up, and obtain the page.   That is how IDNA is
    supposed to work.   As long as the user sticks to passing the 
    ACE form around, applications do not need to be IDNA-aware.
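
A minimal Python sketch of the case just described, assuming a legacy parser that separates labels only on ASCII period (0x2E). The function name legacy_split is hypothetical. The last line uses Python's built-in IDNA2003 codec only to show what an IDNA-aware application could display; the legacy path never needs it.

```python
# Sketch only: legacy_split is an illustrative stand-in for a
# conventional FQDN parser that knows nothing about IDNA.

def legacy_split(fqdn):
    """Split a host name into labels on ASCII period (U+002E) only."""
    return fqdn.split(".")


labels = legacy_split("www.xn--0xaat.example.com")
print(labels)
# ['www', 'xn--0xaat', 'example', 'com']

# Every label is plain ASCII, so a non-IDNA-aware application can
# look the name up unchanged. An IDNA-aware one could also show the
# U-label that the A-label stands for:
print(labels[1].encode("ascii").decode("idna"))
# φοο
```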

    However, assume that you send me a URL, that looks (substituting
    "?" as above) like:

      http://www.xn--0xaat?example.com/

    I copy that out and paste it into my browser, which we are still
    assuming is not IDNA-aware.  Because the browser is not
    IDNA-aware, the domain name is parsed into

      "www", "xn--0xaat?example" and "com"

    This is obviously wrong and will obviously result in a failure
    to find the name in a query.   Worse, that parsing is performed
    in places and with software other than DNS resolvers.  For 
    example, there are several security-related protocols that use
    DNS names as identifiers but keep them in internal DNS form (a
    list of labels stored with lengths and values, not separated by
    dots).   Depending on how they are designed, even modern 
    implementations are not required to be IDNA-aware (because IDNA
    is transparent).  But the dot-mappings cannot be transparent:
    every system, module, or application that has to parse an FQDN
    into components must know what is, and is not, a 
    label-separation character.
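
The difference can be sketched in Python, assuming the host name from the second URL with U+3002 in place of the "?" placeholder. The four separator characters are the ones RFC 3490 lists; the function names and the simplified wire-format helper are hypothetical illustrations, not any particular implementation.

```python
import re

# The label separators recognized by IDNA2003 (RFC 3490):
# U+002E, U+3002, U+FF0E, U+FF61.
IDNA2003_DOTS = re.compile("[\u002e\u3002\uff0e\uff61]")


def legacy_split(fqdn):
    """Legacy behaviour: only ASCII period separates labels."""
    return fqdn.split(".")


def idna_aware_split(fqdn):
    """IDNA2003 behaviour: all four dots separate labels."""
    return IDNA2003_DOTS.split(fqdn)


def to_wire_labels(labels):
    """Simplified internal DNS form: each label as a length octet
    followed by its ASCII octets, ended by the zero-length root label."""
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # fails if a non-ASCII dot is left inside
        if not 0 < len(raw) < 64:
            raise ValueError("label empty or too long")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)


name = "www.xn--0xaat\u3002example.com"

print(len(legacy_split(name)))      # 3: the U+3002 stays inside a label
print(len(idna_aware_split(name)))  # 4

wire = to_wire_labels(idna_aware_split(name))  # works

try:
    to_wire_labels(legacy_split(name))
except UnicodeEncodeError:
    print("legacy split left a non-ASCII separator inside a label")
```

Any module that stores names as a list of labels has to make this split itself, so it has to know the separator set even if it never performs the Punycode step.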

       john










  -- 
  Mark 