IDNA comments

Frank Ellermann
Mon Jul 7 23:47:13 CEST 2008


Mark Davis wrote:

> "The string is converted from the local character set into Unicode,
>  if it is not already Unicode.

I'd strike "if it is not already Unicode"; that this conversion can
be a no-op isn't the interesting point...

>  The exact nature of this conversion is beyond the scope of this 
>  document, but may involve normalization, as described in Section
>  4.2."

s/may involve/involves/; that is the really interesting part, and it
also applies to Unicode input.
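A small illustration of why normalization matters even when the input
is already Unicode. This is a sketch in Python; `to_unicode_label` is a
hypothetical helper, not something from the draft. It assumes NFC as
the target form (RFC 5891 requires U-labels to be in NFC; IDNA2003's
Nameprep used NFKC instead):

```python
import unicodedata

def to_unicode_label(raw, charset=None):
    """Hypothetical helper: decode local-charset bytes (if any),
    then normalize the result to NFC."""
    text = raw.decode(charset) if isinstance(raw, bytes) else raw
    return unicodedata.normalize("NFC", text)

# Latin-1 bytes: decoding already yields the precomposed U+00FC.
from_bytes = to_unicode_label(b"b\xfccher", "latin-1")

# Unicode input with a combining diaeresis (u + U+0308) is not a no-op:
# normalization changes it into the same precomposed string.
from_unicode = to_unicode_label("bu\u0308cher")

print(from_bytes == from_unicode)  # True
```

Skipping the normalization step for "already Unicode" input would leave
the second string different from the first.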

> "IDNA uses the Unicode character repertoire, which avoids the
>  significant delays that would be inherent in waiting for a
>  different and specific character sets to be defined for IDN
>  purposes, presumably by some other standards developing 
>  organization. " This is a very strange rationale

It is a summary of RFC 5242, but readers likely won't get this
subtle point.  Maybe it could be trimmed to "because no other
alternative for the IDNA purposes exists".

> "Applications MAY allow the display and user input of A-labels,
>  but are encouraged to not do so except as an interface for
>  special purposes, possibly for debugging, or to cope with
>  display limitations." There is widespread use of the A-Label
>  to signal a possible spoof -- while you discuss that later,
>  I think it's swimming against the tide not to mention it here.

Strong ACK: an application that shows me U-labels in any script I
can't read, without my prior consent, is broken.  s/MAY/MUST/ or
similar.  Sometimes applications have no idea which code points
can be displayed at all, and we should not invent new ?-attacks.
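A rough sketch of the "A-label unless the user consented to the script"
policy, in Python. Everything here is hypothetical: `display_domain` and
`is_readable` are illustrative names, and guessing a character's script
from the first word of its Unicode name ("LATIN", "CYRILLIC", ...) is a
crude stand-in for a real script property. Python's built-in "idna"
codec implements IDNA2003 ToASCII, not the draft under discussion:

```python
import unicodedata

def is_readable(label, readable_scripts):
    # Assumed heuristic: ASCII is always readable; otherwise use the
    # first word of the character's Unicode name as its "script".
    for ch in label:
        if ch.isascii():
            continue
        script = unicodedata.name(ch, "UNKNOWN").split()[0]
        if script not in readable_scripts:
            return False
    return True

def display_domain(domain, readable_scripts):
    """Show each label as a U-label only if the user reads its script;
    otherwise fall back to the A-label."""
    shown = []
    for label in domain.split("."):
        if is_readable(label, readable_scripts):
            shown.append(label)
        else:
            # The stdlib "idna" codec (IDNA2003) produces the xn-- form.
            shown.append(label.encode("idna").decode("ascii"))
    return ".".join(shown)

print(display_domain("bücher.example", {"LATIN"}))  # bücher.example
print(display_domain("bücher.example", set()))      # xn--bcher-kva.example
```

The default is the A-label, so a user only sees a U-label after opting
in to that script; unknown or undisplayable code points never reach the
screen as "?" placeholders.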

> "the two-character sequence "ae" is usually treated as a fully
> acceptable alternate orthography." Add: "for the "umlauted a"
> character".

s/usually/under certain conditions/, e.g., in US-ASCII RFCs, and
s/a fully acceptable/an exceptionally acceptable/.

> Add, to show that we not playing favorites, "Even the very
> common words in English like "can't, and "don't" are not allowed.

...won't work for me; I'm not sure that "don't" is a "word".
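Why contractions fail even for plain ASCII: the apostrophe (U+0027) is
outside the letter-digit-hyphen (LDH) host name syntax. A minimal Python
sketch, assuming the RFC 952/1123 LDH rule (1 to 63 characters, no
leading or trailing hyphen); `is_ldh_label` is a hypothetical helper and
checks only ASCII syntax, not the full IDNA rules:

```python
import re

# Assumed LDH rule: letters, digits, hyphens; no hyphen at either end;
# at most 63 characters (1 + up to 61 + 1).
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_label(label):
    return LDH_LABEL.fullmatch(label) is not None

for word in ["cant", "can't", "don't", "won't"]:
    print(word, is_ldh_label(word))
```

Only "cant" passes; every contraction is rejected because of the
apostrophe.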

Of course English is favoured; pretending that it's not is a
waste of everybody's time.  And it starts to get surreal if
folks say "Roman alphabet" when they mean "US-ASCII letters"
for political correctness; nothing is wrong with "w" and "u".

 Frank



More information about the Idna-update mailing list