display of RightToLeft chars in localparts and hostnames

Thu Dec 7 22:54:19 CET 2006

On Thu, Dec 07, 2006 at 02:20:49PM -0500, John C Klensin wrote:
> Hi.
> 
> I've had it pointed out to me that I got confused as a
> consequence of very similar discussions on two separate mailing
> lists and replied to this list with a comment that should have
> been directed to the other.   

The "[EAI]" tag in the subject line of this thread might have confused
you. I am sorry for that. :-0

> 
> In addition to Harald's answer, mapping these out violates two
> principles we have been trying to use in sorting through which
> characters are to be included and how they are handled.  Those
> principles are debatable, but, if they are not clear in
> "issues", I'd appreciate suggestions of specific text to make
> them clear.
> 
> (1) We do as little mapping as possible.  NFC-type mapping is
> unavoidable to make different representations of the same
> character compare equal.  Case mapping is unavoidable to prevent
> astonishment between the way IDNs are handled and the way basic
> ASCII domain name labels are handled.  
> Anything else, especially anything that involves a compatibility
> mapping, is
> prohibited (i.e., the character that would map to another one is
> prohibited entirely since it would not appear when
> reversed-mapped from the DNS storage form back to a conventional
> Unicode sequence).  

But compatibility mappings sometimes map a FullWidthChar to a
HalfWidthChar, both of which have the same basic glyph.
For example, NFKC and NFKD map U+FF21 (FullWidth A) to U+0041 (A),
while NFC and NFD leave it unchanged.
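
A quick way to check this, as a minimal sketch using Python's standard
unicodedata module (the helper name normalize_all is only for
illustration and is not part of any IDNA draft):

```python
import unicodedata

FULLWIDTH_A = "\uFF21"  # FULLWIDTH LATIN CAPITAL LETTER A


def normalize_all(text):
    """Return the text under each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}


for form, mapped in normalize_all(FULLWIDTH_A).items():
    # Print the code points so fullwidth and ASCII forms are distinguishable.
    print(form, [f"U+{ord(ch):04X}" for ch in mapped])
```

Only the two compatibility forms (NFKC, NFKD) fold the fullwidth letter
into ASCII "A"; the canonical forms (NFC, NFD) return it unchanged.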

> And "map to nothing" cases are prohibited
> because they, well, map to nothing and don't reverse-map either.
> Either the characters / code points are nothing, and so we don't
> need them, or they carry information, such as impacting
> presentation, so discarding them is dangerous.

Do you mean that the former "map to nothing" candidates are to be
"prohibited" or "allowed"?

> 
> (2) We try to keep this as simple as possible.  One of the
> frequently-repeated complaints about IDNA(2003) is that no one
> can predict what it will actually permit or not from principles
> -- one must either hand-execute the algorithms or use a computer
> program to do that.  Not a good basis for a standard.  It also
> leads to the belief that IDNA is just too complicated and should
> be replaced by, e.g., UTF-8 in the DNS (which would turn out not
> to be any less complicated, because these rules are about
> character acceptability and mapping, not about the final
> coding/decoding stage)

Right. The previously proposed UTF-8 DNS and IDNA share the same
"stringprep"ed Unicode string output. The choice between UTF-8 and
Punycode is merely an encoding issue with respect to backward
compatibility with various RFC protocols.
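
As a rough illustration of that point, here is a sketch using Python's
built-in IDNA2003 support. It assumes encodings.idna.nameprep as the
preparation step and the "punycode" codec for the ACE form; the helper
encode_both and the sample label are made up for this example.

```python
from encodings.idna import nameprep  # IDNA2003 nameprep profile of stringprep


def encode_both(label):
    """Prepare a label once, then encode the same string two ways."""
    prepared = nameprep(label)                         # shared prepared string
    utf8_form = prepared.encode("utf-8")               # what a UTF-8 DNS would carry
    ace_form = b"xn--" + prepared.encode("punycode")   # what IDNA carries
    return prepared, utf8_form, ace_form


prepared, utf8_form, ace_form = encode_both("B\u00FCcher")
print(repr(prepared), utf8_form, ace_form)

# Both wire forms decode back to the same prepared string.
assert utf8_form.decode("utf-8") == prepared
assert ace_form[4:].decode("punycode") == prepared
```

The character acceptability and mapping rules all live in the
preparation step; the last two lines of the helper differ only in the
byte encoding chosen for the wire.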

> 
> Now, as we have discussed in the email i18n context, a user
> interface may well have reason to do mappings of various sorts
> prior to getting near IDNA.  That may be sensible and we
> certainly expect it in some cases.  But keeping it out of the
> protocol makes the protocol less complex and makes it much more
> clear what forms of a domain name can be incorporated into URLs
> (and URIs and IRIs generally) and passed between systems.

For example, peeling off (RL*) and (LR*) can remain outside the protocol.
The main point is that whether or not we enforce a single strict display
order among (bidi) IDN labels *MUST* be addressed clearly
in the new IDNA200x.

(network order), (input time order), and (display order) differ
from one another in the case of bidiLocal at bidiIDN.bidiIDN.com,
as I described in my previous mails to this list.
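
To make the bidi case concrete, here is a small sketch that walks an
address in network (logical) order and flags which parts contain
right-to-left characters. It does not implement the Unicode
bidirectional algorithm and does not compute any display order; it only
shows where display order can diverge from network order. The helper
names and the sample address (Hebrew letters written as escapes) are
hypothetical.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # bidi classes of right-to-left letters


def is_rtl_label(label):
    """True if any character in the label has a right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def describe(address):
    """Split local@host into parts, in network order, and flag RTL parts."""
    local, _, host = address.partition("@")
    parts = [("localpart", local)]
    parts += [("label", label) for label in host.split(".")]
    return [(kind, text, is_rtl_label(text)) for kind, text in parts]


# Hypothetical bidiLocal@bidiIDN.bidiIDN.com with two-letter Hebrew parts.
sample = "\u05D0\u05D1@\u05D2\u05D3.\u05D4\u05D5.com"
for kind, text, rtl in describe(sample):
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{kind:9} rtl={rtl!s:5} {codepoints}")
```

Here three RTL parts are followed by one LTR label ("com"), which is the
situation where a renderer has to decide how to order the labels on
screen.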

Soobok

> 
> Again, if more text to the effect of the above needs to be added
> to "issues" I'd welcome text or at least specific comments.
> 
>     john