Comments on IDNA Bidi

Michel Suignard michelsu at windows.microsoft.com
Wed Jan 16 19:53:41 CET 2008


Harald, what you call 'break badly' or 'break apart' is what I perceive as a logical-order bidi string being rendered or printed in a visual order which may be close to impossible for mere mortals to decipher. Unfortunately that is a common occurrence in bidirectional processing. For example, the case of 'part numbers' (in essence a mix of random letters, digits and possibly some symbols) has frequently been used to show how quickly the situation can become hopelessly complicated. At the same time, bidi readers are much more skilled at reading complex bidi strings than we are.
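
To make the 'part number' case concrete, here is a minimal sketch in Python that lists the Unicode bidi class of each character of a string in logical (stored) order. The label is a made-up example, not taken from any real domain name, and the helper name bidi_classes is hypothetical. The sketch only inspects character classes with the standard unicodedata module; it does not run the bidi algorithm itself.

```python
import unicodedata

def bidi_classes(label):
    """Return (character, bidi class) pairs in logical (stored) order."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in label]

# Hypothetical part-number-like label: two Hebrew letters, two digits,
# a hyphen, then two Latin letters.
label = "\u05d0\u05d1" + "12" + "-" + "ab"
classes = bidi_classes(label)

for ch, cls in classes:
    print(f"U+{ord(ch):04X} {cls}")
```

The mix of strong right-to-left letters (R), European digits (EN), a separator (ES) and strong left-to-right letters (L) within a single short string is what lets the visual order diverge so far from the stored order.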

Domain names, and by extension IRIs, are almost as bad as part numbers, with the added hindrance that bidi format overrides typically can't be used.

I don't think that anybody expected at the time of IDNA2003 that, with the bidi rules, every bidi domain name would display as easy-to-decipher text. Some could be harder than others. In other words, in my opinion 'breakage' is not an objective description of what happens. If a reader used to bidi text can read what someone else would qualify as 'broken', it is in fact acceptable.

Having said that, I would agree that we should try to minimize those cases, but the more complex we make the rules, the greater the chance that implementers will make mistakes.

Finally, it is interesting to note that IRI made the bidi rules optional, because there are cases such as URNs (or IRNs by extension) where the visual representation may not matter as long as the stored value is unique (such as for a namespace declaration). See clause 4.2 of RFC 3987.

Best regards,

Michel

