Reminder: BIDI inter-label tests in -02

Harald Alvestrand harald at alvestrand.no
Fri Sep 5 22:54:47 CEST 2008


John,

John C Klensin wrote:
> --On Friday, 05 September, 2008 11:15 +0200 Harald Alvestrand
> <harald at alvestrand.no> wrote:
>
>   
>> For all those of you who care about the bidi interlabel issue,
>> the  following text is in -02:
>>
>>    o  The BIDI test MAY return failure if the BIDI rule is not
>>       satisfied by the label following the label that contains
>>       AL, AN or R in the domain name.  For all the reasons
>>       given above, it may be impossible to know the following
>>       label, but there seems no or negative value to allowing
>>       the BIDI test to succeed if the following label is
>>       known.  [[POSSIBLY CONTROVERSIAL]]
>>
>> In the example Alireza gave, this would mean that the bidi
>> test is allowed to fail on the <ALEF>.3.com domain name, but
>> won't fail on the 3.<ALEF>.com domain name - which would at
>> least make sure the document gives guidance on the decision
>> on which of the two names is going to be considered "valid"
>> by whatever registry-specific or application-specific logic
>> people implement to solve the problem elsewhere.
>>
>> (As a personal preference, I would prefer to make it a SHOULD,
>> or even a "MUST if the following label is known by the BIDI
>> test" - but that did not seem to be the WG's consensus in
>> Dublin, so that's not what the text says).
>>
>> This editor needs WG direction on whether to remove
>> this bullet, reword this bullet, or remove the [[POSSIBLY
>> CONTROVERSIAL]] tag.
>>     
>
> Harald,
>
> FWIW, let me describe where I think we emerged from Dublin on
> this (some of this falls into the category of what I think of as
> corollaries to the discussions, not the meeting discussions
> themselves).
>
> 	(1) Any requirement for inter-label checking is a
> 	showstopper for DNS reasons and will remain a
> 	showstopper regardless of anything this WG may or may
> 	not wish to conclude.  Put differently, including a
> 	requirement for inter-label checking in a document is
> 	just a way to ensure that the document will be shot down
> 	by the DNS community during Last Call.    Andrew or
> 	others who raised the issue in Dublin might want to
> 	clarify or affirm this, but my impression is any
> 	statement that uses 2119-normative language (other than
> 	MAY) would constitute a requirement in that regard.
>   
I would like those who hold this position to speak up. I don't 
understand that position, and would like to understand it (whether I 
agree with it or not) before giving up on this point.
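
To make the bullet concrete, here is a rough Python sketch of the inter-label check as I read it. It is an illustration under stated assumptions, not the draft's algorithm: simplified_bidi_ok() is a deliberately reduced stand-in for the BIDI rule (it only checks that the first character has Bidi class L, R or AL), and all function names are made up for this example.

```python
import unicodedata

# Bidi classes that make a label "contain AL, AN or R" in the bullet's terms.
RTL_CLASSES = {"R", "AL", "AN"}


def has_rtl(label):
    """True if any character of the label has Bidi class R, AL or AN."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def simplified_bidi_ok(label):
    """Stand-in for the BIDI rule (assumption): first character is L, R or AL."""
    if not label:
        return True  # empty label (e.g. trailing dot): nothing to check
    return unicodedata.bidirectional(label[0]) in {"L", "R", "AL"}


def interlabel_test_may_fail(domain):
    """True if some label containing R/AL/AN is followed by a label
    that does not pass the (simplified) BIDI rule."""
    labels = domain.split(".")
    for current, following in zip(labels, labels[1:]):
        if has_rtl(current) and not simplified_bidi_ok(following):
            return True
    return False


ALEF = "\u0627"  # ARABIC LETTER ALEF, Bidi class AL

print(interlabel_test_may_fail(ALEF + ".3.com"))        # True
print(interlabel_test_may_fail("3." + ALEF + ".com"))   # False
```

With this reading, <ALEF>.3.com is the name the test is allowed to reject (the label after <ALEF> starts with a digit), while 3.<ALEF>.com passes because the label after <ALEF> is "com".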

> 	
> 	(2) URIs do not contain domain names in U-label form.
> 	It is, at best, in poor stylistic taste for them to
> 	contain non-ASCII characters in the domain field using
> 	percent-encoding of U-labels rather than A-labels.
> 	Because there are no manifest RtoL characters in URIs
> 	(because there are no non-ASCII characters), there are
> 	no RtoL-related URI display issues.  
>   
I was trying hard to not mention URIs in *this* message at all, because 
the question of IRIs vs URIs and what-occurs-where is, to my mind, both 
very knotty and deeply irrelevant to the question I am trying to ask, 
which is all about what the lookup process of a domain name with 
U-labels in it is permitted to do. So I'll ignore this issue on this 
particular thread.
> 	
> 	(3) Some referral/indirection URIs constitute an
> 	interesting challenge.  Regardless of what current (and
> 	draft) versions of the URI and IRI may say (or be
> 	construed as saying), the domain-part of a URI is
> 	clearly a "domain name slot" as that term is defined in
> 	IDNA2003 (the definition in IDNA2008 is no different,
> 	but I want to stress that this is not a new decision).
> 	As such, it is expected to contain a U-label or A-label
> 	(IDNA2008 terminology) and not some other encoding (%nn
> 	form for octets or otherwise).  On the other hand, the
> 	tail and its substrings are not inherently domain name
> 	slots.  So, with apologies to Frank and RFC 2606 and the
> 	hope that people will have close enough approximations
> 	to UTF-8 MUAs to be able to get the gist of what is
> 	happening here, if one had an IRI similar to
>
> 	http://www.favorite-search-engine.пример/mumble=fubar&q=http://www.пример.com/
> 	
> 	then one would probably expect the last label in
> 	www.favorite-search-engine.пример to be in A-label
> 	form in the URL but to see "пример" in the string
> 	"www.пример.com" to be mapped into a string of
> 	%-encoded octets of the UTF-8 form.   That poses some
> 	interesting problems for the software trying to un-map
> 	the referral that are independent of any RtoL issues,
> 	but possibly identifies just how complicated it can be
> 	to make subtle inter-label tests even within a URL
> 	(remember that, in principle, nothing other than the
> 	host at www.favorite-search-engine.xn--e1afmkfd knows
> 	that "http://www.пример.com/" is an embedded URL
> 	containing an IDN, even though it obviously looks like
> 	one.   As far as anything else is concerned, that latter
> 	string is just running text).
> 	
> 	So one can have the "running text" problem, with or
> 	without RtoL characters, even inside a URL and without
> 	worrying about "paragraphs".
> 	
> 	If this is a problem for IRIs (and whether or not it is
> 	is debatable), it is not a problem for this WG.
>   
Good point, and to my mind a very good example of why we should just 
focus on what happens to domain names when they are displayed as if they 
were plain text.
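
For anyone who wants to see the two encodings in John's example side by side, a small Python sketch follows. It is only an illustration: it uses Python's built-in "idna" codec (which implements IDNA2003, not the IDNA2008 drafts) for the A-label and urllib.parse.quote for the %-encoded UTF-8 form, and the variable names are mine.

```python
from urllib.parse import quote

label = "пример"

# A-label form, as expected in the host part of the URL.
a_label = label.encode("idna").decode("ascii")

# %-encoded UTF-8 octets, as expected for the same string in the query tail.
percent_form = quote(label)

host = "www.favorite-search-engine." + a_label
embedded = "http://www." + percent_form + ".com/"
url = "http://" + host + "/mumble=fubar&q=" + embedded

print(a_label)       # xn--e1afmkfd
print(percent_form)  # %D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80
print(url)
```

Nothing in the resulting URL marks the %-encoded run in the tail as a domain label; only the receiving host can decide to treat it as one.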
> Now, while I have never been an advocate of positions like "we
> can't address all of the cases and solve all of the problems,
> therefore we should do nothing", I'm finding that this leads me
> to a position close to Alireza's conclusion (if I understand
> that conclusion correctly).  However, I also see zone policies
> and registration procedures as an important part of the
> protocol.  To me, that means removing all of the normative
> language from the bulleted paragraph above and replacing it with
> some lavish advice that points out the nasty things that can
> happen when naive (or not-so-naive) rendering engines display
> labels containing certain types of characters in certain
> positions next to labels containing certain other types of
> characters.  I think that advice should explain the cases, give
> examples, and (i) indicate that administrators of zones that
> contain RtoL characters in labels or that point into such zones
> (via CNAME, DNAME, and maybe URI-containing NAPTR records) ought
> to be very careful what they do and wish for lest massive user
> confusion and astonishment occur and (ii) that applications
> software that renders these strings in native-character form
> (certainly including URI-> IRI conversion and display programs)
> ought to be very sensitive to these issues as well, perhaps
> contriving to warn users that what they are seeing might not be
> what they might expect to see.  
>   
I think the text you are asking for is already present in the document, 
but would like your suggestions for improving the text.
So I'll conclude that you think I should remove the bulleted point. Is 
that a correct interpretation?
> Much as I'd like to do more, I don't see a path that would
> permit us to do so.
If you are right that the bullet I've called out, as it is now
written, will cause the document to be blocked indefinitely, then
I agree. I would, however, like the people who hold that
position to explain it.

                         Harald



