Reminder: BIDI inter-label tests in -02

Fri Sep 5 23:09:07 CEST 2008

Thanks John. This is what I was trying to say. We know of many issues
that we cannot fix perfectly within the protocol documents, but we can
describe those issues and suggest some possible solutions rather than
leaving incomplete rules in the document.

Alireza

John C Klensin wrote:
> --On Friday, 05 September, 2008 11:15 +0200 Harald Alvestrand
> <harald at alvestrand.no> wrote:
>
>   
>> For all those of you who care about the bidi interlabel issue,
>> the  following text is in -02:
>>
>>    o  The BIDI test MAY return failure if the BIDI rule is not
>>       satisfied by the label following the label that contains
>>       AL, AN or R in the domain name.  For all the reasons given
>>       above, it may be impossible to know the following label,
>>       but there seems no or negative value to allowing the BIDI
>>       test to succeed if the following label is known.
>>       [[POSSIBLY CONTROVERSIAL]]
>>
>> In the example Alireza gave, this would mean that the bidi
>> test is allowed to fail on the <ALEF>.3.com domain name, but
>> won't fail on the 3.<ALEF>.com domain name - which would at
>> least make sure the document gives guidance on the decision
>> on which of the two names is going to be considered "valid"
>> by whatever registry-specific or application-specific logic
>> people implement to solve the problem elsewhere.
>>
>> (As a personal preference, I would prefer to make it a SHOULD,
>> or even a "MUST if the following label is known by the BIDI
>> test" - but that did not seem to be the WG's consensus in
>> Dublin, so that's not what the text says).
>>
>> This editor needs WG direction on whether to either remove
>> this bullet, reword this bullet, or remove the [[POSSIBLY
>> CONTROVERSIAL]] tag.
>>     
>
> Harald,
>
> FWIW, let me describe where I think we emerged from Dublin on
> this (some of this falls into the category of what I think of as
> corollaries to the discussions, not the meeting discussions
> themselves).
>
> 	(1) Any requirement for inter-label checking is a
> 	showstopper for DNS reasons and will remain a
> 	showstopper regardless of anything this WG may or may
> 	not wish to conclude.  Put differently, including a
> 	requirement for inter-label checking in a document is
> 	just a way to ensure that the document will be shot down
> 	by the DNS community during Last Call.    Andrew or
> 	others who raised the issue in Dublin might want to
> 	clarify or affirm this, but my impression is any
> 	statement that uses 2119-normative language (other than
> 	MAY) would constitute a requirement in that regard.
> 	
> 	(2) URIs do not contain domain names in U-label form.
> 	It is, at best, in poor stylistic taste for them to
> 	contain non-ASCII characters in the domain field using
> 	percent-encoding of U-labels rather than A-labels.
> 	Because there are no manifest RtoL characters in URIs
> 	(because there are no non-ASCII characters), there are
> 	no RtoL-related URI display issues.  
> 	
> 	(3) Some referral/indirection URIs constitute an
> 	interesting challenge.  Regardless of what current (and
> 	draft) versions of the URI and IRI specifications may
> 	say (or be construed as saying), the domain-part of a URI is
> 	clearly a "domain name slot" as that term is defined in
> 	IDNA2003 (the definition in IDNA2008 is no different,
> 	but I want to stress that this is not a new decision).
> 	As such, it is expected to contain a U-label or A-label
> 	(IDNA2008 terminology) and not some other encoding (%nn
> 	form for octets or otherwise).  On the other hand, the
> 	tail and its substrings are not inherently domain name
> 	slots.  So, with apologies to Frank and RFC 2606 and the
> 	hope that people will have close enough approximations
> 	to UTF-8 MUAs to be able to get the gist of what is
> 	happening here, if one had an IRI similar to
>
> 	http://www.favorite-search-engine.пример/mumble=fubar&q=http://www.пример.com/
> 	
> 	then one would probably expect the last label in
> 	www.favorite-search-engine.пример to be in A-label
> 	form in the URL, but would expect "пример" in the string
> 	"www.пример.com" to be mapped into a string of
> 	%-encoded octets of the UTF-8 form.   That poses some
> 	interesting problems for the software trying to un-map
> 	the referral that are independent of any RtoL issues,
> 	but possibly identifies just how complicated it can be
> 	to make subtle inter-label tests even within a URL
> 	(remember that, in principle, nothing other than the
> 	host at www.favorite-search-engine.xn--e1afmkfd knows
> 	that "http://www.пример.com/" is an embedded URL
> 	containing an IDN, even though it obviously looks like
> 	one.   As far as anything else is concerned, that latter
> 	string is just running text).
> 	
> 	So one can have the "running text" problem, with or
> 	without RtoL characters, even inside a URL and without
> 	worrying about "paragraphs".
> 	
> 	If this is a problem for IRIs (and whether or not it is
> 	is debatable), it is not a problem for this WG.
>
> Now, while I have never been an advocate of positions like "we
> can't address all of the cases and solve all of the problems,
> therefore we should do nothing", I'm finding that this leads me
> to a position close to Alireza's conclusion (if I understand
> that conclusion correctly).  However, I also see zone policies
> and registration procedures as an important part of the
> protocol.  To me, that means removing all of the normative
> language from the bulleted paragraph above and replacing it with
> some lavish advice that points out the nasty things that can
> happen when naive (or not-so-naive) rendering engines display
> labels containing certain types of characters in certain
> positions next to labels containing certain other types of
> characters.  I think that advice should explain the cases, give
> examples, and (i) indicate that administrators of zones that
> contain RtoL characters in labels or that point into such zones
> (via CNAME, DNAME, and maybe URI-containing NAPTR records) ought
> to be very careful what they do and wish for lest massive user
> confusion and astonishment occur and (ii) indicate that applications
> software that renders these strings in native-character form
> (certainly including URI-to-IRI conversion and display programs)
> ought to be very sensitive to these issues as well, perhaps
> contriving to warn users that what they are seeing might not be
> what they might expect to see.  
>
> Much as I'd like to do more, I don't see a path that would
> permit us to do so.
>
>      john
>
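
For anyone who wants to experiment with the optional inter-label check
quoted from -02 above, here is a minimal sketch of how it might behave on
the two example names. The function names are hypothetical. The per-label
"BIDI rule" is not spelled out in this thread, so the sketch uses a crude
stand-in (the label must start with a character of bidi class L, R or AL);
that stand-in is an assumption, not the draft's actual rule, and callers
can pass their own predicate instead.

```python
import unicodedata

# Bidi classes that trigger the optional inter-label check in the -02 text.
RTL_CLASSES = {"AL", "AN", "R"}


def contains_rtl(label):
    """True if any character in the label has bidi class AL, AN or R."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def starts_with_l_r_al(label):
    """Stand-in for the per-label BIDI rule (an assumption for illustration):
    the first character must have bidi class L, R or AL."""
    return bool(label) and unicodedata.bidirectional(label[0]) in {"L", "R", "AL"}


def interlabel_failures(name, bidi_rule=starts_with_l_r_al):
    """Return (label, following_label) pairs for which the test MAY fail.

    A pair is reported when the first label contains AL, AN or R and the
    label that follows it in the name does not satisfy bidi_rule.
    """
    labels = name.split(".")
    return [
        (current, following)
        for current, following in zip(labels, labels[1:])
        if contains_rtl(current) and not bidi_rule(following)
    ]


ALEF = "\u0627"  # ARABIC LETTER ALEF, bidi class AL

print(interlabel_failures(ALEF + ".3.com"))  # one pair reported: (ALEF, "3")
print(interlabel_failures("3." + ALEF + ".com"))  # nothing reported
```

Under that stand-in rule the sketch agrees with Harald's reading: the check
is allowed to fail on <ALEF>.3.com and does not fail on 3.<ALEF>.com. It
only looks at labels that are actually present in the string it is given,
which is exactly the limitation discussed above.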
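
The two encodings in John's point (3) can be illustrated with a short
sketch: the label in the host part becomes an A-label, while the same
string inside the embedded query value becomes %-encoded UTF-8 octets.
The function name build_referral_url is hypothetical, and the host, path
and query are simply the ones from the example above. Note that Python's
built-in "idna" codec implements IDNA2003; it is used here on the
assumption that, for this all-lowercase Cyrillic label, the resulting
A-label does not differ.

```python
from urllib.parse import quote


def to_a_label(u_label):
    """Convert a single U-label to its A-label (ACE) form.

    Uses Python's built-in "idna" codec (IDNA2003), which is an
    assumption that it suffices for the label in this example.
    """
    return u_label.encode("idna").decode("ascii")


def build_referral_url(search_host_labels, embedded_url):
    """Build a URL whose host uses A-labels and whose query value
    carries the embedded URL as %-encoded UTF-8 octets."""
    host = ".".join(to_a_label(label) for label in search_host_labels)
    # Keep ":" and "/" readable; only the non-ASCII octets get %-encoded.
    embedded = quote(embedded_url, safe=":/")
    return "http://" + host + "/mumble=fubar&q=" + embedded


url = build_referral_url(
    ["www", "favorite-search-engine", "пример"],
    "http://www.пример.com/",
)
print(url)
```

The same six Cyrillic characters end up in two unrelated ASCII forms in
one URL, and nothing in the resulting string marks the second one as a
domain name, which is the "running text" problem described above.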