Protocol-08 (and status of Defs-04 and Rationale-06)

Eric Brunner-Williams ebw at abenaki.wabanaki.net
Tue Dec 9 18:09:19 CET 2008



Erik van der Poel wrote:
> As you know, the 63 octet limit is a DNS restriction, and so it
> applies to the A-labels, not the U-labels. Anyway, it would be
>   

Old habit. Consider a label containing only two code points outside the
0x00-0xff range, one with property A and one with property B, and
(the rationale for the 63) not proximal to each other or to the ends of
the label, where property C "leaks" across label separators (a feature I
consider a bug). That is the "not both A and B in a label" case I've
tried to articulate. In other words: "two test code points wicked far
apart from each other and from the boundaries, to see if there is still
coupling."
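
A rough sketch of such a test label, for illustration only. The helper names
(spaced_label, a_label, fits_dns) and the two test code points are
assumptions made for this example, not anything from the drafts. The sketch
uses Python's built-in "punycode" codec and skips all IDNA mapping and
validation; it only shows that the 63-octet check applies to the A-label form.

```python
# Sketch only: helper names and test code points are illustrative
# assumptions, not taken from the IDNA drafts.

DNS_LABEL_MAX = 63  # octet limit on a DNS label, i.e. on the A-label


def spaced_label(x: str, y: str, gap: int, margin: int, filler: str = "a") -> str:
    """Place x and y `gap` fillers apart, each `margin` fillers from an end."""
    return filler * margin + x + filler * gap + y + filler * margin


def a_label(u_label: str) -> str:
    """Return 'xn--' + Punycode, or the label unchanged if it is pure ASCII."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def fits_dns(u_label: str) -> bool:
    """True if the A-label form is within the 63-octet DNS limit."""
    return len(a_label(u_label)) <= DNS_LABEL_MAX


if __name__ == "__main__":
    # Two test code points, far from each other and from both ends.
    label = spaced_label("\u0660", "\u06f0", gap=20, margin=10)
    print(len(label), "code points ->", len(a_label(label)), "octets;",
          "fits" if fits_dns(label) else "too long")
```

Varying gap and margin moves the two test code points around inside the
label while the A-label length check stays the same.
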
> difficult to exhaustively test the maximum numbers of Unicode code
> points that would fit in 63-octet A-labels. I only used one machine
> (even though Google has more than one), and this is what I wrote at
>   

Two? (Joke)

> the time:
>
> "I tried removing each of the parts of the rules, and each removal made
> the tests fail. So I think we have a minimal set, if not *the* minimal
> set.
>
> One of the removals actually required bumping up the total number of
> characters from 5 to 6 to see the effect (the zero or more NSMs at the
> end rule).
>
> I have now run it with 9 characters total (for "remain grouped") and 7
> characters for "no two labels display the same".
>
> In the interests of full disclosure (and to explain how I was able to
> run 9 characters on a single machine), I should mention that I did not
> loop through every Unicode codepoint. Instead I chose representatives
> from L, R, AL, EN, ES, ON, NSM and AN (but not BN), using *different*
> representatives for the labels named A, L and D in the spec. This
> assumes that the ICU implementation of the bidi algorithm only uses
> the bidi properties. I did not turn on the option that does mirroring.
> (Parentheses and such are not allowed in IDNs anyway.)"
>   
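
To put numbers on the quoted approach: with one representative per bidi
class, 8 classes and 9 characters give 8**9 = 134,217,728 sequences. The
sketch below is an assumption-laden illustration, not Erik's program. It
picks the first code point found for each class using Python's unicodedata
module and uses a single set of representatives, whereas the quoted text
used different representatives for the labels A, L and D. The function
names are hypothetical.

```python
import itertools
import unicodedata

# The eight bidi classes named in the quoted message (BN was left out).
CLASSES = ["L", "R", "AL", "EN", "ES", "ON", "NSM", "AN"]


def pick_representatives(classes, limit=0x0700):
    """Pick the first code point below `limit` found for each bidi class."""
    reps = {}
    for cp in range(limit):
        cls = unicodedata.bidirectional(chr(cp))
        if cls in classes and cls not in reps:
            reps[cls] = chr(cp)
    return reps


def candidate_sequences(reps, length):
    """Lazily yield every sequence of `length` representatives."""
    for combo in itertools.product(reps.values(), repeat=length):
        yield "".join(combo)


if __name__ == "__main__":
    reps = pick_representatives(CLASSES)
    for cls in CLASSES:
        print(f"{cls:>3}: U+{ord(reps[cls]):04X}")
    # Search-space size for the two runs mentioned in the quoted text.
    print("9 characters:", len(reps) ** 9)
    print("7 characters:", len(reps) ** 7)
```

Each candidate sequence would then be fed to a bidi implementation (ICU in
the quoted case); that step is outside this sketch.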

For my immediate purposes, the 0x00-0xff space (with the obvious LDH
subset) and the 0x0660..0x06ff space are what interest me in the
context (a pun, sorry) of directionality.
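
As a small illustration of why that range matters for directionality: it
holds both Arabic-Indic digits (bidi class AN) and Extended Arabic-Indic
digits (bidi class EN), which is the combination rule 5 (quoted further down
in this thread) forbids within one label. The sketch below uses Python's
unicodedata module; the function names are hypothetical, and it checks only
rule 5, none of the other bidi rules.

```python
import unicodedata
from collections import Counter


def bidi_classes(label: str) -> set:
    """Return the set of bidi classes that occur in the label."""
    return {unicodedata.bidirectional(ch) for ch in label}


def violates_rule_5(label: str) -> bool:
    """Rule 5 as quoted: if an EN is present, no AN may be present, and vice versa."""
    classes = bidi_classes(label)
    return "EN" in classes and "AN" in classes


def class_census(first: int, last: int) -> Counter:
    """Count bidi classes over the inclusive code point range [first, last]."""
    return Counter(unicodedata.bidirectional(chr(cp))
                   for cp in range(first, last + 1))


if __name__ == "__main__":
    print(class_census(0x0660, 0x06FF))
    # ALEF + Arabic-Indic zero, ALEF + Extended Arabic-Indic zero,
    # and a label mixing a European digit with an Arabic-Indic digit.
    for label in ("\u0627\u0660", "\u0627\u06f0", "a1\u0660"):
        print([f"U+{ord(c):04X}" for c in label], violates_rule_5(label))
```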

I really appreciate your helpful responses; this has been a minefield.

Eric
> Erik
>
> On Tue, Dec 9, 2008 at 8:05 AM, Eric Brunner-Williams
> <ebw at abenaki.wabanaki.net> wrote:
>   
>> I'm concerned that the proposed rules are overly broad. If a label can be 63
>> characters, and the claim is that no two of them can have properties A and
>> B (one with property A and one with property B), then is the proposed
>> rule true for all positional possibilities, or only for some?
>>
>> If all, then the rule is minimal. If only some, then it would be nice, at
>> least, to be able to describe (algorithmically would be really nice) the
>> set(s) for which the rule is correct and the set(s) for which the rule is
>> incorrect.
>>
>> I'm happy to continue this off-list, as my original, similar query about
>> rule 4 didn't get to "what do we know and how do we know it".
>>
>> Eric
>>
>> Erik van der Poel wrote:
>>     
>>> All of the bidi rules, including rule 4, have been tested by Harald
>>> and myself. Admittedly, Harald was the one who came up with the rules,
>>> but I tested all of the rules, by removing one, running the program,
>>> finding the problem, reinserting that rule, removing the next rule,
>>> and so on. In other words, not one of the rules can be removed at this
>>> point.
>>>
>>> It might be possible to change the rules and find a different set of
>>> rules where not one of them can be removed, but at this point in time,
>>> I don't know whether it's worth it. (Unless someone can come up with
>>> character sequences that are really needed or highly desirable that
>>> fall afoul of the current rules.)
>>>
>>> Erik
>>>
>>> On Tue, Dec 9, 2008 at 7:20 AM, Eric Brunner-Williams
>>> <ebw at abenaki.wabanaki.net> wrote:
>>>
>>>       
>>>> Thank you. Earlier I asked someone else about rule 4. The response was not
>>>> so informative. Could I trouble you to answer the rationale question for
>>>> that rule also?
>>>>
>>>> Eric
>>>>
>>>> Erik van der Poel wrote:
>>>>
>>>>         
>>>>> Harald and I did exhaustive tests using two different implementations
>>>>> of the bidi algorithm (he used his own, I used ICU for C/C++). We
>>>>> found that without that rule, you'd get the kind of behavior that we
>>>>> don't want. See Label Uniqueness and Character Grouping in:
>>>>>
>>>>> http://tools.ietf.org/html/draft-ietf-idnabis-bidi-03#section-3
>>>>>
>>>>> Erik
>>>>>
>>>>> On Tue, Dec 9, 2008 at 6:37 AM, Eric Brunner-Williams
>>>>> <ebw at abenaki.wabanaki.net> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Harald Alvestrand wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Eric Brunner-Williams wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>>>         Alright, that is what has been proposed so far. *But* we now
>>>>>>>>> need to take into account Harald's reminder that some combinations
>>>>>>>>> are already disallowed separately by the bidi rules on label
>>>>>>>>> well-formedness, quite independently of any consideration of
>>>>>>>>> CONTEXTO categorization. What the bidi rules require of label
>>>>>>>>> formation is:
>>>>>>>>>
>>>>>>>>> Bidi:     Forbid (d) and (f) [and (g) by corollary]. Allow (e).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> Could you point out the lines in bidi you are referring to here?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> Section 2 rule 5:
>>>>>>>
>>>>>>>  5.  If an EN is present, no AN may be present, and vice versa.
>>>>>>>
>>>>>>>
>>>>>>>                    Harald
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> Thank you. I thought that was the case. Now where is the rationale for
>>>>>> the rule?
>>>>>>
>>>>>> Eric
>>>>>> _______________________________________________
>>>>>> Idna-update mailing list
>>>>>> Idna-update at alvestrand.no
>>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>           
>>>
>>>       
>
>
>   

