Protocol-08 (and status of Defs-04 and Rationale-06)

Erik van der Poel erikv at google.com
Tue Dec 9 17:24:13 CET 2008


As you know, the 63-octet limit is a DNS restriction, so it applies
to the A-labels, not the U-labels. In any case, it would be difficult
to exhaustively test the maximum number of Unicode code points that
would fit in a 63-octet A-label. I only used one machine (even though
Google has more than one), and this is what I wrote at the time:

"I tried removing each of the parts of the rules, and each removal made
the tests fail. So I think we have a minimal set, if not *the* minimal
set.

One of the removals actually required bumping up the total number of
characters from 5 to 6 to see the effect (the 'zero or more NSMs at
the end' rule).

I have now run it with 9 characters total (for "remain grouped") and 7
characters for "no two labels display the same".

In the interests of full disclosure (and to explain how I was able to
run 9 characters on a single machine), I should mention that I did not
loop through every Unicode codepoint. Instead I chose representatives
from L, R, AL, EN, ES, ON, NSM and AN (but not BN), using *different*
representatives for the labels named A, L and D in the spec. This
assumes that the ICU implementation of the bidi algorithm only uses
the bidi properties. I did not turn on the option that does mirroring.
(Parentheses and such are not allowed in IDNs anyway.)"
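
The representative-based enumeration described in the quoted text can be
sketched roughly as follows. This is only an illustration, not the program
that was actually run: the representative characters and the function names
are assumptions chosen for the example, it does not call ICU or any bidi
implementation, and it does not model the separate representatives used for
the labels named A, L and D. It only shows how one character per bidi class
shrinks the search space, with rule 5 from this thread used as a sample
filter.

```python
import itertools
import unicodedata

# One hypothetical representative per bidi class named in the message
# (BN is deliberately left out). These particular characters are an
# assumption for illustration, not the ones used in the original tests.
REPRESENTATIVES = {
    "L": "a",         # LATIN SMALL LETTER A
    "R": "\u05d0",    # HEBREW LETTER ALEF
    "AL": "\u0627",   # ARABIC LETTER ALEF
    "EN": "1",        # DIGIT ONE
    "ES": "-",        # HYPHEN-MINUS
    "ON": "!",        # EXCLAMATION MARK
    "NSM": "\u0300",  # COMBINING GRAVE ACCENT
    "AN": "\u0661",   # ARABIC-INDIC DIGIT ONE
}


def class_sequences(length):
    """Yield every tuple of bidi class names of the given length."""
    return itertools.product(REPRESENTATIVES, repeat=length)


def to_label(classes):
    """Turn a tuple of bidi class names into a concrete test string."""
    return "".join(REPRESENTATIVES[c] for c in classes)


def violates_rule_5(classes):
    """Rule 5: if an EN is present, no AN may be present, and vice versa."""
    return "EN" in classes and "AN" in classes


if __name__ == "__main__":
    # Sanity check: each representative really has the class it stands for.
    for cls, ch in REPRESENTATIVES.items():
        assert unicodedata.bidirectional(ch) == cls

    length = 3
    total = rejected = 0
    for classes in class_sequences(length):
        total += 1
        if violates_rule_5(classes):
            rejected += 1
    print(f"length {length}: {total} sequences, {rejected} rejected by rule 5")
```

With 8 classes the number of sequences grows as 8 to the power of the label
length, which is why 9 characters is still feasible on one machine while
looping over every code point would not be. A real test would pass each
surviving string from to_label() through a bidi implementation and compare
the display orders.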

Erik
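
To make the octet-versus-code-point distinction at the top of this message
concrete, here is a small sketch that measures a label in its A-label form.
It assumes Python's built-in "punycode" codec and simply prepends the ACE
prefix; it performs no IDNA validation or mapping, and the helper names are
made up for the example.

```python
MAX_LABEL_OCTETS = 63  # DNS limit, applied to the A-label form


def a_label(u_label):
    """Return a rough A-label for u_label (no IDNA validation or mapping)."""
    if u_label.isascii():
        return u_label
    # Python's built-in punycode codec; "xn--" is the ACE prefix.
    return "xn--" + u_label.encode("punycode").decode("ascii")


def fits_dns_limit(u_label):
    """True if the A-label form is at most 63 octets long."""
    return len(a_label(u_label)) <= MAX_LABEL_OCTETS


if __name__ == "__main__":
    label = "b\u00fccher"
    print(len(label), a_label(label), len(a_label(label)))
```

The point is only that the number of code points in the U-label and the
number of octets in the A-label are different quantities, so the maximum
U-label length depends on which characters are used.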

On Tue, Dec 9, 2008 at 8:05 AM, Eric Brunner-Williams
<ebw at abenaki.wabanaki.net> wrote:
> I'm concerned that the proposed rules are overly broad. If a label can be 63
> characters, and the claim is that no two of them can have property A and
> property B respectively, is the proposed rule true for all positional
> possibilities, or only for some?
>
> If all, then the rule is minimal. If only some, then it would be nice, at
> least, to be able to describe (ideally algorithmically) the set(s) for
> which the rule is correct and the set(s) for which it is incorrect.
>
> I'm happy to continue this off-list, as my original, similar query about
> rule 4 didn't get to "what do we know and how do we know it".
>
> Eric
>
> Erik van der Poel wrote:
>>
>> All of the bidi rules, including rule 4, have been tested by Harald
>> and myself. Admittedly, Harald was the one who came up with the rules,
>> but I tested all of the rules, by removing one, running the program,
>> finding the problem, reinserting that rule, removing the next rule,
>> and so on. In other words, not one of the rules can be removed at this
>> point.
>>
>> It might be possible to change the rules and find a different set of
>> rules where not one of them can be removed, but at this point in time,
>> I don't know whether it's worth it. (Unless someone can come up with
>> character sequences that are really needed or highly desirable that
>> fall afoul of the current rules.)
>>
>> Erik
>>
>> On Tue, Dec 9, 2008 at 7:20 AM, Eric Brunner-Williams
>> <ebw at abenaki.wabanaki.net> wrote:
>>
>>>
>>> Thank you. Earlier I asked someone else about rule 4. The response was
>>> not so informative; could I trouble you to answer the rationale question
>>> for that rule also?
>>>
>>> Eric
>>>
>>> Erik van der Poel wrote:
>>>
>>>>
>>>> Harald and I did exhaustive tests using two different implementations
>>>> of the bidi algorithm (he used his own, I used ICU for C/C++). We
>>>> found that without that rule, you'd get the kind of behavior that we
>>>> don't want. See Label Uniqueness and Character Grouping in:
>>>>
>>>> http://tools.ietf.org/html/draft-ietf-idnabis-bidi-03#section-3
>>>>
>>>> Erik
>>>>
>>>> On Tue, Dec 9, 2008 at 6:37 AM, Eric Brunner-Williams
>>>> <ebw at abenaki.wabanaki.net> wrote:
>>>>
>>>>
>>>>>
>>>>> Harald Alvestrand wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Eric Brunner-Williams wrote:
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>         Alright, that is what has been proposed so far. *But* we
>>>>>>>> now need to take into account Harald's reminder that some
>>>>>>>> combinations are already disallowed separately by the bidi rules
>>>>>>>> on label well-formedness, quite independently of any consideration
>>>>>>>> of CONTEXTO categorization. What the bidi rules require of label
>>>>>>>> formation is:
>>>>>>>>
>>>>>>>> Bidi:     Forbid (d) and (f) [and (g) by corollary]. Allow (e).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Could you point out the lines in bidi you are referring to here?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Section 2 rule 5:
>>>>>>
>>>>>>  5.  If an EN is present, no AN may be present, and vice versa.
>>>>>>
>>>>>>
>>>>>>                    Harald
>>>>>>
>>>>>>
>>>>>
>>>>> Thank you. I thought that was the case. Now where is the rationale for
>>>>> the rule?
>>>>>
>>>>> Eric
>>>>> _______________________________________________
>>>>> Idna-update mailing list
>>>>> Idna-update at alvestrand.no
>>>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>>
>

