Draft on IDN Tables in XML

Wed Mar 7 08:23:42 CET 2012

On 7/03/12 5:19 PM, "Kim Davies" <kim.davies at icann.org> wrote:

>Hi James,
>
>On Mar 6, 2012, at 5:47 PM, James Mitchell wrote:
>
>> I think this work should focus on identifying only:
>> 
>> 1) The set of code points that can be used for registration
>> 2) The set of code points (or sequences of code points) that are
>>considered equivalent by the registry
>> 
>> The table should not attempt to place rules on the use of code points
>>within a label as these rules are often non-trivial. One can easily tell
>>whether a name is registered by performing a DNS lookup or a WHOIS query
>>for the name. Alternatively a registrar will be able to notify a
>>potential registrant should a name be considered "invalid".
>
>I'm not sure I understand what you are asking this to rule out. The
>design goals state the format is not designed to restrict registry
>policy, rather act as a method of expressing what it is so others can
>re-use it as they see fit. I don't see the use case where it could be
>conceived this would be used in place of a DNS or WHOIS lookup. An IDN
>table confers nothing about what labels are already allocated in a
>registry.

I am concerned that consumers of these tables may attempt to use this
information for determining those variants that will be automatically
generated, those that can be activated, etc, where there exist IDN tables
whose rules cannot be expressed using these constructs.

>
>The most value this format can bring is if it can express as many
>rulesets as possible in relation to IDN policies. If there is a
>substantial population of IDN tables that can not be expressed with it, I
>am not sure it is any more beneficial than the current situation.

I cannot come up with any valid argument for trying to express these
rulesets. Perhaps you will understand my point of view once you read
further below.

>
>> Further to the above the table should not attempt to define those
>>variants that are activated/allowed/blocked. An active variant can be
>>determined from a query to the DNS or WHOIS and these protocols will
>>have to be used considering a variant may have been activated
>>post-registration. Additionally the rules for determining whether a
>>variant can be activated are non-trivial. Consider the example below.
>> 
>> <char cp="0627">
>>     <var cp="0625"/>
>> </char>
>> 
>> And a registered name of "0627 0627". It is unclear from the definition
>>above whether the label "0627 0625" is valid because it does not
>>describe whether the substitution should have been applied across the
>>whole label or whether it can be applied to one character. This is only
>>a trivial example however I can provide many more complex rules.
>
>I think we're in agreement on this. With the above table and the "0627
>0627" string, presumably it would generate a set of 3 variants: ("0627
>0625", "0625 0627", "0625 0625"). Now what the registry does with those
>variants is the registry's business. As we've seen with the JET
>guidelines, different registries have taken the same base table but
>resulted in different approaches to which labels are delegated, reserved
>or otherwise handled.

My issue here is that you said 'presumably'. Why create a table that is
ambiguous?
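
To make the ambiguity concrete, here is a rough sketch in Python. The
in-memory form of the table and both function names are made up for
illustration and are not part of the draft. It shows that the same <var>
entry gives different variant sets depending on whether substitution is
applied per character or across the whole label.

```python
from itertools import product

# Hypothetical in-memory form of the table: code point -> its variants.
VARIANTS = {"0627": ["0625"]}


def per_position_variants(label, table):
    """One reading: substitute independently at each position."""
    choices = [[cp] + table.get(cp, []) for cp in label]
    combos = (list(combo) for combo in product(*choices))
    # Drop the combination that is just the original label.
    return [combo for combo in combos if combo != label]


def whole_label_variants(label, table):
    """Other reading: substitute every occurrence of a code point at once."""
    results = []
    for src, variants in table.items():
        if src not in label:
            continue
        for var in variants:
            results.append([var if cp == src else cp for cp in label])
    return results


label = ["0627", "0627"]
print(per_position_variants(label, VARIANTS))
print(whole_label_variants(label, VARIANTS))
```

Under the first reading "0627 0625" is a variant of "0627 0627"; under
the second it is not. The table alone does not say which one applies.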

>
>That said, a suggestion has already been made to me that a registry could
>optionally specify an attribute as to whether a variant would result in
>blocking, delegation or something else. This doesn't mean a consumer of
>the table needs to follow that hint if they wish to repurpose the table.
>
>> To avoid the somewhat common mistake of incorrectly defining
>>equivalence I suggest that equivalent sequences of code points are
>>defined in one place. For example
>> 
>> <char cp="0627">
>>     <var cp="0625"/>
>> </char>
>> <char cp="0625">
>> 	<!-- whoops, forgot to identify 0627 as an equivalent character -->
>> </char>
>> 
>> should be expressed as
>> 
>> <equivalent>
>> 	<char cp="0625"/>
>> 	<char cp="0627"/>
>> </equivalent>
>
>What about one-way variants? It seems kind of clumsy to have them
>specified in potentially two different duplicative ways.  I am not sure
>how common it has been that registries mess up their tables, but you
>could probably easily lint your table to pick up where equivalence
>doesn't exist, and fix it if appropriate.

There are two distinct concepts here.

In one concept there are the characters that the registry considers
equivalent, such that create(character1) will prevent
create(character2). Equivalent means that one name cannot have a
different registrant, or be transferred to another registrar
independently of the original domain name. They are an atomic "bundle" of names. In this
concept there are no "one-way" variants. Whether the names in the bundle
are blocked/activated/whatever is irrelevant.

The second concept is the set of characters that _can_ be substituted for
a character in the original domain name to result in another name. In this
concept there are "one-way" variants, however these either result in a
name that is part of a "bundle" above (because the substituted character
is equivalent to the original character) or a first-class domain name that
can be transferred independently from the original name.

The .tel Russian table provides an example where the Cyrillic 'a' and Latin
'a' are considered equivalent (concept 1); however, Cyrillic 'a' cannot be
substituted with Latin 'a' to generate a variant name (concept 2).

Mixing these concepts is potentially dangerous as one registry may treat
variants as bundles and another as first-class domain names. I believe
there is great value in the IDN table describing what characters are
equivalent. This will allow consumers of the table, when given two names,
to determine whether or not they are actually part of the same bundle or
potentially separate names with potentially separate registrants. Anything
else that represents rulesets for valid names or activatable variants
should be avoided.
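
As a rough sketch of the check I have in mind (Python; the data layout
and function names are hypothetical, and it assumes the equivalence
classes are disjoint and made of single code points rather than
sequences), a consumer could map every code point to one representative
of its class and compare the results:

```python
# Hypothetical equivalence classes, each declared exactly once.
EQUIVALENT = [{"0627", "0625"}]


def build_canonical(classes):
    """Map each code point to a fixed representative of its class."""
    canon = {}
    for cls in classes:
        rep = min(cls)  # any deterministic choice of representative works
        for cp in cls:
            canon[cp] = rep
    return canon


def bundle_key(label, canon):
    """Replace each code point with its representative."""
    return tuple(canon.get(cp, cp) for cp in label)


def same_bundle(a, b, canon):
    """True if two labels differ only by equivalent code points."""
    return bundle_key(a, canon) == bundle_key(b, canon)


canon = build_canonical(EQUIVALENT)
print(same_bundle(["0627", "0627"], ["0627", "0625"], canon))
print(same_bundle(["0627", "0627"], ["0627", "0628"], canon))
```

This says nothing about which names in the bundle are blocked or
activated; that still has to come from the DNS or WHOIS.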

>
>Kim

James