Draft on IDN Tables in XML

Kim Davies kim.davies at icann.org
Wed Mar 7 07:19:42 CET 2012


Hi James,

On Mar 6, 2012, at 5:47 PM, James Mitchell wrote:

> I think this work should focus on identifying only:
> 
> 1) The set of code points that can be used for registration
> 2) The set of code points (or sequences of code points) that are considered equivalent by the registry
> 
> The table should not attempt to place rules on the use of code points within a label as these rules are often non-trivial. One can easily tell whether a name is registered by performing a DNS lookup or a WHOIS query for the name. Alternatively a registrar will be able to notify a potential registrant should a name be considered "invalid".

I'm not sure I understand what you are asking this to rule out. The design goals state that the format is not designed to restrict registry policy, but rather to act as a way of expressing that policy so others can re-use it as they see fit. I can't see a use case in which this would be used in place of a DNS or WHOIS lookup. An IDN table conveys nothing about which labels are already allocated in a registry.

This format brings the most value if it can express as many rulesets as possible in relation to IDN policies. If there is a substantial population of IDN tables that cannot be expressed with it, I am not sure it is any more beneficial than the current situation.

> Further to the above the table should not attempt to define those variants that are activated/allowed/blocked. An active variant can be determined from a query to the DNS or WHOIS and these protocols will have to be used considering a variant may have been activated post-registration. Additionally the rules for determining whether a variant can be activated are non-trivial. Consider the example below.
> 
> <char cp="0627">
>     <var cp="0625"/>
> </char>
> 
> And a registered name of "0627 0627". It is unclear from the definition above whether the label "0627 0625" is valid because it does not describe whether the substitution should have been applied across the whole label or whether it can be applied to one character. This is only a trivial example however I can provide many more complex rules.

I think we're in agreement on this. With the above table and the "0627 0627" string, presumably it would generate a set of 3 variants: ("0627 0625", "0625 0627", "0625 0625"). Now what the registry does with those variants is the registry's business. As we've seen with the JET guidelines, different registries have taken the same base table but arrived at different approaches to which labels are delegated, reserved or otherwise handled.
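
To make the enumeration concrete, here is a rough sketch of how a consumer of the table might generate that variant set. It assumes the table has already been loaded into a plain dictionary from code point to variant code points; the names TABLE and variants are purely illustrative and not part of the draft.

```python
from itertools import product

# Hypothetical in-memory form of the table above:
# code point -> set of variant code points (one-way, as written in the XML).
TABLE = {"0627": {"0625"}}


def variants(label, table):
    """Return every variant label obtained by substituting at any
    combination of positions, excluding the original label itself."""
    code_points = label.split()
    # For each position, the choices are the original code point
    # followed by its variants (sorted for a stable order).
    choices = [[cp] + sorted(table.get(cp, ())) for cp in code_points]
    return [
        " ".join(combo)
        for combo in product(*choices)
        if list(combo) != code_points
    ]


print(variants("0627 0627", TABLE))
# ['0627 0625', '0625 0627', '0625 0625']
```

This treats substitution as independent per position, which is one possible reading. What is then done with each generated label (delegate, block, reserve) is left entirely to the registry.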

That said, a suggestion has already been made to me that a registry could optionally specify an attribute indicating whether a variant would result in blocking, delegation or something else. This doesn't mean a consumer of the table needs to follow that hint if they wish to repurpose the table.

> To avoid the somewhat common mistake of incorrectly defining equivalence I suggest that equivalent sequences of code points are defined in one place. For example
> 
> <char cp="0627">
>     <var cp="0625"/>
> </char>
> <char cp="0625">
> 	<!-- whoops, forgot to identify 0627 as an equivalent character -->
> </char>
> 
> should be expressed as
> 
> <equivalent>
> 	<char cp="0625"/>
> 	<char cp="0627"/>
> </equivalent>

What about one-way variants? It seems kind of clumsy to have them specified in potentially two different, duplicative ways. I am not sure how common it has been for registries to mess up their tables, but you could probably lint your table quite easily to pick up where equivalence doesn't exist, and fix it if appropriate.
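
As a rough illustration of the kind of lint I mean, the sketch below reports every variant mapping that has no reverse mapping. It assumes the char/var elements from your example are wrapped in a single root element; the name "table" for that root, and the function name, are assumptions made for the sake of the example rather than anything defined in the draft.

```python
import xml.etree.ElementTree as ET

# The asymmetric example from above, wrapped in a hypothetical <table> root.
SAMPLE = """
<table>
  <char cp="0627">
    <var cp="0625"/>
  </char>
  <char cp="0625">
  </char>
</table>
"""


def missing_reverse_mappings(xml_text):
    """Return sorted (source, variant) pairs where the variant
    does not list the source as one of its own variants."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for char in root.iter("char"):
        targets = mapping.setdefault(char.get("cp"), set())
        targets.update(var.get("cp") for var in char.findall("var"))
    return sorted(
        (src, dst)
        for src, dsts in mapping.items()
        for dst in dsts
        if src not in mapping.get(dst, set())
    )


print(missing_reverse_mappings(SAMPLE))
# [('0627', '0625')]
```

The check only reports; it does not repair anything, since a missing reverse mapping may be an intentional one-way variant.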

kim

