Draft on IDN Tables in XML

Kim Davies kim.davies at icann.org
Thu Mar 1 23:04:42 CET 2012

Hi Mark,

Thanks for the comments:

On Mar 1, 2012, at 11:43 AM, Mark Davis ☕ wrote:

A sequence of multiple code points can be specified as a variant of a
   single code point.  For example, the sequence of "o" then "e" can be
   specified as a variant for an "o with umlaut" (U+00F6) as follows:

   <char cp="00F6">
     <var cp="006F 0065"/>

It should be possible for a sequence to map to a character or to another sequence, rather than restricting the source to single code points. So the cp attribute in either case should allow space-delimited hex codes. E.g. (where x and y are code points):

   <char cp="x y">
     <var cp="y x"/>

I agree about the need for this functionality, however given the dual role of the <char> element as specifying both membership in the table for eligibility purposes, and as the target for a specific mapping, I am not sure of the best way to do it.

If we take this approach, the question becomes how one expresses the eligibility of "x" and "y" in a table. Can it be assumed that both "x" and "y" are eligible in any context, or should it be assumed that "x" is only eligible when followed by "y"? In the latter case you would need to explicitly write the following to indicate that both are also eligible on their own:

<char cp="x"/>
<char cp="y"/>
<char cp="x y">
  <var cp="y x"/>
</char>
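
To make the difference between the two readings concrete, here is a rough sketch in Python. It assumes the strict reading (only what is listed in a <char> element is eligible) and checks whether a label can be split entirely into listed entries. The <table> wrapper element, the function names, and the use of U+0078 and U+0079 as stand-ins for "x" and "y" are all assumptions for illustration, not part of the draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: U+0078 ("x") and U+0079 ("y") stand in for x and y.
# The <table> wrapper is an assumption, used only to get one root element.
EXPLICIT_TABLE = """
<table>
  <char cp="0078"/>
  <char cp="0079"/>
  <char cp="0078 0079">
    <var cp="0079 0078"/>
  </char>
</table>
"""

SEQUENCE_ONLY_TABLE = """
<table>
  <char cp="0078 0079">
    <var cp="0079 0078"/>
  </char>
</table>
"""

def parse_cp(attr):
    """Turn a space-delimited list of hex code points into a string."""
    return "".join(chr(int(h, 16)) for h in attr.split())

def load_repertoire(xml_text):
    """Collect every <char> entry (single code point or sequence)."""
    root = ET.fromstring(xml_text)
    return {parse_cp(c.get("cp")) for c in root.iter("char")}

def is_eligible(label, repertoire):
    """True if the label can be split entirely into repertoire entries."""
    # reachable[i] is True when label[:i] can be segmented into entries.
    reachable = [True] + [False] * len(label)
    for i in range(len(label)):
        if not reachable[i]:
            continue
        for entry in repertoire:
            if label.startswith(entry, i):
                reachable[i + len(entry)] = True
    return reachable[len(label)]

explicit = load_repertoire(EXPLICIT_TABLE)
sequence_only = load_repertoire(SEQUENCE_ONLY_TABLE)
```

Under this reading, "yx" is eligible against the first table but not the second, where "x" is only usable when immediately followed by "y".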

Another alternative would be to completely decouple <char> from <var>, and give <var> a form something like <var from-cp="x y" to-cp="y x"/>.
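
As a rough sketch of what the decoupled form might look like to a consumer, the Python below reads eligibility from <char> and mappings from <var> independently. The from-cp and to-cp attribute names are taken from the suggestion above; the <table> wrapper, the function names, and the stand-in code points are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical decoupled table: <char> only states eligibility,
# <var> only states a mapping. U+0078/U+0079 stand in for x and y.
TABLE = """
<table>
  <char cp="0078"/>
  <char cp="0079"/>
  <var from-cp="0078 0079" to-cp="0079 0078"/>
</table>
"""

def parse_cp(attr):
    """Turn a space-delimited list of hex code points into a string."""
    return "".join(chr(int(h, 16)) for h in attr.split())

def load_table(xml_text):
    """Return (repertoire, variants) read from independent elements."""
    root = ET.fromstring(xml_text)
    repertoire = {parse_cp(c.get("cp")) for c in root.iter("char")}
    variants = {}
    for v in root.iter("var"):
        source = parse_cp(v.get("from-cp"))
        target = parse_cp(v.get("to-cp"))
        variants.setdefault(source, set()).add(target)
    return repertoire, variants

repertoire, variants = load_table(TABLE)
```

With this shape, a sequence can have a variant without the sequence itself needing its own eligibility entry.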

<char cp="200c">
     <var cp="0000"/>

I think this would be less error-prone if written simply as

<char cp="200c">
     <var cp=""/>

Good idea.
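
For illustration, a sketch of how a parser might treat the empty attribute, assuming cp is split on whitespace: an empty string yields no code points at all, whereas "0000" yields a real U+0000 character that would then need special-casing. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the suggestion above, closed so that it parses.
FRAGMENT = '<char cp="200c"><var cp=""/></char>'

def parse_cp(attr):
    """Turn a space-delimited list of hex code points into a string."""
    # "".split() returns [], so an empty attribute gives the empty string.
    return "".join(chr(int(h, 16)) for h in attr.split())

char = ET.fromstring(FRAGMENT)
source = parse_cp(char.get("cp"))
variants = [parse_cp(v.get("cp")) for v in char.iter("var")]

def apply_variant(label, source, variant):
    """Replace every occurrence of source in label with variant."""
    return label.replace(source, variant)

# Mapping U+200C to the empty variant simply removes it from the label.
stripped = apply_variant("a\u200cb", source, variants[0])
```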

     <var cp="0673" when="arabic-isolated"/>

The spec needs to have an unambiguous way to determine when a character satisfies the 'when' clause.

I struggled with how best to handle Arabic contextual forms, and I am not really sure the approach used is the right way to do it. What is in there right now feels awfully ad hoc. I am wondering if there is some other specification that can be leveraged here without making this specification too unwieldy to implement. Even though the only IDN tables I am aware of that utilise some form of context are those using Arabic contextual forms, I am sure there is potential for other kinds of context in the future.
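
One possible reading of "arabic-isolated", sketched in Python purely to show what an unambiguous test could look like: a character is isolated when it joins to neither neighbour, judged by Unicode joining types. The small JOINING_TYPE table is a hand-copied subset and an assumption of this sketch (a real implementation would load the full Unicode joining data); transparent characters are ignored, and the function names are hypothetical.

```python
# Hand-copied subset of Unicode joining types (assumption: incomplete).
# R = right-joining, D = dual-joining; anything missing is treated as U.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0673": "R",  # ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
}

def joining_type(ch):
    return JOINING_TYPE.get(ch, "U")

def joins_to_following(ch):
    """True if ch can connect to the character after it."""
    return joining_type(ch) in ("D", "L", "C")

def joins_to_preceding(ch):
    """True if ch can connect to the character before it."""
    return joining_type(ch) in ("D", "R", "C")

def is_isolated(label, i):
    """True if label[i] connects to neither of its neighbours."""
    ch = label[i]
    prev = label[i - 1] if i > 0 else ""
    nxt = label[i + 1] if i + 1 < len(label) else ""
    joined_before = (joins_to_preceding(ch) and prev != ""
                     and joins_to_following(prev))
    joined_after = (joins_to_following(ch) and nxt != ""
                    and joins_to_preceding(nxt))
    return not (joined_before or joined_after)
```

Under these assumptions, U+0673 is isolated on its own or after ALEF, but not after BEH, which connects to it.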


More information about the Idna-update mailing list