Draft on IDN Tables in XML
Mark Davis ☕
mark at macchiato.com
Thu Mar 1 20:43:42 CET 2012
A quick note.
A sequence of multiple code points can be specified as a variant of a
single code point. For example, the sequence of "o" then "e" can be
specified as a variant for an "o with umlaut" (U+00F6) as follows:
<var cp="006F 0065"/>
It should be possible for a sequence to map to a character or sequence,
rather than restricting to single code points. So the cp in either case
should allow space-delimited hex codes. E.g. (where x and y are code points):
<char cp="x y">
  <var cp="y x"/>
</char>
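To make the suggestion concrete, here is a rough sketch of how a consumer might read a space-delimited cp attribute on both <char> and <var>. It is only an illustration, not anything from the draft: the <table> root element, the helper names parse_cp and load_variants, and the choice to key the result by a tuple of code points are all my assumptions.

```python
import xml.etree.ElementTree as ET

MAX_CODE_POINT = 0x10FFFF


def parse_cp(value):
    """Turn a space-delimited list of hex codes into a tuple of ints."""
    parts = value.split()
    if not parts:
        raise ValueError("cp attribute is empty")
    code_points = tuple(int(part, 16) for part in parts)
    for cp in code_points:
        if not 0 <= cp <= MAX_CODE_POINT:
            raise ValueError("not a Unicode code point: %X" % cp)
    return code_points


def load_variants(xml_text):
    """Map each <char> sequence to the list of its <var> sequences."""
    root = ET.fromstring(xml_text)
    table = {}
    for char in root.iter("char"):
        source = parse_cp(char.get("cp"))
        table[source] = [parse_cp(var.get("cp")) for var in char.findall("var")]
    return table


# Hypothetical table: <table> is an assumed root element, used only so
# that the sample is a well-formed XML document.
SAMPLE = """
<table>
  <char cp="00F6">
    <var cp="006F 0065"/>
  </char>
  <char cp="006F 0065">
    <var cp="00F6"/>
  </char>
</table>
"""

variants = load_variants(SAMPLE)
```

Because single code points and sequences are both represented as tuples, a sequence-to-character mapping needs no special casing; a one-element tuple is just the degenerate sequence.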
I think this would be less error-prone if written simply as:
<var cp="0673" when="arabic-isolated"/>
The spec needs to have an unambiguous way to determine when a character
satisfies the 'when' clause.
*— Il meglio è l’inimico del bene —*
On Thu, Mar 1, 2012 at 11:15, Kim Davies <kim.davies at icann.org> wrote:
> I have posted a first draft regarding a format that could be used for
> representing IDN Tables in XML to the I-D Repository:
> After discussion with a number of folks that felt this would be good work
> to undertake, I've put together a first cut which is not comprehensive, but
> I think goes some way toward a potential format.
> Unless there is interest in this being a more formal activity, my
> assumption is to aim to publish the final result independently as an
> Informational RFC. However, the mechanism of publication is secondary to
> coming up with something useful that would benefit TLD registries and other
> implementors. A list of design goals, from the document, is as follows:
> • MUST be in a format that can be implemented in a reasonably
> straightforward manner in software;
> • The format SHOULD be able to be checked for formatting errors,
> such that common mistakes can be caught;
> • An IDN Table MUST be able to express the set of valid code points
> that are allowed for registration under a specific zone administrator's
> • MUST be able to express computed alternatives to a given domain
> name based on a one-to-one, or one-to-many relationship. These computed
> alternatives are commonly known as "IDN variants";
> • IDN Variants SHOULD be able to be tagged with specific
> categories, such that the categories can be used to support registry policy
> (such as whether to list the computed variant in the zone, or to merely
> block it from registration);
> • IDN Variants MUST be able to be stipulated based on contextual
> information. For example, specific variants may only be applicable when
> they follow another specific code point, or when the code point is
> displayed in a specific presentation form;
> • The data contained within the table MUST be unambiguous, such
> that independent implementations that utilise the contents will arrive at
> the same results;
> • IDN Tables SHOULD be suitable for comparison and re-use, such
> that one could easily compare the contents of two or more to see the
> differences, to merge them, and so on.
> • As many existing IDN Tables as practicable SHOULD be able to be
> migrated to the new format with all applicable logic retained.
> It is explicitly NOT the goal of this format to:
> • Stipulate what code points should be listed in an IDN Table by a
> zone administrator. What registration policies are used for a particular
> zone is outside the scope of this memo.
> • Stipulate what a consumer of an IDN Table must do when they
> determine a particular domain is valid or invalid; or arrive at a set of
> computed IDN variants. IDN Tables are only used to describe rules for
> computing code points, but do not prescribe how registries and other
> parties utilise them.
> I'd appreciate any feedback.