IDNABIS Working Group completes its work

Mark Davis ☕ mark at macchiato.com
Fri Mar 19 02:32:35 CET 2010


I discussed your comments with the Unicode editorial board, and the
following is based on that discussion.

Mark

We'd like to thank you for your review of the document; it is quite helpful.
The comments are very constructive, and some point out ways in which we can
improve the document, both from an IETF perspective and in general. Below
are some initial responses and questions; we'd appreciate hearing back from
you where you still have questions, or think that the approach is still not
sufficiently clear.

Note that we have just (yesterday morning) updated the proposed draft with
the results of the March 11 editorial meeting, which should also improve
some of the text substantially. That draft does not yet, however, have any
changes incorporated as a result of your review. Note that there is a
renumbering of sections, since the old section 6 becomes part of 4.

http://www.unicode.org/reports/tr46/


*Items 1, 1.1, 1.2. Specification and tables*

The Unicode Consortium uses explicit listings of derived tables as a means
of ensuring backwards compatibility between versions of the Unicode
Standard. The current table for UTS #46 is based on Unicode 5.2, and will be
updated for Unicode 6.0, Unicode 6.1, and so forth. Its derivation is
consistent with the definition of the algorithm for the table for IDNA2008,
with the obvious difference that it also deliberately incorporates the
required mappings (casemapping, width folding, etc.) and the modifications
necessary for transitional compatibility with IDNA2003, which are the point
of UTS #46 processing.

We have found that an explicit listing of a derived table is much more
reliable for implementations than depending on individual engineers
correctly implementing the complicated derivation of the table, on top of
the already complicated processing that makes use of the table. This, by the
way, is the same kind of methodology which applies to *all* of the derived
data tables for the Unicode Character Database, so this follows data
maintenance practices which are widely understood among Unicode
implementers.

There is no intent here of tying UTS #46 processing to a specific version of
the Unicode Standard, or to "turn things back to the situation with
IDNA2003", which was tied specifically to Unicode 3.2.
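
As a rough illustration of why an explicit listing is easy to consume, the
following minimal sketch loads a few table rows and looks up code points
directly, with no derivation logic in the implementation. The row layout
(semicolon-delimited fields, ".." ranges, "#" comments) and the names
SAMPLE_TABLE and parse_table are assumptions made for this example; the
four sample rows are illustrative and are not a copy of the published data
file.

```python
# Toy rows in an assumed semicolon-delimited layout (illustration only).
SAMPLE_TABLE = """\
0041       ; mapped    ; 0061       # LATIN CAPITAL LETTER A
00AD       ; ignored                # SOFT HYPHEN
00DF       ; deviation ; 0073 0073  # LATIN SMALL LETTER SHARP S
0061..007A ; valid                  # a..z
"""


def parse_table(text):
    """Return {code point: (status, mapping string or None)}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        first, _, last = fields[0].partition("..")
        low = int(first, 16)
        high = int(last, 16) if last else low
        status = fields[1]
        mapping = None
        if len(fields) > 2 and fields[2]:
            # The mapping field is a space-separated list of hex code points.
            mapping = "".join(chr(int(cp, 16)) for cp in fields[2].split())
        for cp in range(low, high + 1):
            table[cp] = (status, mapping)
    return table


TABLE = parse_table(SAMPLE_TABLE)
print(TABLE[ord("A")])
```

An implementation built this way only needs a new data file when the table
is updated for a later Unicode version; the lookup code does not change.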

*Items 2, 2.1, 2.2. Normativity*

The normative part of the document is driven from the conformance clauses,
which specify how the processing functions. The only implications for
applications are that if they choose to use the processing, in order to be
conformant they need to replicate the results. That is typically how the
Unicode specifications of algorithms are structured.

Could you let us know where you find the language confusing? We can then
make sure that it is clearer (but also see below).

*Item 2.3. Taxonomy*

We are not sure about your exact concern here, but if the issue is whether
UTS #46 is using terms such as "A-label" and "U-label" according to the
definitions in IDNA2008, the intent is that these terms be used exactly as
defined in IDNA2008. This can be clarified in the text by explicit reference
to the IDNA2008 definitions.

*Items 3, 3.1. Content.*

UTS #46 does two different things. It supplies a mapping, which can be used
as preprocessing for IDNA2008. However, it also allows (transitionally)
characters that are not valid in IDNA2008. Thus, as a whole, it cannot be
considered a preprocessing for IDNA2008. This is summarized in the middle of
Section 2:

Summarized briefly, UTS #46 builds upon IDNA2008 in three areas:

Mapping. The UTS #46 mapping is used to maintain maximal compatibility and
meet user expectations. It is conformant to IDNA2008, which allows for
mapping input.

Symbols and Punctuation. UTS #46 supports processing of symbols and
punctuation during the transitional period. The transition will be smooth:
as registries move to IDNA2008 the DNS lookups of IDNs with symbols will
simply be refused. At that point, in practice, there is full compatibility
with IDNA2008.

Deviations. UTS #46 provides two different ways of handling these to support
a transition. Transitional Processing should only be used immediately before
a DNS lookup in the circumstances where the registry doesn't guarantee a
strategy of bundling or blocking. In all other cases, the Nontransitional
Processing, which is fully compatible with IDNA2008, should be used.
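
To make the Deviations item above concrete, here is a minimal sketch of how
the two processing modes differ for the four deviation characters (sharp s,
final sigma, ZWNJ, and ZWJ). The function name map_deviations is
hypothetical, and the sketch deliberately omits every other part of the
mapping and all validity checks; the table in the specification should be
treated as the authority.

```python
# The deviation characters and what Transitional Processing does with them.
DEVIATIONS = {
    "\u00DF": "ss",      # LATIN SMALL LETTER SHARP S -> "ss"
    "\u03C2": "\u03C3",  # GREEK SMALL LETTER FINAL SIGMA -> SIGMA
    "\u200C": "",        # ZERO WIDTH NON-JOINER, removed
    "\u200D": "",        # ZERO WIDTH JOINER, removed
}


def map_deviations(label, transitional):
    """Apply only the deviation handling to an already lowercased label."""
    if not transitional:
        # Nontransitional Processing leaves the deviation characters in place.
        return label
    return "".join(DEVIATIONS.get(ch, ch) for ch in label)


print(map_deviations("fa\u00DF", transitional=True))   # fass
print(map_deviations("fa\u00DF", transitional=False))  # faß
```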


*Item 3.2 URI/URL/IRI, Section 1*

There is nothing normative in Section 1; it is intended only as informative
background about the problems facing implementers of IDNA2008.

We will clarify the relationship between domain names and the IRIs used in
examples.

*Item 3.3. table vs derivation*

We will recast the text of what is now Section 6 to make it clear that the
mapping table derivation has *nothing* to do with the mapping of labels in
Section 4. It is a metaoperation outside the context of Section 4, which
defines the derivation of the table which is then used for mapping in
Section 4.

Note the mapping table is constructed so that it can apply to the entire
domain name in one step. This matches what is done in practice in IE and
other environments. The table does have some modifications to ensure that
that works properly, such as exclusion of characters like U+2488 ( ⒈ ) DIGIT
ONE FULL STOP (see the last line in §7).
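
The U+2488 case can be seen with a short sketch. It uses Python's standard
unicodedata module and plain NFKC normalization as a stand-in for the
compatibility mapping; it is not the table derivation itself. The helper
name labels_after_nfkc is hypothetical.

```python
import unicodedata


def labels_after_nfkc(domain):
    """Normalize the whole domain name at once, then split it into labels."""
    return unicodedata.normalize("NFKC", domain).split(".")


# U+2488 decomposes to DIGIT ONE followed by FULL STOP, so mapping it in a
# whole-domain pass would introduce a new label separator.
print(unicodedata.normalize("NFKC", "\u2488"))  # 1.
print(labels_after_nfkc("a\u2488b.com"))        # ['a1', 'b', 'com']
```

Because the mapping is applied to the entire domain name in one step, a
character whose mapping contains a dot would silently change the number of
labels, which is why such characters are excluded from the table.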

*Item 3.4, Security Considerations*

This was already changed in the last editorial phase, for the very reasons
you cite. The title is now "Transition Considerations", and contains
modified text.

*Item 3.5. "a<ZWJ>b"*

True; we will make that example clearer by providing context.

*Item 3.6. Unicode characters and encodings, escaping*

The entire document is structured as applying to only Unicode text; if a
string is not already in Unicode, then it needs to be converted. We will add a
sentence about that. However, all of the processing is in terms of code
points, so whether the text is in UTF-8, UTF-16, or UTF-32 is immaterial for
the specification. It can be applied equally well to any of them.
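
A small sketch of the point about encoding forms: the same string decoded
from UTF-8, UTF-16, or UTF-32 yields the identical sequence of code points,
which is all the processing ever sees. The variable names are illustrative
only.

```python
text = "F\u00e4ltstr\u00f6m"

# Encode the same text in three Unicode encoding forms.
encoded = {enc: text.encode(enc) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# Decode each byte sequence and list the resulting code points.
code_points = {
    enc: [ord(ch) for ch in data.decode(enc)]
    for enc, data in encoded.items()
}

for enc, data in encoded.items():
    print(enc, len(data), "bytes ->", len(code_points[enc]), "code points")
```

The byte lengths differ (11, 18, and 36 bytes here), but each decoding
produces the same nine code points.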

The paragraph about escaping is not normative; it is simply informative. We
will make that clear or move that text.

Note that the processing is structured so that it handles a "mixed" domain
name in the URL, like "xn--fltstrm-5wa1o.Fältström.com", and validates the
Punycode label.
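
As a hedged sketch of what handling a "mixed" domain name involves, the
following uses Python's built-in punycode codec. The function name
to_ascii_labels is hypothetical, str.lower() is only a stand-in for the
full mapping, and the round-trip comparison is a simplified form of
validation rather than the complete set of checks in the specification.

```python
ACE_PREFIX = "xn--"


def to_ascii_labels(domain):
    """Convert each label to its ASCII form, checking existing xn-- labels."""
    result = []
    for label in domain.split("."):
        label = label.lower()  # stand-in for the full mapping step
        if label.startswith(ACE_PREFIX):
            ace = label[len(ACE_PREFIX):]
            # Decode the label, then re-encode it to confirm it round-trips.
            decoded = ace.encode("ascii").decode("punycode")
            if decoded.encode("punycode").decode("ascii") != ace:
                raise ValueError("label does not round-trip: " + label)
            result.append(label)
        elif label.isascii():
            result.append(label)
        else:
            encoded = label.encode("punycode").decode("ascii")
            result.append(ACE_PREFIX + encoded)
    return ".".join(result)


print(to_ascii_labels("xn--fltstrm-5wa1o.F\u00e4ltstr\u00f6m.com"))
```

Both the pre-encoded label and the Unicode label end up as the same ASCII
label, so the output is "xn--fltstrm-5wa1o.xn--fltstrm-5wa1o.com".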

*Item 3.7, 3.8 Table construction*

The characters listed in step 3 were not hand-picked: they were derived,
that is "The exclusion set consists of characters that have a different
mapping in IDNA2003 than the base mapping value specified in Step 1, or that
are disallowed in IDNA2003.". What we will do is make the derivation steps
for this section more explicit.

We'll add text explaining how the mapping alone can be used as a
preprocessing of IDNA2008, where the transitional characters are not needed.

*Item 4, 4.1, 4.2. Structure, comparison*
We'll fix the issue with the escaping, which appears to be where a majority
of the confusion may arise, and show how the mapping can be used alone. The
exact differences between the IDNA2003, IDNA2008, and UTS #46 tables are
pointed out in the table at the end, with pointers to an online comparison.
They can also be compared explicitly with a Perl script. So we think this is
sufficient.

*Item 4.3. Handling IRIs in the browser environment*
Such material is outside of the scope of the document for the current
release. We agree that that kind of material would be very useful to collect
and present, either in a future version of the document or in another
document.

====


2010/3/16 Patrik Fältström <patrik at frobbit.se>

> On 16 mar 2010, at 10.40, Vint Cerf wrote:
>
> > Reference to TR46 seemed to be desired by the AD
> > but as you see from Patrik Fältström's detailed review of that draft
> > document, there is much to discuss and debate.
>
> Vint, I do not think everyone has seen the review. I did a review during
> the past weekend on request from Unicode Consortium and the IAB.
>
> But, of course this is nothing secret. You can find the review on my blog:
>
> http://stupid.domain.name/node/955
>
> Note though that this is MY personal review, written without any hat on.
>
> And, there are disclaimers that I might have misunderstood things in the
> document -- which imply the only thing needed is that the document should be
> clarified so I do not misunderstand the next time I read it :-)
>
>   Patrik
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>