Implementation questions

Mark Davis mark at macchiato.com
Sat Dec 20 20:11:27 CET 2008


*IDNA Implementation Questions*
In looking at how to implement IDNA2008 (and maintain backward compatibility
with IDNA2003), we are wrestling with some practical questions that we'd
appreciate feedback on. These are not questioning the spec, but rather
trying to see how to apply it to some practical cases we've come up against.

Scenario
Look at the following scenario, where we have four processes that
handle an IRI (perhaps just passing it through), with the final one using it
to access the DNS. (We'll use the term IRI whether the domain name
labels are in punycode or in Unicode. They aren't necessarily known to be
A-Labels or U-Labels at any given point.)

 P1 => P2 => P3 => P4 => DNS

Variables and Background
These processes may be within the same system, or they may be passing IRIs
across the web (embodied in HTML5 doc, email, XLink, etc.) to other systems
or operating systems. For example, P1 could be a web server hosting a web
page, P2 may be a search engine indexer, P3 could be a search engine results
supply, and P4 could be a browser. Or these could all be cooperating
processes within a search engine indexer.

There are a lot of variables here:

   - Each of P1..P4 could convert an IRI to punycode before sending it on.
   - For that matter, any of them could convert back from punycode to
   Unicode for use internally, or pass that Unicode form on (IRIs with Unicode
   are recommended by the W3C in their protocols).
   - Each of the processes could be doing validity checks to determine
   whether the domain name is valid or not. Such a check may be partial (as in
   the current protocol spec, which doesn't require checking CONTEXT or BIDI),
   or full. (The check for validity is orthogonal to whether the form is
   Unicode or punycode.)
   - Each of the processes may be on IDNA2003, or on IDNA2008, or on some
   hybrid for compatibility.
   - For IDNA2008 implementations, each might be on a different version of
   Unicode.
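
As a rough illustration of the two conversions in the list above, here is a
minimal Python sketch. It assumes Python's built-in "idna" codec, which
implements IDNA2003 only (Nameprep plus Punycode), and a made-up name,
bücher.example. The helper names to_punycode and to_unicode are hypothetical.

```python
# Sketch only: Python's built-in "idna" codec follows IDNA2003, not IDNA2008.

def to_punycode(host: str) -> str:
    """Convert each non-ASCII label to its xn-- form (IDNA2003 ToASCII)."""
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """Convert xn-- labels back to Unicode (IDNA2003 ToUnicode)."""
    return host.encode("ascii").decode("idna")


# to_punycode("bücher.example")        -> "xn--bcher-kva.example"
# to_unicode("xn--bcher-kva.example")  -> "bücher.example"
```

Any of P1..P4 could apply either function, which is why the form a later
process receives says little about what earlier processes did to it.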


*Examples:* IE6 only handles punycode, and won't do any validity checking.
IE7 handles both punycode and Unicode. It checks the punycode, so a valid
IDNA2008 IRI with a ZWJ will fail. There are still enough IE6
implementations around that we (and others) need to handle them, and for
years to come there will be IE7 implementations around, not to mention
other browsers, email clients, word processors, etc. that handle URLs/IRIs
based on IDNA2003.
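
The ZWJ failure can be reproduced with any IDNA2003 implementation. The
sketch below uses Python's built-in "idna" codec as a stand-in for an
IDNA2003 process; it is not a model of IE7 itself. The Sinhala label is only
an illustrative choice of a label that places ZWJ after a virama, and the
helper names are hypothetical.

```python
# Sketch only: Python's "idna" codec plays the role of an IDNA2003 process.

ZWJ = "\u200d"
# Illustrative label: U+0DC1 U+0DCA ZWJ U+0DBB U+0DD3 (Sinhala "sri").
LABEL_WITH_ZWJ = "\u0dc1\u0dca" + ZWJ + "\u0dbb\u0dd3"


def raw_punycode_label(label: str) -> bytes:
    """Punycode-encode a label as-is, without any IDNA2003 mapping."""
    return b"xn--" + label.encode("punycode")


def idna2003_encode(label: str) -> bytes:
    """IDNA2003 ToASCII; Nameprep silently drops ZWJ before encoding."""
    return label.encode("idna")


def idna2003_accepts(ace_label: bytes) -> bool:
    """True if IDNA2003 ToUnicode decodes the label and it round-trips."""
    try:
        ace_label.decode("idna")
        return True
    except UnicodeError:
        return False
```

Under these assumptions, idna2003_accepts(raw_punycode_label(LABEL_WITH_ZWJ))
is False, because the decoded label does not re-encode to the same punycode,
while idna2003_encode(LABEL_WITH_ZWJ) yields the same bytes as the label with
the ZWJ removed.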

Note: even if validity checking is done on an IRI, non-registries don't
need to include the tests for BIDI or CONTEXT, so there is no guarantee that
a punycode form is an A-Label or that a Unicode form is a U-Label.

Questions
1. Suppose that P2 is on Unicode 5.1, and the others are on Unicode 6.0. If
P2 does a validity check, then it could prevent a perfectly valid IRI from
being correctly looked up. To prevent this problem, does that mean that the
best practice is for only P4 to do validity checking? Or should the others
do some weaker form of validity checking, like skipping a check for
UNASSIGNED?
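
To make the version skew in question 1 concrete, here is a small Python
sketch. Python's standard library only ships two Unicode databases, so
unicodedata.ucd_3_2_0 stands in for "the process on the older Unicode
version" and the default database for "the newer one"; these are not the
5.1 and 6.0 versions named above. The function name unassigned is
hypothetical, and U+0D7A (added in Unicode 5.1) is just a sample code point.

```python
import unicodedata

# Sketch only: two Unicode databases standing in for two processes.
OLD = unicodedata.ucd_3_2_0   # frozen Unicode 3.2 data
NEW = unicodedata             # whatever version this Python build ships


def unassigned(db, label: str) -> list:
    """Return the code points in label that db treats as unassigned (Cn)."""
    return [f"U+{ord(ch):04X}" for ch in label if db.category(ch) == "Cn"]


# A process using OLD would flag U+0D7A as UNASSIGNED and might reject the
# label; a process using NEW would not:
#   unassigned(OLD, "\u0d7a")  -> ["U+0D7A"]
#   unassigned(NEW, "\u0d7a")  -> []
```

An intermediate process that skips this one check would pass such a label
through unchanged and leave the decision to the process that actually does
the lookup.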

2. Suppose P3 is a non-IDNA aware process, so IRIs should be converted to
Punycode by P2 before sending. Should one do a validity check in P2? How do
we avoid problem #1 in that case?

3. The current protocol spec appears to require validity checking only when
converting to punycode. So when an IRI is already in punycode (which could
have come from an IDNA2003 application), it might not undergo any checking
at all when going from P1 to the DNS; so everything depends on the
registry's doing the right thing. Is it best to check anyway, or does that
run into problem #1?

4. If P2 accepts an IRI in Unicode and passes it on to P3 in Unicode (never
converting to punycode), should it do any validity checking?

5. When a search engine does indexing, it has to map together IRIs that are
"equivalent" (resolving to the same logical location). When it provides an
IRI to the user for a page, that IRI should go to the indexed page. However,
because IDNA2003 and IDNA2008 browsers may go to different places with the
same IRI, which do we provide? If we try to test for which browser the user
has, that is clumsy and error-prone.
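
A sketch of how an indexer might at least detect such IRIs, in Python. It
compares the name an IDNA2003 client would look up (via Python's built-in
"idna" codec) with the name obtained by Punycode-encoding each label with no
mapping at all. The second function only approximates an IDNA2008 client,
since it performs no validity checks, and all function names are
hypothetical. The sample character ß is not taken from the message above:
IDNA2003 maps it to "ss", while IDNA2008 as eventually published allows it
as a character in its own right.

```python
def idna2003_lookup_name(host: str) -> str:
    """Name an IDNA2003 client would look up (Nameprep mapping applied)."""
    return host.encode("idna").decode("ascii")


def unmapped_lookup_name(host: str) -> str:
    """Punycode-encode each non-ASCII label as-is, with no mapping.

    Assumption: this approximates an IDNA2008 lookup; no validity rules
    (CONTEXT, BIDI, UNASSIGNED) are checked here.
    """
    out = []
    for label in host.split("."):
        if label.isascii():
            out.append(label)
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)


def diverges(host: str) -> bool:
    """True if the two kinds of client would query different DNS names."""
    return idna2003_lookup_name(host) != unmapped_lookup_name(host)


# diverges("straße.example")  -> True  (IDNA2003 looks up strasse.example)
# diverges("bücher.example")  -> False
```

Detecting the divergence does not answer which form to hand to the user; it
only identifies the IRIs for which the question arises.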


More information about the Idna-update mailing list