URIs/IRIs

Michel SUIGNARD Michel at suignard.com
Thu Feb 26 18:40:05 CET 2009


I would encourage everyone concerned by URI/IRI interaction with IDNA to read [again] the current IRI spec (RFC 3987). The co-authors (Martin Duerst and I) know that we have our work cut out for us once IDNA is updated.
HTML implementations have liberally used IRI as a presentation level over the protocol level (URI), to summarize crudely, while not following exactly the model described in IRI (although many of the IRI statements are recommendations, not requirements). There is obviously the added complication that domain names are not necessarily easy to detect in a URI/IRI (they can appear in parts other than the customary host name element). For good or bad reasons, U-labels and A-labels can both appear in the IRI/URI (again coupled as presentation/protocol), with the necessary mapping between the two.
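
To make the U-label/A-label coupling concrete, here is a minimal sketch. It assumes Python's built-in "idna" codec, which implements the IDNA2003 rules and not the drafts under discussion. The name bücher.example and the two helper functions are made up for illustration.

```python
# Sketch: round-trip between the presentation form (U-label) and the
# protocol form (A-label) using Python's standard "idna" codec (IDNA2003).
# The domain name and helper names are hypothetical.

def to_a_label(name: str) -> str:
    """Convert every non-ASCII label of a domain name to its xn-- form."""
    return name.encode("idna").decode("ascii")

def to_u_label(name: str) -> str:
    """Convert every xn-- label of a domain name back to Unicode."""
    return name.encode("ascii").decode("idna")

u_form = "bücher.example"
a_form = to_a_label(u_form)

print(a_form)              # xn--bcher-kva.example
print(to_u_label(a_form))  # bücher.example
```

Both strings name the same host. Which of the two is shown to the user and which is put on the wire is exactly the presentation/protocol split described above.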

In this WG though, we should concentrate on getting IDN right, with an eye on the consequences for IRIs/URIs, so that there is a workable solution when the IRI spec is updated to take the IDNA update into account. The IRI spec has large sections on mapping between URI and IRI, which cover IDN as well (when host names are identified as such in the IRI string), and its bidi section borrowed heavily from IDNA-2003 and will need to be rewritten. Incompatible mapping between the two IDN versions is obviously a great concern, especially because IDNA-2008 does away with mapping, meaning that IRI must now decide either to create its own mapping or to follow the IDNA-2008 model of not mapping (not realistic).

But this is the IDNA WG after all, and above anything else I would like to see IDNA-2008 done, so that we can update IRI soon after. To answer Shawn's comment directly, I'd say we can't legislate where A-labels appear. They can appear anywhere, and it is up to the specs to document and prescribe sensible behaviors.

(just talking as one of the IRI co-authors; Martin may have a different opinion)

Michel

-----Original Message-----
From: idna-update-bounces at alvestrand.no [mailto:idna-update-bounces at alvestrand.no] On Behalf Of Shawn Steele (???)
Sent: Thursday, February 26, 2009 9:00 AM
To: Erik van der Poel; Vint Cerf
Cc: mark at macchiato.com; patrik at frobbit.se; saleh at nic.ir; idna-update at alvestrand.no
Subject: URIs/IRIs

Ick, I would very much hope that A-labels only would infect the 7 bit DNS system and that % escaping or Unicode would suffice for URIs/IRIs.  I see a restriction to only legal IDN names in the Unicode space, but there's no need for 8 bit aware systems, or those with other existing escaping mechanisms, to get the A-label hack.
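
For comparison, a small sketch of the two ASCII-only spellings of one host name: %-escaped UTF-8 versus the A-label. It assumes Python's urllib.parse.quote and the built-in "idna" codec (IDNA2003). The host name is made up.

```python
# Sketch: two ASCII-only ways to carry a non-ASCII host name.
# The host name is a hypothetical example.
from urllib.parse import quote

host = "bücher.example"

# %-escaping: each byte of the UTF-8 encoding becomes %XX.
percent_form = quote(host)

# A-label: each non-ASCII label is Punycode-encoded with an xn-- prefix
# (Python's "idna" codec follows IDNA2003).
a_label_form = host.encode("idna").decode("ascii")

print(percent_form)   # b%C3%BCcher.example
print(a_label_form)   # xn--bcher-kva.example
```

Only the second form is a string the 7 bit DNS can look up as is. The first has to be unescaped and converted by whatever component finally resolves the name.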

- Shawn

________________________________________
From: Erik van der Poel [erikv at google.com]
Sent: Thursday, February 26, 2009 7:32 AM
To: Vint Cerf
Cc: patrik at frobbit.se; saleh at nic.ir; mark at macchiato.com; idna-update at alvestrand.no; Shawn Steele (???)
Subject: Re: Bundling vs Mapping

Hi Vint,

In theory, URL is no longer the term to use, and we should be talking
about URIs (ASCII and %-escaped text) and IRIs (non-ASCII). In
practice, one of the most common contexts for URLs/URIs/IRIs is HTML,
and current implementations accept both URIs and IRIs that contain
non-ASCII text.

In theory, we should be able to introduce characters like Eszett into
IDNA, and use them in A-labels in URIs. In practice, a commonly used
browser version (MSIE7) will not access such URIs, so we'd have to
wait until many users stop using MSIE7 before registrants and HTML
authors could start using those URIs.

In theory, HTML implementations should not have allowed non-ASCII
domain names since IDNA2003 clearly stated that such domain names may
not appear in "IDNA-unaware domain name slots". In practice, the HTML
implementers have ignored that part of IDNA2003 and mapped non-ASCII
domain names, converting them to Punycode. So the Eszett causes a
problem here, because the implementation must decide whether to map to
"ss" or not to map at all, or to try both in DNS.
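
A rough sketch of that choice, for illustration only. It assumes Python's built-in "idna" codec for the IDNA2003 behavior (Eszett mapped to "ss") and the raw "punycode" codec for the no-mapping case. The helper encode_without_mapping and the name straße.example are made up, and the helper does none of the validity checks a real implementation would need.

```python
# Sketch: the two A-label candidates an implementation could derive from
# a name containing Eszett. Names and helpers are hypothetical.

def encode_with_idna2003_mapping(name: str) -> str:
    """IDNA2003 (nameprep) maps Eszett to "ss" before encoding."""
    return name.encode("idna").decode("ascii")

def encode_without_mapping(name: str) -> str:
    """Punycode-encode each non-ASCII label as is, with no mapping step."""
    labels = []
    for label in name.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

name = "straße.example"
mapped = encode_with_idna2003_mapping(name)
unmapped = encode_without_mapping(name)

print(mapped)    # strasse.example
print(unmapped)  # xn--strae-oqa.example

# "Try both in DNS" would mean querying each candidate in some order.
candidates = [unmapped, mapped]
```

The two results are different DNS names, so a page author and a browser that disagree on the mapping end up at different hosts.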

I don't know whether you'd consider this inimical, but I am certainly concerned.

Erik

On Thu, Feb 26, 2009 at 12:58 AM, Vint Cerf <vint at google.com> wrote:
> Is there any reason to believe that the present idna2008 documents contain anything inimical to URL use of domain names?
>
> ----- Original Message -----
> From: idna-update-bounces at alvestrand.no <idna-update-bounces at alvestrand.no>
> To: Alireza Saleh <saleh at nic.ir>
> Cc: Mark Davis <mark at macchiato.com>; idna-update at alvestrand.no <idna-update at alvestrand.no>; Shawn Steele (???) <Shawn.Steele at microsoft.com>
> Sent: Wed Feb 25 23:29:19 2009
> Subject: Re: Bundling vs Mapping
>
> On 26 feb 2009, at 08.06, Alireza Saleh wrote:
>
>> I think the drafts should talk about domain-names at the DNS point
>> of view to make sure the current DNS infrastructure remains reliable
>> but taking URL confusions in this area may not be an appropriate
>> approach
>
> I of course agree with you on this.
>
> But, I think we need both. I think we need documents that give
> guidelines on how to handle URLs as well, but my point is that it is a
> different cup of tea.
>
> I can envision:
>
> 1. Display of a http URI with username and password in a right to left
> context
> 2. Display of a http URI without username and password in a left to
> right context with one of the labels being right to left
>
> Etc...
>
> Very different things than talking about (just) domain names in a
> generic sense.
>
>    Patrik
>
>> Alireza
>>
>>
>> On Feb 26, 2009, at 1:56 AM, Patrik Fältström <patrik at frobbit.se>
>> wrote:
>>
>>> On 25 feb 2009, at 19.12, Mark Davis wrote:
>>>
>>>>> 4. Why has not work continued on the pre-i-d that Mark worked on,
>>>>> should
>>>>> that work continue?
>>>>
>>>> The indications that we have gotten all along are that the
>>>> authors of
>>>> IDNA2008 were not interested in that.
>>>
>>> As I feel I am one of the authors of one of the documents of
>>> IDNA2008, I must ask you where you found such indications from
>>> me. I have, I thought, quite clearly several times asked you for a
>>> continuation of your draft, and an I-D. What I have said is that I
>>> have wanted the draft not to concentrate so much on HTTP, but
>>> instead choose whether it is about HTTP and URIs, or about domain
>>> names (so that it can be expanded again for the specific
>>> applications that use domain names).
>>>
>>>  Patrik
>>>
>>> _______________________________________________
>>> Idna-update mailing list
>>> Idna-update at alvestrand.no
>>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
>




