Label separators (was: Re: Urdu and SPACE, FULL STOP (Re: comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic))

Sarmad Hussain sarmad.hussain at nu.edu.pk
Sat Feb 23 20:00:23 CET 2008


Dear John Klensin and all,

Thank you for your comments.  I understand and agree.  This is exactly what
I am arguing for as well, i.e.:

"if you need to use a convention locally to permit easier typing of that
character, you can substitute any convenient punctuation (or other
disallowed) character for it... as long as it is mapped to ASCII period
before you store it in a file or transmit it on the wire"
 
However, if the IDN standards stop short of providing clear auxiliary
recommendations on WHICH "convenient punctuation" to substitute and HOW
(i.e. which Unicode characters to map onto which ASCII characters),
application providers such as Microsoft, Mozilla, etc. tend to implement
their own interpretations in their browsers.  Unfortunately, many user
communities lack the experience needed to make their voices heard by these
application providers.

So if the standards list these auxiliary recommendations, there is a good
chance that application providers will support them as well, even if
language communities are not able to contact them directly.

In summary, I am not asking that U+06D4 be transmitted on the wire.  I am
suggesting that, to ensure that the URDU FULL STOP is processed at the
application end, the relevant IDN standards should explicitly recommend that
application providers map U+06D4 onto an ASCII period (U+002E), if they see
it in a domain name, before transmitting it on the wire.
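
To make the suggestion concrete, here is a minimal sketch in Python of the
kind of mapping an application could apply before storing or transmitting a
name.  The names LOCAL_SEPARATORS, normalize_separators and split_labels are
hypothetical, chosen only for this illustration.  The set combines U+06D4
(the character under discussion) with the three alternative separators that
IDNA2003 (RFC 3490) already lists; whether U+06D4 belongs in such a set is
exactly the recommendation being asked for, not something any current
standard states.

```python
# Hypothetical set of characters a user may type in place of the ASCII period.
# U+06D4 is the character discussed in this thread (an assumption, not part of
# any standard); the other three are the alternative separators of IDNA2003.
LOCAL_SEPARATORS = {
    "\u06D4",  # ARABIC FULL STOP (used as the Urdu full stop)
    "\u3002",  # IDEOGRAPHIC FULL STOP
    "\uFF0E",  # FULLWIDTH FULL STOP
    "\uFF61",  # HALFWIDTH IDEOGRAPHIC FULL STOP
}


def normalize_separators(name: str) -> str:
    """Replace every locally typed separator with ASCII period (U+002E)."""
    return "".join("." if ch in LOCAL_SEPARATORS else ch for ch in name)


def split_labels(name: str) -> list[str]:
    """Split a typed domain name into labels after normalizing separators."""
    return normalize_separators(name).split(".")
```

With this in place, a name typed with U+06D4 between its labels is split the
same way as one typed with an ASCII period, and only the ASCII period is ever
stored or sent.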



Best regards,
Sarmad   


 
> -----Original Message-----
> From: John C Klensin [mailto:klensin at jck.com]
> Sent: Saturday, February 23, 2008 10:15 PM
> To: Sarmad Hussain
> Cc: idna-update at alvestrand.no
> Subject: Label separators (was: Re: Urdu and SPACE, FULL STOP (Re:
> comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic))
> 
> Dr. Hussain (and others),
> 
> I've been distracted by other work for a few days, but want to
> address the FULL STOP problem, which, as Harald pointed out, is
> associated with a label separator issue and not an issue with
> "tables" at all.
> 
> The problem we face here is that the single most critical
> consideration in looking at IDNA is that the DNS, and DNS
> applications that are not IDNA-aware, must continue to work well
> and predictably when confronted with IDN labels in either native
> Unicode character or ACE form.
> 
> Personally, I frequently wish that constraint did not exist
> because one can imagine many interesting things that could be
> done without it.  But the price of eliminating the constraint is
> modifications to the DNS that would take us considerable effort
> and probably many years to deploy.  No one wants to wait that
> long so we are stuck with the constraint.
> 
> For label separators, the constraint has even stronger
> implications than it does for matching rules (I've discussed the
> latter in another note) because applications and systems that
> are otherwise unaware of the DNS itself (not just unaware of
> IDNA) have to be able to parse full domain names into labels in
> order to map back and forth between the "labels separated by
> full stops" format that we usually see and the DNS internal
> format (a list of labels with explicit length information).
> Even the language of IDNA2003 about mapping of period-like
> characters isn't sufficient to prevent those characters from
> showing up in contexts in which they would interfere with domain
> name parsing.  However the intent is clear, and that intent is
> to be sure that, by the time a domain name makes it into a file
> or out on the Internet, the things that look like full stops
> must be translated into ASCII periods and the latter substituted.
> 
> Oddly, this is where the "no mapping in the protocol" principle
> of the IDNA200X proposals becomes very helpful.  The IDNA2003
> version says, in essence, "these characters (and no others) are
> considered appropriate alternative forms of label separators,
> but you have to map them to ASCII period when you see them".
> The IDNA200X version is equivalent to "the only valid label
> separator on the wire or in interchange is ASCII period.
> However, since we have prohibited all other punctuation
> characters (other than hyphen) from ever actually appearing in a
> domain name, if you need to use a convention locally to permit
> easier typing of that character, you can substitute any
> convenient punctuation (or other disallowed) character for it...
> as long as it is mapped to ASCII period before you store it in a
> file or transmit it on the wire".
> 
> That is clearly not a perfect solution, but it gives you the
> flexibility you need while preserving both global
> interoperability and the ability for non-IDNA applications to
> unambiguously parse domain names into labels.
> 
>     john
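
For readers unfamiliar with the "DNS internal format (a list of labels with
explicit length information)" mentioned in the quoted message, the following
Python sketch shows the conversion from the dotted form.  The function name
to_wire is hypothetical, and the sketch assumes the input is already in ASCII
(for example ACE) form with ASCII periods as the only separators; it does not
handle the root name or name compression.

```python
def to_wire(name: str) -> bytes:
    """Encode a dotted ASCII domain name as length-prefixed labels."""
    out = bytearray()
    # A single trailing dot (fully qualified form) is tolerated and dropped.
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")  # raises an error on non-ASCII input
        if not 0 < len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        out.append(len(raw))  # one length octet per label
        out += raw
    out.append(0)  # zero-length label terminates the name
    return bytes(out)
```

Only the ASCII period is recognized as a boundary here, which is why any
other separator has to be mapped before the name reaches code of this kind.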


