Label separators (was: Re: Urdu and SPACE, FULL STOP (Re: comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic))

Mark Davis mark.davis at icu-project.org
Sat Feb 23 20:17:36 CET 2008


The normal convention is to use uppercase for the official Unicode/10646
character name, but there is no character with the name "URDU FULL STOP". Do
you mean:

U+06D4 <http://unicode.org/cldr/utility/character.jsp?a=06D4> ( ‎۔‎ ) ARABIC
FULL STOP
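
For anyone who wants to check a character name, here is a minimal
sketch using Python's standard unicodedata module. The helper name
has_character_named is only an illustration, and the result depends on
the Unicode version bundled with the interpreter.

```python
import unicodedata

# U+06D4 is the code point cited above.
ch = "\u06d4"
name = unicodedata.name(ch)  # official Unicode character name

def has_character_named(candidate):
    """Return True if the bundled Unicode database knows this exact name."""
    try:
        unicodedata.lookup(candidate)
        return True
    except KeyError:
        return False

print(name)
print(has_character_named("URDU FULL STOP"))
```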

Mark

On Sat, Feb 23, 2008 at 11:00 AM, Sarmad Hussain <sarmad.hussain at nu.edu.pk>
wrote:

> Dear John Klensin and all,
>
> Thank you for your comments.  I understand and agree.  This is exactly
> what I am arguing for as well, i.e.:
>
> "if you need to use a convention locally to permit easier typing of that
> character, you can substitute any convenient punctuation (or other
> disallowed) character for it... as long as it is mapped to ASCII period
> before you store it in a file or transmit it on the wire"
>
> However, if IDN standards stop short of providing clear auxiliary
> recommendations on WHICH "convenient punctuation" to substitute and HOW
> (i.e. which Unicode characters to map onto which ASCII characters),
> application providers like Microsoft, Mozilla, etc., tend to implement
> their own interpretations in their browsers.  Unfortunately, many user
> communities do not have the experience to make their voices heard by
> these application providers.
>
> So if the standards list these auxiliary recommendations, there is a
> good chance that they will be supported by the application providers as
> well, even if language communities are not able to contact them directly.
>
> In summary, I am not asking that 06D4 be transmitted on the wire.  I am
> suggesting that, to ensure that URDU FULL STOP is processed at the
> application end, relevant IDN standards should explicitly recommend that
> application providers map 06D4 onto a dot, if they see it in a domain
> name, before transmitting it on the wire.
>
>
>
> Best regards,
> Sarmad
>
>
>
> > -----Original Message-----
> > From: John C Klensin [mailto:klensin at jck.com]
> > Sent: Saturday, February 23, 2008 10:15 PM
> > To: Sarmad Hussain
> > Cc: idna-update at alvestrand.no
> > Subject: Label separators (was: Re: Urdu and SPACE, FULL STOP (Re:
> > comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic))
> >
> > Dr. Hussain (and others),
> >
> > I've been distracted by other work for a few days, but want to
> > address the FULL STOP problem, which, as Harald pointed out, is
> > associated with a label separator issue and not an issue with
> > "tables" at all.
> >
> > The problem we face here is that the single most critical
> > consideration in looking at IDNA is that the DNS, and DNS
> > applications that are not IDNA-aware, must continue to work well
> > and predictably when confronted with IDN labels in either native
> > Unicode character or ACE form.
> >
> > Personally, I frequently wish that constraint did not exist
> > because one can imagine many interesting things that could be
> > done without it.  But the price of eliminating the constraint is
> > modifications to the DNS that would take us considerable effort
> > and probably many years to deploy.  No one wants to wait that
> > long so we are stuck with the constraint.
> >
> > For label separators, the constraint has even stronger
> > implications than it does for matching rules (I've discussed the
> > latter in another note) because applications and systems that
> > are otherwise unaware of the DNS itself (not just unaware of
> > IDNA) have to be able to parse full domain names into labels in
> > order to map back and forth between the "labels separated by
> > full stops" format that we usually see and the DNS internal
> > format (a list of labels with explicit length information).
> > Even the language of IDNA2003 about mapping of period-like
> > characters isn't sufficient to prevent those characters from
> > showing up in contexts in which they would interfere with domain
> > name parsing.  However, the intent is clear, and that intent is
> > to be sure that, by the time a domain name makes it into a file
> > or out on the Internet, the things that look like full stops
> > have been replaced by ASCII periods.
> >
> > Oddly, this is where the "no mapping in the protocol" principle
> > of the IDNA200X proposals becomes very helpful.  The IDNA2003
> > version says, in essence, "these characters (and no others) are
> > considered appropriate alternative forms of label separators,
> > but you have to map them to ASCII period when you see them".
> > The IDNA200X version is equivalent to "the only valid label
> > separator on the wire or in interchange is ASCII period.
> > However, since we have prohibited all other punctuation
> > characters (other than hyphen) from ever actually appearing in a
> > domain name, if you need to use a convention locally to permit
> > easier typing of that character, you can substitute any
> > convenient punctuation (or other disallowed) character for it...
> > as long as it is mapped to ASCII period before you store it in a
> > file or transmit it on the wire".
> >
> > That is clearly not a perfect solution, but it gives you the
> > flexibility you need while preserving both global
> > interoperability and the ability for non-IDNA applications to
> > unambiguously parse domain names into labels.
> >
> >     john
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
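
The mapping discussed in the quoted messages can be sketched as
follows. This is a rough illustration, not text from any IDNA draft:
the function names and the LOCAL_SEPARATORS set are assumptions made
for the example. The three characters in IDNA2003_SEPARATORS are the
ones RFC 3490 (section 3.1) lists alongside ASCII period; treating
U+06D4 as a local typing convention is exactly the application-side
choice being debated above. The last function shows the "labels with
explicit length" internal format, and assumes labels that are already
ASCII (for instance, already in ACE form).

```python
# Dot-like characters that IDNA2003 (RFC 3490, section 3.1) recognizes
# as label separators in addition to ASCII period.
IDNA2003_SEPARATORS = "\u3002\uff0e\uff61"

# Assumption for this sketch: a local input convention that also accepts
# ARABIC FULL STOP (U+06D4). No standard mandates this set.
LOCAL_SEPARATORS = "\u06d4"

def to_ascii_separators(name, extra=LOCAL_SEPARATORS):
    """Replace every accepted separator with ASCII period (U+002E)."""
    table = {ord(ch): "." for ch in IDNA2003_SEPARATORS + extra}
    return name.translate(table)

def split_labels(name, extra=LOCAL_SEPARATORS):
    """Split a typed domain name into labels after mapping separators."""
    return to_ascii_separators(name, extra).split(".")

def to_wire(name):
    """Encode an all-ASCII name as length-prefixed labels ending in a zero octet.

    Raises ValueError on empty or over-long labels and UnicodeEncodeError
    on non-ASCII labels; conversion to ACE form is out of scope here.
    """
    out = bytearray()
    for label in split_labels(name):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError("label length must be 1..63 octets")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    return bytes(out)

print(split_labels("www\u06d4example\u3002com"))
print(to_wire("www.example.com"))
```

Doing the mapping once, at the input boundary, means anything stored
or transmitted contains only ASCII period as a separator, so software
that knows nothing about IDNA can still split the name into labels.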



-- 
Mark