Urdu and SPACE, FULL STOP (Re: comments on IDNAbis:draft-faltstrom-idnabis-tables-04.txt Arabic )

Basanta shrestha basanta.shrestha at gmail.com
Wed Feb 20 10:51:53 CET 2008


Thank you for quickly correcting me.
Regards,
Basanta Shrestha

On Feb 20, 2008 3:31 PM, Kent Karlsson <kent.karlsson14 at comhem.se> wrote:
>
> There is no character named DEVANAGARI FULL STOP.
> U+0964 has the formal standard name DEVANAGARI DANDA.
> Even though DEVANAGARI DANDA to some extent is used
> like FULL STOP is used, that is no reason to confuse
> the issue by calling it DEVANAGARI FULL STOP as an
> apparent (but incorrect) formal name.
>
>         /kent k
>
>
> > -----Original Message-----
> > From: idna-update-bounces at alvestrand.no
> > [mailto:idna-update-bounces at alvestrand.no] On Behalf Of
> > Basanta shrestha
> > Sent: Wednesday, February 20, 2008 10:10 AM
> > To: Harald Alvestrand
> > Cc: Sarmad Hussain; idna-update at alvestrand.no
> > Subject: Re: Urdu and SPACE, FULL STOP (Re: comments on
> > IDNAbis:draft-faltstrom-idnabis-tables-04.txt Arabic )
> >
> > Dear All,
> > This is a very important issue pointed out by Dr. Sarmad. The same is
> > true for Nepali (Devanagari). Consider the following entry in the
> > idnabis table:
> >
> > 0964  ; DISALLOWED # DEVANAGARI FULL STOP
> >
> > We do have the dot ( . ) in our layout, but only on the numeric
> > keypad, as a decimal sign. The numeric keypad is not easily
> > accessible on laptops.  IDNA-enabled applications should be smart
> > enough to convert the language-specific full stop to the English dot
> > wherever they accept internationalized domain names.
> >
> > Regards,
> > Basanta Shrestha
> >
> >
> >
> >
> > On Feb 20, 2008 1:29 PM, Harald Alvestrand
> > <harald at alvestrand.no> wrote:
> > > Dr. Hussain,
> > >
> > > thank you for your very detailed and clear commentary!
> > >
> > > Some further questions below from one who is unfortunately ignorant of
> > > the details of the Arabic script...
> > >
> > > Sarmad Hussain skrev:
> > > >
> > > >
> > > >
> > > > Dear All,
> > > >
> > > >
> > > >
> > > > Here are some comments on the draft posted at
> > > > http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt.
> > > > These observations are based (mostly) on the perspective of Urdu.
> > > > Referring to pages 21-22 of the report, my comments come after quoting
> > > > the relevant line from the report (prefixed by the >>>> symbol).  Please
> > > > especially note the comments on 06D4 and the "space" character (at the end).
> > > >
> > > >
> > > > 0640..065E  ; PVALID     # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS
> > > >
> > > >
> > > > >>>>0640 should be DISALLOWED as it will create significant
> > > > security problems (kashida only causes stylistic (not shape)
> > > > variation of characters)
> > > > >>>>0641..064A  agreed as PVALID
> > > > >>>>064B..0652  agreed; would love to hear the argument for
> > > > including them, as there was initially discussion about not
> > > > including them
> > > We have long agreed that combining marks (category Mn in the Unicode
> > > tables) need to be allowed for some characters. At this point, nobody's
> > > come out with a convincing argument for a general ban against them, and
> > > one whole strand of this effort (the BIDI document) was started with
> > > the aim of making sure they're permitted in the final letter of a
> > > right-to-left label (which IDNA2003 did not permit).
> > > > 06D4        ; DISALLOWED # ARABIC FULL STOP
> > > >
> > > > >>>>should be allowed as a delimiter for Urdu, like the dot in the
> > > > domain name (should be mapped onto a dot automatically at the client
> > > > layer).  As internationalized domain names deal with the end-user
> > > > layer (application layer), they need to be a bit more sensitive to
> > > > user needs.  This delimiter, as specified in Unicode, is only
> > > > required for Urdu.  However, Urdu writing does not have a dot, and
> > > > the dot is also not present on Urdu keyboards.  If the delimiter is
> > > > not allowed (and then mapped to dot), the user will get confused and
> > > > also will not be able to type the dot without having an English
> > > > keyboard installed and switching to the English keyboard 2-3 times
> > > > while writing a single domain name in Urdu (once to English and back
> > > > to Urdu between each level of the domain).  The standard should
> > > > include this as a recommendation for applications.
> > > The -tables document deals only with the labels themselves, not with
> > > their delimiters - for that you need to go to -issues or -protocol. In
> > > this particular case, I know work has been going on elsewhere to handle
> > > "alternate" delimiters like the IDEOGRAPHIC FULL STOP before one gets to
> > > the stage of deciding what is a label and what is not; adding this
> > > character to the ones being considered there might be a Good Thing - but
> > > it is outside the scope of the -tables document.
> > > >
> > > >
> > > >
> > > > In addition, in Urdu we would also have a problem with not allowing
> > > > space, as we do not use ZWNJ in Pakistan.  Urdu users in Pakistan
> > > > type a space whether it is required to shape a letter within a
> > > > word or at the end of it.  It is not possible to train all users to
> > > > distinguish between space and ZWNJ (especially as the latter is not a
> > > > linguistic entity in the language and users are never taught its
> > > > concept; from the perspective of Urdu it is a computational
> > > > engineering solution).  As the domain name standard has to deal with
> > > > applications with which users will be directly interacting, it may
> > > > also be included as a recommendation (at least for Urdu) that users
> > > > may be allowed to type a space and that it may automatically be
> > > > converted to ZWNJ (and could follow the same rules as ZWNJ after such
> > > > conversion).
> > > >
> > > >
> > > I am curious about this ... can you tell me more about how Urdu speakers
> > > regard the words that the Unicode Consortium's experts insist can only
> > > be written by use of ZWNJ - are they regarded as "two words, set closely
> > > together", or are they regarded as "one word that we have to type in a
> > > weird way"?
> > >
> > > In the Latin-script languages, we have forced all users, if they want to
> > > have multiple words in their labels, to use the unnatural and
> > > strange-looking hyphen, or write each word in a separate label and have
> > > a domain name with multiple strings.
> > > If we could avoid the use of ZWNJ entirely without causing too much pain
> > > to users (no more than is currently suffered by users of English and
> > > Norwegian), that would simplify our rules a great deal.
> > >
> > > Again, thank you very much for your information!
> > >
> > >                      Harald Alvestrand
> > >
> > >
> > >
> > > _______________________________________________
> > > Idna-update mailing list
> > > Idna-update at alvestrand.no
> > > http://www.alvestrand.no/mailman/listinfo/idna-update
> > >
>
>
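
The client-layer mapping asked for in this thread, where a script-specific
full stop is turned into U+002E before a typed name is split into labels,
could be sketched roughly as below. This is only an illustration under
stated assumptions: the name split_labels and the set of characters mapped
are hypothetical and are not taken from the -tables, -issues or -protocol
drafts. U+3002, U+FF0E and U+FF61 are the alternate dots that IDNA2003
(RFC 3490, section 3.1) already recognizes; U+06D4 and U+0964 are the
additions requested above.

```python
# Hypothetical sketch: map alternate full stops to U+002E, then split labels.

# Separators that IDNA2003 (RFC 3490, section 3.1) already treats as dots.
IDNA2003_DOTS = "\u3002\uff0e\uff61"
# Characters requested in this thread (an assumption, not part of any draft):
# U+06D4 ARABIC FULL STOP and U+0964 DEVANAGARI DANDA.
PROPOSED_DOTS = "\u06d4\u0964"

# Translation table from each alternate separator to the ASCII dot.
DOT_MAP = {ord(ch): "." for ch in IDNA2003_DOTS + PROPOSED_DOTS}


def split_labels(typed_name):
    """Map alternate dots to '.' and return the resulting list of labels."""
    return typed_name.translate(DOT_MAP).split(".")


# A name typed with ARABIC FULL STOP as the only separator.
labels = split_labels("\u0645\u062b\u0627\u0644\u06d4\u067e\u0627\u06a9")
```

Doing the mapping before the split matches the point made above that
delimiters are handled before one decides what is a label, so the labels
themselves never contain U+06D4 or U+0964.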

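The space-to-ZWNJ conversion suggested for Urdu input could be sketched in
the same spirit. Again this is an assumption-laden illustration, not
anything the drafts specify: the name spaces_to_zwnj is hypothetical,
dropping whitespace at the edges of the label is a choice made here for
simplicity, and whether the resulting ZWNJ is acceptable at that position
would still depend on the rules for ZWNJ, which are not modelled.

```python
# Hypothetical sketch: replace typed spaces inside a label with ZWNJ.

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def spaces_to_zwnj(label):
    """Replace each run of inner whitespace with a single ZWNJ.

    str.split() with no argument splits on runs of whitespace and ignores
    whitespace at the start and end, so edge spaces are dropped
    (an assumption of this sketch).
    """
    return ZWNJ.join(label.split())


converted = spaces_to_zwnj(" ab  cd ")
```

After such a conversion the label would be checked exactly as if the user
had typed ZWNJ directly, which is what the proposal above asks for.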
