Urdu and SPACE, FULL STOP (Re: comments on IDNAbis:draft-faltstrom-idnabis-tables-04.txt Arabic )

Kent Karlsson kent.karlsson14 at comhem.se
Wed Feb 20 10:46:17 CET 2008


There is no character named DEVANAGARI FULL STOP.
U+0964 has the formal standard name DEVANAGARI DANDA.
Even though DEVANAGARI DANDA is to some extent used
the way FULL STOP is used, that is no reason to confuse
the issue by calling it DEVANAGARI FULL STOP, as if that
were its formal name (it is not).

	/kent k
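Formal names can be checked directly against the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module. The helper name `describe` is illustrative only. The module reports data from whatever Unicode version ships with the interpreter, but character names are stable across Unicode versions.

```python
import unicodedata

def describe(cp: int) -> str:
    """Return 'U+XXXX NAME (Category)' using the bundled Unicode database."""
    ch = chr(cp)
    # unicodedata.name() gives the formal character name;
    # unicodedata.category() gives the General Category (e.g. "Po").
    return f"U+{cp:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"

# Code points discussed in this thread.
for cp in (0x0964, 0x06D4, 0x0640, 0x200C):
    print(describe(cp))
```

The first line printed is `U+0964 DEVANAGARI DANDA (Po)`. No character carries the name DEVANAGARI FULL STOP.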


> -----Original Message-----
> From: idna-update-bounces at alvestrand.no 
> [mailto:idna-update-bounces at alvestrand.no] On Behalf Of 
> Basanta shrestha
> Sent: Wednesday, February 20, 2008 10:10 AM
> To: Harald Alvestrand
> Cc: Sarmad Hussain; idna-update at alvestrand.no
> Subject: Re: Urdu and SPACE, FULL STOP (Re: comments on 
> IDNAbis:draft-faltstrom-idnabis-tables-04.txt Arabic )
> 
> Dear All,
> This is a very important issue pointed out by Dr. Sarmad. The same is
> true for Nepali (Devanagari). Referring to the following entry in the
> idnabis table:
> 
> 0964  ; DISALLOWED # DEVANAGARI FULL STOP
> 
> We do have the dot (.) in our keyboard layout, but only on the numeric
> keypad, as a decimal sign. The numeric keypad is not easily accessible
> on laptops. IDNA-enabled applications should be smart enough to convert
> the language-specific full stop to the ASCII dot (U+002E) wherever they
> accept internationalized domain names.
> 
> Regards,
> Basanta Shrestha
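The conversion Basanta proposes would be a client-side mapping applied before the name is processed further. The following is a minimal sketch of that idea. Both the helper `map_local_full_stops` and its mapping table are hypothetical. The table covers only the two characters raised in this thread, U+0964 and U+06D4. Neither the helper nor the table is part of any IDNA specification.

```python
# Hypothetical client-side mapping (NOT part of IDNA): language-specific
# full stops typed by the user are replaced with the ASCII dot U+002E.
LOCAL_FULL_STOPS = {
    "\u0964": ".",  # DEVANAGARI DANDA (used like a full stop in Nepali)
    "\u06D4": ".",  # ARABIC FULL STOP (used in Urdu)
}

# str.maketrans accepts a dict whose keys are single characters.
_TABLE = str.maketrans(LOCAL_FULL_STOPS)

def map_local_full_stops(user_input: str) -> str:
    """Replace every listed full stop with '.', leaving other text unchanged."""
    return user_input.translate(_TABLE)

print(map_local_full_stops("example\u0964np"))  # example.np
```

Doing this with a translation table keeps the mapping to a single pass and makes it easy to review exactly which characters are treated as dots.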
> 
> 
> 
> 
> On Feb 20, 2008 1:29 PM, Harald Alvestrand 
> <harald at alvestrand.no> wrote:
> > Dr. Hussain,
> >
> > thank you for your very detailed and clear commentary!
> >
> > Some further questions below from one who is, unfortunately, ignorant
> > of the details of the Arabic script...
> >
> > Sarmad Hussain wrote:
> > >
> > > Dear All,
> > >
> > > Here are some comments on the draft posted at
> > > http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-04.txt.
> > > These observations are based mostly on the perspective of Urdu.
> > > Referring to pages 21-22 of the report, I quote each relevant line
> > > from the report and follow it with my comments, which are prefixed by
> > > the >>>> symbol. Please note especially my comments on 06D4 and on the
> > > "space" character (at the end).
> > >
> > >
> > > 0640..065E  ; PVALID     # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS
> > >
> > >
> > > >>>>0640 should be DISALLOWED, as it will create significant security problems (kashida causes only stylistic variation of characters, not a change in their shape)
> > > >>>>0641..064A  agreed as PVALID
> > > >>>>064B..0652  agreed; I would like to hear the argument for including them, as there was initially some discussion about excluding them
> > We have long agreed that combining marks (category Mn in the Unicode
> > tables) need to be allowed for some characters. At this point, nobody
> > has come up with a convincing argument for a general ban against them,
> > and one whole strand of this effort (the BIDI document) was started with
> > the aim of making sure they are permitted on the final letter of a
> > right-to-left label (which IDNA2003 did not permit).
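The General Category values show the distinction being drawn here. U+064B..U+0652 are nonspacing marks (Mn). U+0640 ARABIC TATWEEL is a modifier letter (Lm). The sketch below checks both with Python's `unicodedata`. It also illustrates Sarmad's security concern using an illustrative word, KAF TEH ALEF BEH ("kitab"): inserting a tatweel yields a different code point string that differs only in elongation when rendered. The helper `categories` is hypothetical.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL (kashida)

def categories(start: int, end: int) -> set[str]:
    """General Categories of all code points in the inclusive range."""
    return {unicodedata.category(chr(cp)) for cp in range(start, end + 1)}

print(categories(0x064B, 0x0652))      # {'Mn'}
print(unicodedata.category(TATWEEL))  # Lm

# Illustrative word: KAF, TEH, ALEF, BEH, with and without a tatweel.
plain = "\u0643\u062A\u0627\u0628"
stretched = "\u0643\u0640\u062A\u0627\u0628"
print(plain == stretched)                       # False: distinct strings
print(stretched.replace(TATWEEL, "") == plain)  # True: same word underneath
```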
> > > 06D4        ; DISALLOWED # ARABIC FULL STOP
> > >
> > > >>>>should be allowed as a delimiter for Urdu, like the dot in the domain name (it should be mapped onto a dot automatically at the client layer). As internationalized domain names deal with the end-user layer (application layer), they need to be a bit more sensitive to user needs. This delimiter, as specified in Unicode, is required only for Urdu. However, Urdu writing does not have a dot, and the dot is also not present on Urdu keyboards. If the delimiter is not allowed (and then mapped to a dot), users will get confused, and they will also not be able to type the dot without having an English keyboard installed and switching to it 2-3 times while writing a single domain name in Urdu (once to English and back to Urdu at each level of the domain name). The standard should include this as a recommendation for applications.
> > The -tables document deals only with the labels themselves, not with
> > their delimiters; for that you need to go to -issues or -protocol. In
> > this particular case, I know work has been going on elsewhere to handle
> > "alternate" delimiters like the IDEOGRAPHIC FULL STOP before one gets to
> > the stage of deciding what is a label and what is not. Adding this
> > character to the ones being considered there might be a Good Thing, but
> > it is outside the scope of the -tables document.
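IDNA2003 (RFC 3490, section 3.1) already treats four characters as label separators: U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP and U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP. A minimal sketch of splitting on such a set before any per-label checks follows. The `EXTRA_SEPARATORS` set containing U+06D4 is a hypothetical extension reflecting the proposal above. No specification defines it, and `split_labels` is an illustrative name.

```python
import re

# Label separators recognized by IDNA2003 (RFC 3490, section 3.1).
RFC3490_SEPARATORS = frozenset({"\u002E", "\u3002", "\uFF0E", "\uFF61"})

# Hypothetical addition discussed in this thread; NOT part of any standard.
EXTRA_SEPARATORS = frozenset({"\u06D4"})

def split_labels(name: str, extra: frozenset[str] = frozenset()) -> list[str]:
    """Split a domain name into labels on any recognized separator."""
    separators = RFC3490_SEPARATORS | extra
    # Build a character class such as [\.。．｡]; sorting keeps it deterministic.
    pattern = "[" + "".join(re.escape(s) for s in sorted(separators)) + "]"
    return re.split(pattern, name)

print(split_labels("example\u3002com"))                     # ['example', 'com']
print(split_labels("example\u06D4pk", EXTRA_SEPARATORS))    # ['example', 'pk']
```

Separators are resolved before labels are identified, so the per-label tables never need to see them, which matches the scoping described above.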
> > >
> > >
> > >
> > > In addition, Urdu would have a problem if space is not allowed, as
> > > ZWNJ is not used in Pakistan. Urdu users in Pakistan type a space both
> > > when it is needed to shape a letter within a word and at the end of a
> > > word. It is not possible to train all users to distinguish between
> > > space and ZWNJ (especially as the latter is not a linguistic entity in
> > > the language and users are never taught its concept; from the
> > > perspective of Urdu it is a computational engineering solution). As
> > > the domain name standard has to deal with applications with which users
> > > will be directly interacting, it may also be included as a
> > > recommendation (at least for Urdu) that users be allowed to type a
> > > space and that it be automatically converted to ZWNJ (following the
> > > same rules as ZWNJ after such conversion).
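Sarmad's recommendation could be prototyped as an input-side conversion. The following is a minimal sketch, assuming it runs on a single label already known to be Urdu. `spaces_to_zwnj` is a hypothetical name. It trims spaces at the ends of the label and turns each run of interior spaces into one ZWNJ (U+200C). The converted label would still have to pass whatever contextual rules apply to ZWNJ, and this sketch does not check them.

```python
import re

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def spaces_to_zwnj(label: str) -> str:
    """Hypothetical input mapping: trim outer spaces, then replace each run
    of interior spaces with a single ZWNJ. ZWNJ context rules are NOT checked."""
    return re.sub(" +", ZWNJ, label.strip(" "))

print(spaces_to_zwnj(" a  b ") == "a\u200Cb")  # True
```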
> > >
> > >
> > I am curious about this... can you tell me more about how Urdu speakers
> > regard the words that the Unicode Consortium's experts insist can only
> > be written by use of ZWNJ? Are they regarded as "two words, set closely
> > together", or as "one word that we have to type in a weird way"?
> >
> > In the Latin-script languages, we have forced all users, if they want
> > to have multiple words in their labels, to use the unnatural and
> > strange-looking hyphen, or to write each word in a separate label and
> > have a domain name with multiple strings.
> > If we could avoid the use of ZWNJ entirely without causing too much pain
> > to users (no more than is currently suffered by users of English and
> > Norwegian), that would simplify our rules a great deal.
> >
> > Again, thank you very much for your information!
> >
> >                      Harald Alvestrand
> >
> >
> >
> > _______________________________________________
> > Idna-update mailing list
> > Idna-update at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/idna-update
> >