comments on IDNAbis: draft-faltstrom-idnabis-tables-04.txt Arabic block

Erik van der Poel erikv at google.com
Mon Feb 18 23:57:46 CET 2008


> >>> 0640..065E  ; PVALID     # ARABIC TATWEEL..ARABIC FATHA WITH TWO DOTS
> >>>
> >>>>>>> 0640 should be DISALLOWED as it will create significant security
> >>>>>>> problems (kashida only causes stylistic (not shape) variation of
> >>>>>>> characters)
> >>
> >> 0640;ARABIC TATWEEL;Lm;0;AL;;;;;N;;;;;
> >>
> >> Codepoints of GeneralCategory Lm are allowed (matches category A).
> >> Same question as for 0615.
> >
> > Yes, 0640 should be DISALLOWED, by adding it to the exception list.
> > It has no linguistic significance and may cause security problems if
> > allowed.
>
> I let others comment on this.

The Unicode standard says that this is used for justification (i.e.
purely stylistic). If this is true for all languages, then I agree
that it should be DISALLOWED, in the list of exceptions.
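
To make the exception-list idea concrete, here is a rough sketch of how
a category-derived status could be overridden for a single code point.
The names EXCEPTIONS, ALLOWED_CATEGORIES and derive_status are mine, not
from the draft, and the category set is only a simplified stand-in for
the draft's category A, so treat this as an illustration rather than the
actual algorithm.

```python
import unicodedata

# Hypothetical exception list: code points whose category-derived status
# is overridden. Only U+0640 is listed, as proposed in this thread.
EXCEPTIONS = {0x0640: "DISALLOWED"}

# Assumption: a simplified stand-in for the draft's category A.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def derive_status(cp: int) -> str:
    """Return a status for one code point, checking exceptions first."""
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if unicodedata.category(chr(cp)) in ALLOWED_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    for cp in (0x0640, 0x0641, 0x06D4):
        print(f"{cp:04X} {unicodedata.category(chr(cp))} {derive_status(cp)}")
```

Without the exception entry, 0640 would come out as PVALID purely
because it is Lm, which is the behaviour being questioned here.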

> >>> 06D4        ; DISALLOWED # ARABIC FULL STOP
> >>>
> >>>>>>> should be allowed as a delimiter for Urdu, like the dot in the
> >>>>>>> domain name (should be mapped onto a dot automatically at client
> >>>>>>> layer);  As internationalized domain names deal with the end
> >>>>>>> user layer (application layer), they need to be a bit more
> >>>>>>> sensitive to user needs.  This delimiter, as specified in
> >>>>>>> Unicode, is only required for Urdu.  However, Urdu writing does
> >>>>>>> not have a dot and dot is also not present on Urdu keyboards.
> >>>>>>> If the delimiter is not allowed (and then mapped to dot), the
> >>>>>>> user will get confused and also will not be able to type the dot
> >>>>>>> without having an English keyboard installed and without
> >>>>>>> switching to English keyboard 2-3 times within writing a single
> >>>>>>> domain name in Urdu (once to-english-and-back-to-Urdu between
> >>>>>>> each level of TLD).  Standard should include this as a
> >>>>>>> recommendation for applications.
> >>
> >> 06D4;ARABIC FULL STOP;Po;0;AL;;;;;N;ARABIC PERIOD;;;;
> >>
> >> This is DISALLOWED as it is of GeneralCategory Po. I let others
> >> discuss the issues with full stop.
> >
> > A recommendation for the application layer, and a mapping to dot/full
> > stop within the standard, would help if it is not allowed otherwise.
> > Again, this is a requirement for Urdu, as described earlier.
>
> That is one solution. How applications handle various codepoints
> is not part of IDNA200X.

Actually, it is mentioned in:

http://www.ietf.org/internet-drafts/draft-klensin-idnabis-protocol-04.txt:

   "Other examples of processing for localization that might be applied,
   if appropriate, at this point (but even further outside the scope of
   this specification) include interpreting the KANA MIDDLE DOT as
   separating domain name components from each other"
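
As a rough illustration of that kind of localization step, an
application could map ARABIC FULL STOP (U+06D4) to an ordinary dot
before splitting the name into labels. This is only a sketch of
application-layer preprocessing, outside the protocol itself; the names
LOCAL_SEPARATORS, normalize_separators and split_labels are
hypothetical.

```python
# Hypothetical set of local characters that an application might choose
# to treat as label separators. Only U+06D4 comes from this thread.
LOCAL_SEPARATORS = {"\u06D4"}  # ARABIC FULL STOP


def normalize_separators(name: str) -> str:
    """Replace each local separator with U+002E FULL STOP."""
    return "".join("." if ch in LOCAL_SEPARATORS else ch for ch in name)


def split_labels(name: str) -> list[str]:
    """Split a user-typed name into labels after separator mapping."""
    return normalize_separators(name).split(".")


if __name__ == "__main__":
    typed = "\u0627\u0631\u062F\u0648\u06D4\u067E\u06A9"
    print(split_labels(typed))
```

The mapping runs before any per-label validity check, so 06D4 itself
can remain DISALLOWED inside labels.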

Erik

