IDNAbis spec

Abdulrahman I. ALGhadir aghadir at citc.gov.sa
Wed Nov 4 05:38:53 CET 2009


" My (Very Limited) understanding (from Arabic speaking coworkers) is that in this case the diacritics are sort of optional. Almost like how word corrects "naive" to "naïve".

My understanding is that the form with the diacritics is the formal spelling, particularly in religious texts, however apparently the words are also spelled quite commonly without the diacritics.  If that's true, then I'd suggest the registrar consider "bundling" the two forms.  Which form the organization used for reverse lookup would then be up to that organization."

Well, as you said, diacritics are only used in old Arabic texts, which are not in common use nowadays.

" My coworker also made interesting observations about the address bar.  http://ت.ت gets displayed like ت.ت//:http, which is expected.  What was unexpected (to me) is that in an RTL context, my coworker would also prefer to see http://microsoft.com displayed as com.microsoft//:http.  His actual suggestion was that it be parsed by labels and then put the labels in RTL order.  FWIW: IE probably has to do something because http://a.ت.com gets very ugly right now in the address bar. "

Let me explain what happens in the two scenarios:

Assumption: the address bar has an RTL embedding direction.

1) http://ت.ت gets displayed as ت.ت//:http because the characters :// are neutrals. Here http has a strong LTR direction while ت has a strong RTL direction, so the :// sits between two runs of opposite direction and takes the address bar's default direction, which is RTL.

2) http://microsoft.com gets displayed as http://microsoft.com because in this case the direction does not change between the two runs, so the :// takes the direction of the surrounding runs, which is LTR.

These problems will arise as long as directions are mixed; that is what the RFC tells us. For more information, see UAX #9.
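As a rough illustration of the two scenarios above, here is a minimal Python sketch. It is not the full UAX #9 algorithm: it assumes every character that is not strongly L, R or AL can be treated as a neutral (so digits, weak types and explicit embedding controls are ignored), and the function names are made up for this example. A run of neutrals takes the direction of its neighbours when they agree and the base direction otherwise, and the text is then reordered by level.

```python
import unicodedata

def strong_direction(ch):
    """Return 'L' or 'R' for strong characters, None for everything else."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "L"
    if bidi in ("R", "AL"):
        return "R"
    return None  # simplification: all other classes are treated as neutral

def resolve_neutrals(text, base="R"):
    """Give each run of neutrals the direction of its neighbours if they
    agree, otherwise the base direction (simplified rules N1/N2)."""
    dirs = [strong_direction(ch) for ch in text]
    n = len(dirs)
    i = 0
    while i < n:
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < n and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else base  # start of text counts as base
        after = dirs[j] if j < n else base       # end of text counts as base
        fill = before if before == after else base
        for k in range(i, j):
            dirs[k] = fill
        i = j
    return dirs

def visual_order(text, base="R"):
    """Return the characters in left-to-right display order."""
    dirs = resolve_neutrals(text, base)
    if base == "R":
        levels = [1 if d == "R" else 2 for d in dirs]
    else:
        levels = [1 if d == "R" else 0 for d in dirs]
    chars = list(text)
    # Reverse every run at or above each level, from the highest level down to 1.
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)

# U+062A is the Arabic letter TEH used in the examples above.
print(visual_order("http://\u062A.\u062A", base="R"))
print(visual_order("http://microsoft.com", base="R"))
```

Under these assumptions the first call yields the reordered form described in scenario 1, and the second leaves the all-LTR URL unchanged, as in scenario 2.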

AbdulRahman,


-----Original Message-----
From: Shawn Steele [mailto:Shawn.Steele at microsoft.com] 
Sent: 3/Nov/2009 8:26 PM
To: Alireza Saleh; Abdulrahman I. ALGhadir
Cc: muhtaseb at kfupm.edu.sa; idna-update at alvestrand.no; Lisa Dusseault
Subject: RE: IDNAbis spec

My (Very Limited) understanding (from Arabic speaking coworkers) is that in this case the diacritics are sort of optional. Almost like how word corrects "naive" to "naïve".

My understanding is that the form with the diacritics is the formal spelling, particularly in religious texts, however apparently the words are also spelled quite commonly without the diacritics.  If that's true, then I'd suggest the registrar consider "bundling" the two forms.  Which form the organization used for reverse lookup would then be up to that organization.

My coworker also made interesting observations about the address bar.  http://ت.ت gets displayed like ت.ت//:http, which is expected.  What was unexpected (to me) is that in an RTL context, my coworker would also prefer to see http://microsoft.com displayed as com.microsoft//:http.  His actual suggestion was that it be parsed by labels and then put the labels in RTL order.  FWIW: IE probably has to do something because http://a.ت.com gets very ugly right now in the address bar.  

I'm not sure the display is something the working group needs to figure out. From what I can tell, the updated RFC permits appropriate strings; it's probably up to the clients to make sure they get displayed in a reasonable fashion.

-Shawn

________________________________________
From: idna-update-bounces at alvestrand.no [idna-update-bounces at alvestrand.no] on behalf of Alireza Saleh [saleh at nic.ir]
Sent: Tuesday, November 03, 2009 8:08 AM
To: Abdulrahman I. ALGhadir
Cc: muhtaseb at kfupm.edu.sa; idna-update at alvestrand.no; Lisa Dusseault
Subject: Re: IDNAbis spec

Abdulrahman I. ALGhadir wrote:
> Hey,
>
>
>
> [Quote]: “what if we allow diacritics on the domain name then a domain name like
>
> مايكروسوفت.شركة
>
> Will be different than the
>
> مَايكروسوفت.شركة
>
> Because in the second one there is a diacritic on the first letter.
>
> Although this diacritic is implicit in the first one.
>
> So this might cause a lot of problems in the domain names registration and owner claims.” [/Quote]
>
>
I think it is the registry's (zone owner's) decision to allow or deny
the use of certain characters, including diacritics; however, diacritics
are part of some languages. There may be characters (not necessarily
diacritics) in a language whose use may cause problems; in these
cases the registry can decide to remove those characters from the
character repertoire for that language.
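
To make the two quoted names concrete, here is a small Python sketch. It only shows that the two labels differ by one combining mark (U+064E ARABIC FATHA) and how a registry could, hypothetically, derive a common key for "bundling" the two forms. The helper name `strip_marks` and the policy of dropping every nonspacing mark (category Mn) are assumptions for illustration, not something IDNA or any registry prescribes.

```python
import unicodedata

def strip_marks(label):
    """Drop all nonspacing marks (general category Mn) from a label.
    Assumed policy for illustration only; a real registry would define
    its own repertoire and bundling rules."""
    return "".join(ch for ch in label if unicodedata.category(ch) != "Mn")

# The first label of the first quoted name, written with escapes.
plain = "\u0645\u0627\u064A\u0643\u0631\u0648\u0633\u0648\u0641\u062A"
# The same label with a FATHA (U+064E) on the first letter.
vocalized = plain[0] + "\u064E" + plain[1:]

print(plain == vocalized)                # the two labels are different strings
print(strip_marks(vocalized) == plain)   # but they share the same stripped key
print(unicodedata.name("\u064E"), unicodedata.category("\u064E"))
```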
>
>
> Well this has been answered in “NSM flow?”
>
>
>
> [Quote]
>
>   “Moreover, for the displaying order of the labels of a domain name I have tried the following hypothetical domain names:
>
>
> Husni.حاسب.شركة
> حسني.حاسب.شركة
> husni.حاسب.com
> حسني.computer.شركة
> حسني.حاسب.com
> husni.computer.شركة
> husni.computer.com
>
> The following is an image of the network order from right to left  for Arabic of the above:
>
>
> It is clear that when we use two consecutive RTL labels separated by dots and followed by one LTR label the display order does not look as it
> should. The same is true that when we use two consecutive LTR labels separated by dots and followed by one RTL. The question is should we allow such confusion?”[/Quote]
>
> from draft-ietf-idnabis-bidi-06
>
> [Quote]
>
> “   o  The sequence of labels should be consistent with network order.
>
>       This proved impossible - a domain name consisting of the labels
>
>       (in network order) L1.R1.R2.L2 will be displayed as L1.R2.R1.L2 in
>
>       an LTR context.  (In an RTL context, it will be displayed as
>
>       L2.R2.R1.L1).”
>
> [/Quote]
>
>
>
> Well this problem was expected to happen, IDNA uses a UAX#9 Bidi algorithm version-like where some rules have been removed.
>
Which of the above examples represents L1.R1.R2.L2? These cases require
inter-label checking, and the working group came to the consensus not to
perform such tests.
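
For reference, here is a rough Python sketch of the label patterns involved. It assumes each label has a single direction ('R' if it contains any R or AL character, otherwise 'L') and treats labels as atomic units, which is a simplification of UAX #9; the function names are invented for this example. Note that all seven quoted names have three labels, so none of them is literally of the form L1.R1.R2.L2.

```python
import unicodedata
from itertools import groupby

def label_direction(label):
    """'R' if the label contains any R/AL character, else 'L' (simplification)."""
    for ch in label:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            return "R"
    return "L"

def pattern(domain):
    """Direction pattern of a domain name in network order, e.g. 'L.R.L'."""
    return ".".join(label_direction(lab) for lab in domain.split("."))

def display_order(labels, base="L"):
    """labels: (name, direction) pairs in network order.
    Returns the label names in left-to-right display order: consecutive
    RTL labels are reversed, and in an RTL context the runs themselves
    are laid out from right to left."""
    runs = []
    for direction, group in groupby(labels, key=lambda lab: lab[1]):
        run = [name for name, _ in group]
        if direction == "R":
            run.reverse()
        runs.append(run)
    if base == "R":
        runs.reverse()
    return ".".join(name for run in runs for name in run)

# Arabic labels from the quoted examples, written with escapes.
HUSNI_AR = "\u062D\u0633\u0646\u064A"
HASIB = "\u062D\u0627\u0633\u0628"
SHARIKA = "\u0634\u0631\u0643\u0629"

for name in ("husni." + HASIB + ".com", HUSNI_AR + "." + HASIB + ".com"):
    print(pattern(name))

example = [("L1", "L"), ("R1", "R"), ("R2", "R"), ("L2", "L")]
print(display_order(example, base="L"))
print(display_order(example, base="R"))
```

Under these assumptions the last two calls reproduce the two display orders quoted from draft-ietf-idnabis-bidi-06.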

-- Alireza



