document on idn cctld names

Michel Suignard michelsu at windows.microsoft.com
Thu Feb 14 23:28:53 CET 2008


Harald,
All RTL 'short' names expressed in Arabic script consist only of letters (Unicode GC:Lo, Bidi class:AL), so they present no issue for either IDNA2003 or IDNA200x. Some longer names use the space character, which is not permitted in either version.

The only RTL 'short' Hebrew name so far consists of characters with GC:Lo and Bidi:R, which is also fine.

Thaana is an issue in IDNA2003 for Maldives (a mix of Lo/AL and Mn/NSM), but it should pass in IDNA200x. The short name for Maldives in Thaana is U+078B U+07A8 U+0788 U+07AC U+0780 U+07A8.
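
For anyone who wants to check these code points themselves, here is a minimal sketch in Python using the standard unicodedata module. The function names are made up for illustration, and the first/last test is only a simplified reading of the IDNA2003 bidi requirement (RFC 3454, section 6: a string containing R/AL characters must begin and end with one). It is not a full implementation of IDNA2003 or of the IDNA200x draft rules.

```python
import unicodedata

# Short name for Maldives in Thaana, code points as listed in this message.
MALDIVES_THAANA = "\u078B\u07A8\u0788\u07AC\u0780\u07A8"


def describe(label):
    """Return (code point, general category, bidi class) for each character.

    Note: unicodedata reflects the Unicode version bundled with Python,
    not Unicode 3.2; the Thaana properties used here are assumed stable.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.bidirectional(ch))
        for ch in label
    ]


def first_and_last_are_rtl(label):
    """Simplified IDNA2003-style check (hypothetical helper).

    Assumes a non-empty label that contains RTL characters: both the first
    and the last character must have bidi class R or AL.
    """
    rtl = {"R", "AL"}
    return (
        unicodedata.bidirectional(label[0]) in rtl
        and unicodedata.bidirectional(label[-1]) in rtl
    )


for row in describe(MALDIVES_THAANA):
    print(*row)
print("first/last are R or AL:", first_and_last_are_rtl(MALDIVES_THAANA))
```

The name ends in U+07A8, a nonspacing mark (Mn/NSM), so the first/last test fails even though every character is either a Thaana letter or a Thaana vowel sign.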

I don't expect country names to be expressed in Syriac, and I have not done any research for N'Ko (obviously a moot point for IDNA2003 because it was encoded after Unicode 3.2). For the latter, the nonspacing tone marks (Mn/NSM), if needed, would be covered by the rules necessary for Thaana.

Obviously you could potentially represent every country/territory in every script, so my list is not complete in that respect, but I have already gone quite far in that direction.

Finally, abbreviations that are commonly seen in longer forms of country names could be an issue because they use punctuation, but long names are unlikely in a cctld context.

In other words, you should be covered, and I don't expect punctuation to be used in RTL cctld names, but that is only my opinion of course, and the issue of native cctld names is very touchy.

Michel

-----Original Message-----
From: Harald Tveit Alvestrand [mailto:harald at alvestrand.no]
Sent: Wednesday, February 13, 2008 7:31 PM
To: Michel Suignard; idna-update at alvestrand.no
Subject: Re: document on idn cctld names

Michel,

thanks for the information!

Question: Would it be possible for you to take the set of names that
contain RTL characters and check if they pass the requirements for BIDI, as
per the latest draft?
Or do you have a file with the codepoints only that I could use to check?

I don't see any punctuation in there, but in many of these scripts, I
wouldn't be able to tell if any was there.

                 Harald



--On 13. februar 2008 16:15 -0800 Michel Suignard
<michelsu at windows.microsoft.com> wrote:

> Sorry for the slightly off-topic post, but I think some of you will be
> interested in this document exploring possible values for idn cctlds that
> I mentioned before. Many of you answered me privately concerning your
> interest in this.
>
> The obvious caveat is that this is just an input document; the various
> constituencies are in control of their own names and destiny. It is also a
> work in progress based on the current version of the iana root whois
> database for the cctlds. Please do not reply to this post on this list
> unless your reply is related to the IDN update work.
>
> Related to that, the document contains some items relevant to the IDNA
> update, such as issues for Sri Lanka, Myanmar, Cyprus, etc.
>
> The document is located at http://www.unicode.org/~suignard/IDNA_country_names.pdf
> Best regards,
>
> Michel
