I agree with Harald about not using CLDR in the way you describe (and I'm the chair of the CLDR group). Which characters are and are not in common use in a language is rather fuzzy. In CLDR we try to deal with that issue by having a "core" list (e.g. A-Z for English) and an auxiliary list for characters that are not core but are in common customary use in books, magazines, etc. It is notoriously difficult to draw a bright line even there, so you don't want that baked into a protocol. That is best left up to the registries, like DENIC.<br>
<br clear="all">
Mark<br>
<br><br><div class="gmail_quote">On Mon, Jan 12, 2009 at 11:39, Harald Alvestrand <span dir="ltr"><<a href="mailto:harald@alvestrand.no" target="_blank">harald@alvestrand.no</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Troy wrote:<br>
> I had some comments on the IDNA2008 and thought I'd send them to this<br>
> mailing list first.<br>
><br>
.... (not commenting on 1 and 2)<br>
><br>
> 3.<br>
> There is no mention of a requirement for a single language/locale in<br>
> IDNA strings. I noticed there was some discussion about this, but no<br>
> mention in the document proper. Many security issues with i18n domain<br>
> names occur from the use of characters from multiple languages/locales<br>
> together. If a requirement was made that every name must contain<br>
> characters from only one language/locale, most (maybe all?) of these<br>
> problems could be avoided. Using characters from multiple languages in<br>
> a domain name is a rare need. I can't actually think of any legitimate<br>
> need for such names.<br>
><br>
> If one used the sets of exemplarCharacters from CLDR, we would even have<br>
> a ready database of valid characters. I.e. there is no need for<br>
> additional work to classify characters. The work has already been done,<br>
> at least for the most part. If some locale doesn't have characters which<br>
> are needed, it is easy to add them to the CLDR.<br>
><br>
> A name can consist of characters in multiple scripts.<br>
> E.g. linux<a href="http://xn--u9j1gre1c148tyobi53m.com" target="_blank">クラブの参加者.com</a> which contains ASCII, katakana, hiragana and<br>
> kanji. These are all used in Japanese, though, and therefore valid in<br>
> that locale/language. I can't think of a legitimate use for a name with<br>
> multiple languages. Such names will only serve to confuse.<br>
><br>
> I understand that it's difficult to specify that only characters from<br>
> some external list (CLDR in this case) are to be allowed. This could be<br>
> solved by specifying the version of CLDR, and then later updating only<br>
> that part of the document with a revision document. The other option is<br>
> to specify that "local" checking of locale is done, meaning that<br>
> browsers and other software check the current CLDR, whatever it is.<br>
I'd strongly object to placing a dependency on CLDR's character lists in<br>
the standard itself.<br>
<br>
Two reasons:<br>
<br>
1) Enforcing this requirement would require (not just recommend) that<br>
the intended locale for each and every domain name be known at<br>
registration-checking time. Otherwise, there's no way to know what rules<br>
to enforce.<br>
<br>
2) The CLDR database is developed for multiple purposes; there's always<br>
debate about what should be in it, and there's no way to get at the<br>
justifications for the exclusions or inclusions.<br>
<br>
For instance, the CLDR locale for Norwegian, as picked up from<br>
<a href="http://www.unicode.org/cldr/data/charts/summary/no.html" target="_blank">http://www.unicode.org/cldr/data/charts/summary/no.html</a>, claims that<br>
its "standard" characters are these:<br>
<br>
[a à b-e é f-o ó ò ô p-z æ ø å]<br>
<br>
and its "auxiliary" characters are these:<br>
<br>
[á ǎ ã č ç đ è ê í ń ñ ŋ š ŧ ü ž ä ö]<br>
<br>
These seem to cover all of the characters in the Norwegian domain name<br>
registry's rules (<a href="http://www.norid.no/navnepolitikk.html#link3" target="_blank">http://www.norid.no/navnepolitikk.html#link3</a>), but the two sets are<br>
slightly different - in this case, ã and í are allowed.<br>
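<br>
A minimal sketch of the kind of check under discussion, using only the two<br>
Norwegian sets quoted above. The names expand_exemplars, outside_locale,<br>
NO_STANDARD and NO_AUXILIARY are hypothetical, and the parser assumes the<br>
simple "single characters and x-y ranges" form shown here; real CLDR<br>
exemplar sets use a richer syntax that this does not attempt to handle.<br>
<pre>
```python
import unicodedata


def expand_exemplars(pattern: str) -> set[str]:
    """Expand a simple pattern such as "[a à b-e]" into a set of characters.

    Assumption: tokens are separated by spaces and are either literal
    characters or "x-y" ranges. Anything fancier is out of scope.
    """
    chars: set[str] = set()
    normalized = unicodedata.normalize("NFC", pattern)
    for token in normalized.strip("[]").split():
        if len(token) == 3 and token[1] == "-":
            # Inclusive code point range, e.g. "b-e" gives b, c, d, e.
            chars.update(chr(cp) for cp in range(ord(token[0]), ord(token[2]) + 1))
        else:
            chars.update(token)
    return chars


# The two sets exactly as quoted in the message above.
NO_STANDARD = expand_exemplars("[a à b-e é f-o ó ò ô p-z æ ø å]")
NO_AUXILIARY = expand_exemplars("[á ǎ ã č ç đ è ê í ń ñ ŋ š ŧ ü ž ä ö]")


def outside_locale(label: str, allowed: set[str]) -> set[str]:
    """Return the characters of `label` that are not in `allowed`."""
    normalized = unicodedata.normalize("NFC", label.lower())
    return {ch for ch in normalized if ch not in allowed}


if __name__ == "__main__":
    print(outside_locale("blåbær", NO_STANDARD))
    print(outside_locale("müller", NO_STANDARD))
    print(outside_locale("müller", NO_STANDARD | NO_AUXILIARY))
```
</pre>
Note that outside_locale cannot be called without choosing a character set,<br>
which is reason 1 above in miniature: the caller has to know the intended<br>
locale before any rule can be applied.<br>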
<br>
The question of making registrations match with locales has been pushed<br>
off to the registries, and I think it should stay pushed.<br>
<br>
Harald<br>
</blockquote></div><br>