I agree with Harald about not using CLDR in the way you describe (and I&#39;m the chair of the CLDR group). Which characters are and are not in common use in a language is rather fuzzy. In CLDR we try to deal with that by having a &quot;core&quot; list (e.g. A-Z for English) and an auxiliary list for characters that are not core but are in common, customary use in books, magazines, etc. It is notoriously difficult to draw a bright line even there, so you don&#39;t want that baked into a protocol. That is best left up to the registries, like DENIC.<br>

<br clear="all">

Mark<br>

<br><br><div class="gmail_quote">On Mon, Jan 12, 2009 at 11:39, Harald Alvestrand <span dir="ltr">&lt;<a href="mailto:harald@alvestrand.no" target="_blank">harald@alvestrand.no</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


Troy wrote:<br>

&gt; I had some comments on the IDNA2008 and thought I&#39;d send them to this<br>

&gt; mailing list first.<br>

&gt;<br>

... (not commenting on points 1 and 2)<br>

&gt;<br>

&gt; 3.<br>

&gt; There is no mention of a requirement for a single language/locale in<br>

&gt; IDNA strings. I noticed there was some discussion about this, but no<br>

&gt; mention in the document proper. Many security issues with i18n domain<br>

&gt; names arise from the use of characters from multiple languages/locales<br>

&gt; together. If a requirement were made that every name must contain<br>

&gt; characters from only one language/locale, most (maybe all?) of these<br>

&gt; problems could be avoided. Using characters from multiple languages in<br>

&gt; a domain name is a rare need. I can&#39;t actually think of any legitimate<br>

&gt; need for such names.<br>

&gt;<br>

&gt; If one used the sets of exemplarCharacters from CLDR, we would even have<br>

&gt; a ready database of valid characters. I.e. there is no need for<br>

&gt; additional work to classify characters. The work has already been done,<br>

&gt; at least for the most part. If some locale doesn&#39;t have characters which<br>

&gt; are needed, it is easy to add them to the CLDR.<br>

&gt;<br>

&gt; A name can consist of characters in multiple scripts.<br>

&gt; E.g. linux<a href="http://xn--u9j1gre1c148tyobi53m.com" target="_blank">クラブの参加者.com</a>, which contains ASCII, katakana, hiragana and<br>

&gt; kanji. These are all used in Japanese, though, and therefore valid in<br>

&gt; that locale/language. I can&#39;t think of a legitimate use for a name with<br>

&gt; multiple languages. Such names will only serve to confuse.<br>

&gt;<br>

&gt; I understand that it&#39;s difficult to specify that only characters from<br>

&gt; some external list (CLDR in this case) are to be allowed. This could be<br>

&gt; solved by specifying the version of CLDR, and then later updating only<br>

&gt; that part of the document with a revision document. The other option is<br>

&gt; to specify that &quot;local&quot; checking of locale is done, meaning that<br>

&gt; browsers and other software check the current CLDR, whatever it is.<br>
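Troy's proposed single-locale rule can be sketched in Python as a subset test. This is an illustration under stated assumptions, not part of any IDNA draft: the exemplar sets are tiny stand-ins (the core a-z list Mark mentions for English, and the union of the Norwegian standard and auxiliary sets Harald quotes from CLDR further down), digits and hyphen are assumed to be allowed in every locale, and the function names are hypothetical.

```python
from __future__ import annotations

import unicodedata

# Illustrative stand-ins, not real CLDR data:
# "en" is the core a-z list; "no" is the union of the Norwegian standard and
# auxiliary exemplar sets quoted from the CLDR chart in this thread.
_NO_LETTERS = (
    "abcdefghijklmnopqrstuvwxyz"  # a-z (standard)
    "àéóòôæøå"                    # other standard letters
    "áǎãčçđèêíńñŋšŧüžäö"          # auxiliary letters
)
EXEMPLARS: dict[str, frozenset[str]] = {
    "en": frozenset("abcdefghijklmnopqrstuvwxyz"),
    "no": frozenset(unicodedata.normalize("NFC", _NO_LETTERS)),
}

# Assumption: digits and hyphen are acceptable in every locale, because
# exemplar sets list letters only.
COMMON = frozenset("0123456789-")


def _chars(label: str) -> set[str]:
    """Distinct characters of the label after lowercasing and NFC normalization."""
    return set(unicodedata.normalize("NFC", label.lower()))


def fits_locale(label: str, locale: str) -> bool:
    """True if every character is in the locale's exemplar set or in COMMON."""
    return _chars(label) <= (EXEMPLARS[locale] | COMMON)


def covering_locales(label: str) -> list[str]:
    """All locales (sorted by name) whose set accounts for the whole label."""
    return sorted(loc for loc in EXEMPLARS if fits_locale(label, loc))


def single_locale_ok(label: str) -> bool:
    """The proposed rule: at least one locale covers the entire label."""
    return bool(covering_locales(label))


if __name__ == "__main__":
    for label in ("example", "blåbær-2009", "straße"):
        print(label, covering_locales(label), single_locale_ok(label))
```

With these sets, example fits both locales, blåbær-2009 fits only the Norwegian set, and straße fits neither because ß appears in neither set. Because a label like example satisfies more than one locale, the check by itself cannot tell a registry which locale's rules were intended; enforcing the rule strictly means calling something like fits_locale with a declared locale, which is the dependency Harald's first objection points at.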

I&#39;d strongly object to placing a dependency on CLDR&#39;s character lists in<br>

the standard itself.<br>

<br>

Two reasons:<br>

<br>

1) Enforcing this requirement would require (not just recommend) that<br>

the intended locale for each and every domain name be known at<br>

registration-checking time. Otherwise, there&#39;s no way to know what rules<br>

to enforce.<br>

<br>

2) The CLDR database is developed for multiple purposes; there&#39;s always<br>

debate about what should be in it, and there&#39;s no way to get at the<br>

justifications for the exclusions or inclusions.<br>

<br>

For instance, the CLDR locale for Norwegian, as picked up from<br>

<a href="http://www.unicode.org/cldr/data/charts/summary/no.html" target="_blank">http://www.unicode.org/cldr/data/charts/summary/no.html</a>, &nbsp;claims that<br>

its &quot;standard&quot; characters are these:<br>

<br>

[a à b-e é f-o ó ò ô p-z æ ø å]<br>

<br>

and its &quot;auxiliary&quot; characters are these:<br>

<br>

[á ǎ ã č ç đ è ê í ń ñ ŋ š ŧ ü ž ä ö]<br>

<br>

These seem to cover all of the characters in the Norwegian domain name<br>

registry&#39;s rules (<a href="http://www.norid.no/navnepolitikk.html#link3" target="_blank">http://www.norid.no/navnepolitikk.html#link3</a>), but are<br>

slightly different: in this case, ã and í are also allowed.<br>
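Mark's core/auxiliary distinction, applied to the Norwegian sets quoted above, can be sketched in Python as a three-way classification. This is a hypothetical illustration: the parser understands only the simple item and range forms shown here (real CLDR data uses full UnicodeSet syntax), and the names expand, classify and tier are invented for the example. The sketch deliberately does not decide whether an "auxiliary" result should be accepted; that is the bright line Mark says is hard to draw.

```python
from __future__ import annotations

import unicodedata


def expand(spec: str) -> frozenset[str]:
    """Expand a simplified exemplar list like "a à b-e" into a set of characters.

    Items are space-separated; "x-y" is an inclusive code point range.
    Only the forms quoted in this thread are handled, not full UnicodeSet syntax.
    """
    chars: set[str] = set()
    for item in unicodedata.normalize("NFC", spec).split():
        if len(item) == 3 and item[1] == "-":
            chars.update(chr(cp) for cp in range(ord(item[0]), ord(item[2]) + 1))
        else:
            chars.update(item)
    return frozenset(chars)


# Norwegian exemplar sets as quoted above from the CLDR summary chart.
NO_STANDARD = expand("a à b-e é f-o ó ò ô p-z æ ø å")
NO_AUXILIARY = expand("á ǎ ã č ç đ è ê í ń ñ ŋ š ŧ ü ž ä ö")


def classify(label: str) -> dict[str, str]:
    """Map each distinct character to "standard", "auxiliary" or "outside".

    Digits and hyphen also come out as "outside", since the exemplar sets
    list letters only.
    """
    result: dict[str, str] = {}
    for ch in unicodedata.normalize("NFC", label.lower()):
        if ch in NO_STANDARD:
            result[ch] = "standard"
        elif ch in NO_AUXILIARY:
            result[ch] = "auxiliary"
        else:
            result[ch] = "outside"
    return result


def tier(label: str) -> str:
    """The least "core" tier that any character of the label falls into."""
    found = set(classify(label).values())
    if "outside" in found:
        return "outside"
    if "auxiliary" in found:
        return "auxiliary"
    return "standard"


if __name__ == "__main__":
    for label in ("blåbær", "señor", "straße"):
        print(label, tier(label))
```

Here blåbær needs only standard letters, señor needs the auxiliary ñ, and straße falls outside both sets because ß appears in neither. A registry using such data would still have to choose whether "auxiliary" passes, which is the kind of policy decision the thread argues belongs with the registries rather than in the protocol.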

<br>

The question of making registrations match with locales has been pushed<br>

off to the registries, and I think it should stay pushed.<br>

<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Harald<br>

<br>

<br>

<br>

_______________________________________________<br>

Idna-update mailing list<br>

<a href="mailto:Idna-update@alvestrand.no" target="_blank">Idna-update@alvestrand.no</a><br>

<a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br>

</blockquote></div><br>