CLDR data (Re: Comments on the IDNA2008 document)

Mark Davis mark at macchiato.com
Tue Jan 13 16:37:14 CET 2009


I agree with Harald about not using CLDR in the way you describe (and I'm
the chair of the CLDR group). What are and are not characters in common use
in a language is rather fuzzy. In CLDR we try to deal with that issue by
having a "core" list (e.g. A-Z for English) and an auxiliary list for
characters that are not core, but in common customary use in books,
magazines, etc. It is notoriously difficult to draw a bright line even there
-- so you don't want that baked into a protocol. That is best left up to the
registries, like DENIC.

Mark


On Mon, Jan 12, 2009 at 11:39, Harald Alvestrand <harald at alvestrand.no> wrote:

> Troy wrote:
> > I had some comments on the IDNA2008 and thought I'd send them to this
> > mailing list first.
> >
> .... (not commenting on 1 and 2)
> >
> > 3.
> > There is no mention of a requirement for a single language/locale in
> > IDNA strings. I noticed there was some discussion about this, but no
> > mention in the document proper. Many security issues with i18n domain
> > names occur from the use of characters from multiple languages/locales
> > together. If a requirement was made that every name must contain
> > characters from only one language/locale, most (maybe all?) of these
> > problems could be avoided. Using characters from multiple languages in
> > a domain name is a rare need. I can't actually think of any legitimate
> > need for such names.
> >
> > If one used the sets of exemplarCharacters from CLDR, we would even have
> > a ready database of valid characters. I.e. there is no need for
> > additional work to classify characters. The work has already been done,
> > at least for the most part. If some locale doesn't have characters which
> > are needed, it is easy to add them to the CLDR.
> >
> > A name can consist of characters in multiple scripts.
> > E.g. linuxクラブの参加者.com <http://xn--u9j1gre1c148tyobi53m.com>, which
> > contains ASCII, katakana, hiragana and
> > kanji. These are all used in Japanese, though, and therefore valid in
> > that locale/language. I can't think of a legitimate use for a name with
> > multiple languages. Such names will only serve to confuse.
> >
> > I understand that it's difficult to specify that only characters from
> > some external list (CLDR in this case) are to be allowed. This could be
> > solved by specifying the version of CLDR, and then later updating only
> > that part of the document with a revision document. The other option is
> > to specify that "local" checking of locale is done, meaning that
> > browsers and other software check the current CLDR, whatever it is.
> I'd strongly object to placing a dependency on CLDR's character lists in
> the standard itself.
>
> Two reasons:
>
> 1) Enforcing this requirement would require (not just recommend) that
> the intended locale for each and every domain name be known at
> registration-checking time. Otherwise, there's no way to know what rules
> to enforce.
>
> 2) The CLDR database is developed for multiple purposes, there's always
> debate about what should be in it, and there's no way to get at the
> justifications for the exclusions or inclusions.
>
> For instance, the CLDR locale for Norwegian, as picked up from
> http://www.unicode.org/cldr/data/charts/summary/no.html, claims that
> its "standard" characters are these:
>
> [a à b-e é f-o ó ò ô p-z æ ø å]
>
> and its "auxiliary" characters are these:
>
> [á ǎ ã č ç đ è ê í ń ñ ŋ š ŧ ü ž ä ö]
>
> These seem to cover all of the characters in the Norwegian domain name
> registry's rules (http://www.norid.no/navnepolitikk.html#link3), but are
> slightly different - in this case, ã and í are allowed.
>
> The question of making registrations match with locales has been pushed
> off to the registries, and I think it should stay pushed.
>
>                 Harald
>
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
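
A minimal sketch, in Python, of the kind of per-locale character check
discussed in this thread. It is an illustration only, not something proposed
by either author. The two sets are copied from the Norwegian lists quoted
above (as shown in the CLDR chart at the time; later CLDR releases may
differ). The function name classify_label and the three group names are
hypothetical. Note that the check only works once a locale has been chosen,
which is exactly the dependency raised in point 1 above.

```python
import string
import unicodedata

# Character lists as quoted in the message above for the "no" locale.
# Assumption: these are a snapshot; current CLDR data may differ.
STANDARD = set(string.ascii_lowercase) | set(
    unicodedata.normalize("NFC", "àéóòôæøå")
)
AUXILIARY = set(unicodedata.normalize("NFC", "áǎãčçđèêíńñŋšŧüžäö"))


def classify_label(label):
    """Group the characters of a label by the list they appear in.

    Characters in neither list (including digits and the hyphen, which
    these lists do not cover) end up in the "other" group.
    """
    groups = {"standard": set(), "auxiliary": set(), "other": set()}
    # Lowercase and normalize so precomposed and decomposed input compare equal.
    for ch in unicodedata.normalize("NFC", label.lower()):
        if ch in STANDARD:
            groups["standard"].add(ch)
        elif ch in AUXILIARY:
            groups["auxiliary"].add(ch)
        else:
            groups["other"].add(ch)
    return groups
```

A caller would still have to decide what to do with the "auxiliary" and
"other" groups. That policy decision is the part this thread argues should
stay with the registries rather than be fixed in the protocol.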