Browser IDN display policy: opinions sought
kenw at sybase.com
Wed Dec 14 23:36:12 CET 2011
On 12/14/2011 3:02 AM, "Martin J. Dürst" wrote:
> On 2011/12/12 19:54, Gervase Markham wrote:
>> I can quite believe it may be something like this; but how does one deal
>> with the impedance mismatch that users think they are defining
>> languages, but what you need is scripts? Does IE keep a script/language
>> mapping? Is that data (perhaps compiled by others) publicly available
>> somewhere, e.g. from the Unicode consortium?
> For character coverage needed for a language, CLDR (the Unicode Common
> Locale Data Repository, http://cldr.unicode.org) provides quite a lot
> of data to work with, although you may want to have a closer look or
> talk with somebody more familiar with the data and processes before
> you work on a particular application.
Just following up on this particular query about publicly available
mapping data: CLDR also makes available specific charts which specify
the scripts for a large number of languages, including nearly all of
those that would be used for IDNs. See:
and the reverse indexed:
Although this data is not perfect or complete for *all* languages, it is
a very good statement of 99.9% of the significant facts of usage
relevant to the issues debated on this thread, IMO.
Anyone making use of this data would need to become familiar with its
representation in supplementalData.xml in the CLDR releases, and know
something about the extensions which CLDR makes to the Unicode notion of
"script", before just blindly using it. For example, the Japanese
*language* is identified as being written in the Japanese *script* in
languages_and_scripts.html. The Japanese "script" refers to the Japanese
writing system, which combines several scripts, but which, for various
implementation reasons, is identified in CLDR with an aggregated
identifier. And so on.
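
To make the aggregated-identifier point concrete, here is a rough Python
sketch of reading language-to-script data and expanding aggregate codes
into their component scripts. Treat it as an illustration under stated
assumptions, not as a description of how any browser does this. The XML
fragment is a hand-written sample in the general shape of the
languageData section of supplementalData.xml, not copied from a CLDR
release. The names AGGREGATE_SCRIPTS and scripts_by_language are
hypothetical, made up for this example. The expansion table uses the
ISO 15924 definitions of Jpan (Han + Hiragana + Katakana), Kore
(Hangul + Han) and Hrkt (Hiragana + Katakana).

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT copied from a CLDR release. It only imitates
# the general shape of <languageData> in supplementalData.xml.
SAMPLE = """<supplementalData>
  <languageData>
    <language type="ja" scripts="Jpan"/>
    <language type="ko" scripts="Kore"/>
    <language type="ru" scripts="Cyrl"/>
    <language type="sr" scripts="Cyrl Latn"/>
  </languageData>
</supplementalData>"""

# Aggregate script codes and their components, per ISO 15924.
# The table name is hypothetical; the expansions are the ISO definitions.
AGGREGATE_SCRIPTS = {
    "Jpan": ("Hani", "Hira", "Kana"),
    "Kore": ("Hang", "Hani"),
    "Hrkt": ("Hira", "Kana"),
}


def scripts_by_language(xml_text):
    """Map each language code to a list of component script codes."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter("language"):
        lang = elem.get("type")
        # Entries without a scripts attribute contribute nothing.
        for code in (elem.get("scripts") or "").split():
            # Expand aggregates; pass ordinary script codes through.
            for script in AGGREGATE_SCRIPTS.get(code, (code,)):
                known = result.setdefault(lang, [])
                if script not in known:
                    known.append(script)
    return result


if __name__ == "__main__":
    for lang, scripts in scripts_by_language(SAMPLE).items():
        print(lang, " ".join(scripts))
```

Whether to expand an aggregate code or to compare against it directly
depends on how the consuming code identifies scripts; a per-character
script check will never report "Jpan", which is the sort of detail that
needs care before using the data.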
However, I think this is the kind of machine-readable information that
Note also that CLDR is an ongoing project responsive to public input,
and so if there are deficiencies, omissions, or outright errors in the
script and language data, the CLDR project would like to hear about it
via bug reports. See: