Now, the difference between &quot;language&quot; identifiers and &quot;locale&quot; identifiers is notoriously slippery, so I'll provide some background on how CLDR is actually structured, so you don't have to guess.<br><br>

The CLDR data is separated into language-specific data and non-language-specific data. The language-specific data does *not* include items like the currencies for a country, or the weekend days, etc.; that is all in the non-language-specific data. Here are some examples of language-specific data:

<br><a href="http://unicode.org/cldr/data/common/collation/">http://unicode.org/cldr/data/common/collation/</a><br><a href="http://unicode.org/cldr/data/common/main/">http://unicode.org/cldr/data/common/main/</a><br><br>The non-language-specific data includes which currencies were valid in

a particular country during which years, or which languages are

customarily written in which scripts. Some examples are:<br><a href="http://unicode.org/cldr/data/common/supplemental/">http://unicode.org/cldr/data/common/supplemental/</a><br><a href="http://unicode.org/cldr/data/common/transforms/">

http://unicode.org/cldr/data/common/transforms/</a><br><br>The so-called locale inheritance is used for the language-specific data, not the non-language-specific data, so it would be more accurate to call it language inheritance. The vast majority of the language-specific data does not differ by country. While, for example, the content of 

en.xml is chosen to be appropriate for the most populous country speaking en (the US), that doesn't mean that content is *always* inappropriate for many of the other regions that could use English (eg AG AI AS AU AW BB BM BS BW BZ CA CC CK CM CX DM ER FJ FK FM GB GD GH GI GM GY HK IE IN IO JM KE KI KN KY LC LR LS MH MP MS MT MW NA NF NG NR NU NZ PG PH PK PN PW RW SB SG SH SL SZ TC TK TO TT TZ UG UM US VC VG VI ZA ZM ZW).

<br><br>Where content does differ by region, as it does for the UK, one includes overrides of what is in en.xml. (Where the language-specific data for two locale/language tags is the same but differs from the base, one can be aliased, in full or in part, to the other. Thus if en_ZW, for example, followed UK spelling conventions, it could be aliased to en_GB. While the files use &quot;_&quot;, CLDR recognizes &quot;-&quot; and &quot;_&quot; as equivalent in identifiers.)
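<br><br>To make the inheritance, override, and alias behaviour concrete, here is a minimal sketch in Python. It is not CLDR code: the tables `DATA` and `ALIASES`, the keys, the values, and the function names are all hypothetical, hand-written for illustration, and the real data also has a root level that this sketch leaves out.<br><br>

```python
# Hypothetical, hand-written tables for illustration; NOT real CLDR content.
DATA = {
    # Base language data: most values live here.
    "en": {"short_date": "M/d/yy", "color_word": "color", "yes": "yes"},
    # Regional file: holds only the values that differ from the base.
    "en_GB": {"short_date": "d/M/yy", "color_word": "colour"},
}

# Assumed alias, mirroring the en_ZW example in the text.
ALIASES = {"en_ZW": "en_GB"}


def normalize(tag):
    """Treat '-' and '_' as equivalent separators."""
    return tag.replace("-", "_")


def parent(tag):
    """Drop the last subtag: 'en_GB' -> 'en'; 'en' has no parent here."""
    head, sep, _ = tag.rpartition("_")
    return head if sep else None


def lookup(tag, key):
    """Return the value for key, walking from the tag up to the base language."""
    tag = normalize(tag)
    while tag is not None:
        tag = ALIASES.get(tag, tag)      # follow an alias if one is defined
        values = DATA.get(tag, {})       # a missing file simply has no overrides
        if key in values:
            return values[key]
        tag = parent(tag)
    raise KeyError(key)


print(lookup("en-GB", "short_date"))   # overridden in en_GB
print(lookup("en_GB", "yes"))          # inherited from en
print(lookup("en-ZW", "color_word"))   # reached through the alias
print(lookup("en_AU", "short_date"))   # no en_AU data, falls back to en
```

<br><br>The point of the sketch is that a regional tag only needs to carry the values that actually differ; everything else is found in the language-level data.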

<br><br>You say:<br>&gt; But clearly there is no such thing as a region-neutral English locale<br><br>This sentence is a bit slippery; it depends highly on what one means by locale. Let me recast it. For a given type of content (eg country names) and a given language subtag, there may be differences among regions (as defined by BCP 47), or it may be that all regions share the same values. (For that matter, there may be differences *within* regions as well, according to sub-regions that BCP 47 isn't fine-grained enough to identify; eg for some speech applications the differences in Bostonian English may be important.)

<br><br>Where there are differences in regions, the region is important. Where there are not differences between regions, the region is not important. Thus in many cases, the CLDR data does not differ by country at all, so requiring a country subtag is pointless. In that sense, I'd say your sentence

<br>&gt; that region is a key attribute of a locale,<br>is false. Region may or may not be significant, depending on the content, and depending on the language. If you meant to say that the *ability* to have a region as a component of locale/language is key, then I'd agree with you -- otherwise one couldn't distinguish between en-US and en-GB content.

<br><br>I do, however, agree with you on the major point: this is all about *defaults*; identifiers have an inherent limitation -- they represent some class of users, within which there will always be variations.<br><br>Mark

<br><br><br><div><span class="gmail_quote">On 9/27/06, <b class="gmail_sendername">Peter Constable</b> &lt;<a href="mailto:petercon@microsoft.com">petercon@microsoft.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

[This is running a risk of straying off topic for this list, but I'll post this here since it still pertains to Don's questions regarding whether particular reg entries should have certain info added to them.]<br><br><br>

&gt; From: <a href="mailto:ietf-languages-bounces@alvestrand.no">ietf-languages-bounces@alvestrand.no</a> [mailto:<a href="mailto:ietf-languages-">ietf-languages-</a><br>&gt; <a href="mailto:bounces@alvestrand.no">bounces@alvestrand.no

</a>] On Behalf Of Kent Karlsson<br><br>&gt; &gt; that region is a key attribute of a locale,<br>&gt;<br>&gt; ...no.<br><br>Please explain. I guess this might depend on one's view of which information categories make up the minimal set required for a locale.

<br><br><br>&gt; &gt; locale ID must always include a region component as well as a<br>&gt; &gt; language component.<br>&gt;<br>&gt; CLDR locales don't. Just about all locale data can, and often should,<br>&gt; be in the &quot;language only&quot; named locales. Very rarely is there a difference

<br>&gt; from those locales that belong in the &quot;language_territory&quot; sublocales.<br><br>Not being a participant in the CLDR project, I'm not in a good position to evaluate the intent of the data I see there. I do note that, 

e.g. there is a file &quot;en.xml&quot;. But clearly there is no such thing as a region-neutral English locale: every English speaker lives in a region where one of &quot;M/d/yy&quot; or &quot;d/M/yy&quot; is the preferred short date format (and probably the majority live in regions that prefer the latter), but this data file is not neutral wrt short date format: in spite of the name, the data it contains really is applicable to the US. Now, perhaps the intent here is that this is data that can be used as a default if region-specific data is not available, but it seems to me that's just a roundabout way of saying that en-US is used as the default locale for English.

<br><br><br>&gt; Yes, but choosing (a single) currency or choosing a measurement<br>&gt; system does not belong in a locale. Doing that is a mistake, similar to<br>&gt; that of selecting character encoding via locale (as is, unfortunately, done

<br>&gt; in Unix/POSIX locales).<br><br>These are only ever defaults. It's not appropriate to assume that every English speaker in the US wants a short date format of &quot;M/d/yy&quot;, but it is an appropriate default in that scenario. In the same way, it's not appropriate to assume that a user in the US will always use imperial units of measure, but it is reasonable to treat imperial units as a default. Same for currency.

<br><br><br>Peter Constable<br></blockquote></div><br>