Problems deciding if az- should have multiple registrations...

Tue Apr 15 10:48:30 CEST 2003

Peter_Constable at sil.org wrote:
> 
>>>You want us to have RFC3066(bis) tags to distinguish character set
>>>encodings?
>>
>>Oh heavens no! By no means is that what I meant.
> 
> Phew! Glad to hear that. But, in that case, I must have missed your point
> regarding ja_JP and sv_SE -- you need the longer forms to get UTF-8, but
> what's the bearing on our current discussion? (Perhaps the answer is
> partially in your following text.)
> 
Some people may need the longer forms to actually activate their 
system's locale mechanism (that is, the language tagging scheme for 
their system resources). For example, I believe .NET doesn't allow you 
to directly instantiate a "language-only" RegionInfo (I'm a little 
fuzzy about the details of that).

> 
> So, let me see if I understand where you're going. Depending on what we
> have in mind by "mappings", perhaps systems don't need to use RFC3066(bis)
> tags like th-TH, but we define mappings specific to various host
> implementation environments that tell us things like "for Solaris 2.8, if
> language = 'th' and desired encoding is UTF-8, substitute 'th-TH'".
> 

Not *WE*, but rather vendors need to define the mapping for their 
particular implementation. I think the best we can do is suggest rules 
for mapping a language ID to local mechanisms. I'm less concerned 
about existing irregularities than about getting tags that carry script 
information to map correctly. Adding another level of complexity to the 
language tags decreases their ambiguity, but requires rather more care 
when interpreting them.

By way of example, one might define that "zh-Hans" maps to "zh_CN" or 
"zh__Simplified" in Java (the latter being hypothetical), not to "zh" 
directly.
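
A minimal sketch of the kind of vendor-side mapping described above, in Python. The table (VENDOR_MAP) and the helper (map_tag) are hypothetical, and the entries, including th -> th_TH with a POSIX-style ".codeset" suffix, are illustrative assumptions rather than any real platform's data. The lookup tries the full tag first and only then falls back to shorter prefixes, so "zh-Hans" resolves to "zh_CN" instead of being truncated to plain "zh".

```python
from typing import Dict, Optional

# Hypothetical vendor-defined table: language tag (lowercased) -> local locale ID.
# Entries are illustrative only; a real vendor would publish its own.
VENDOR_MAP: Dict[str, str] = {
    "zh-hans": "zh_CN",
    "zh-hant": "zh_TW",
    "th": "th_TH",
}


def map_tag(tag: str, encoding: Optional[str] = None,
            table: Dict[str, str] = VENDOR_MAP) -> Optional[str]:
    """Map a language tag to a local locale ID, or return None if unmapped.

    Tries the full tag, then progressively shorter prefixes
    (e.g. "zh-Hans-SG" -> "zh-Hans" -> "zh"), so a script subtag is
    matched before falling back to the bare language.
    If an encoding is given, it is appended POSIX-style ("th_TH.UTF-8").
    """
    subtags = tag.lower().split("-")
    while subtags:
        key = "-".join(subtags)
        if key in table:
            local_id = table[key]
            return f"{local_id}.{encoding}" if encoding else local_id
        subtags.pop()  # drop the last subtag and retry
    return None
```

Note that the table deliberately has no entry for bare "zh": the sketch leaves that case unmapped rather than guessing a script or region.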

Addison