Suppress-Script candidates (was: Re: frr, fy, ngo, tt)

Wed Sep 27 19:44:44 CEST 2006

An effort led by Alberto Escudero-Pascual and Louise Berthilson last February-March to get more African-language locales into OOo and CLDR raised several questions in my mind, and this was one of them. In fact, as we look at the possibility of a lot more locale data being compiled for minority/MINEL/pi languages, many of which are found in only one country and may have a limited history of writing in only one script, this issue is potentially significant.

Has there been any discussion of forms having an additional logical step like: "Is this language native to more than one country?" (before the choice-of-countries line) or "Is this language written in more than one alphabet?" (before the choice-of-scripts line)?

I'm not following discussions here closely of late, so apologies if this issue has been resolved. Thinking of people writing locales, "suppress script" might handle the script issue if it is inserted into the data before someone without a clear understanding of the system can enter a redundant choice of (default) script. Is there anything like "suppress country" (which sounds more dangerous than it is, given the narrow technical focus here)?

Has any thought been given to having a team of, say, linguistics grad students create "stub" locales for *all* languages in ISO-639 (with attention to the 1, 2, 3's of course), hopefully at relatively modest expense? These stubs would include appropriate suppress script/country or default indications as necessary, and also an appropriate range of countries for a language code when it is spoken in more than one. A code like "ig" carries with it a set of implications, sometimes very specific (1 country, 1 script), and this is the case for a lot of language codes without locales. Is it helpful to make those implications explicit early in the process?

On the other hand, this could be a tricky point when it comes to closely related tongues with separate '-3 codes that might most productively be treated as a unit ("macrolanguage" or even cluster?) in some localization contexts (I realize that this issue is supposed to be addressed in '-4 & '-5). Would creation of locales for, say, all the varieties of Arabic or Fulfulde/Pulaar or Manding (etc.!) be productive, or would it muddy the waters unnecessarily for future work? In any event, if locale stubs were created, they might need a review and a comments section to alert the approval system to such issues.

Don Osborn
Bisharat.net
PanAfrican Localisation project

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Wednesday, September 27, 2006 11:54 AM
To: ietf-languages at iana.org
Cc: John Cowan
Subject: Re: Suppress-Script candidates (was: Re: frr, fy, ngo, tt)

John Cowan <cowan at ccil dot org> wrote:

>> Lines where LTRU has no script and CLDR has one are not something we 
>> need to spend a great amount of time worrying about, unless there is 
>> a genuine concern that people are going to start writing, say, 
>> "ig-Latn-NG" and it won't match with "ig-NG".
>
> Unfortunately, it is *precisely* that concern that got
> Suppress-Script: into RFC 4646 in the first place.  So that is what we 
> must get right. When people are confronted with a "Script" drop-down 
> menu, the instinct will be to choose the correct answer rather than 
> leaving it on default, so without adequate Suppress-Script:
> information the result will indeed be "ig-Latn-NG".

My point in choosing Igbo was that it is a language spoken by a large number of people in one country (18 million in Nigeria) and virtually nowhere else, so it would be unlikely that "ig-NG" would communicate much more information than "ig" alone.
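
The matching concern quoted above ("ig-Latn-NG" failing to match "ig-NG") can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the SUPPRESS_SCRIPT table is a hypothetical stand-in for registry data, the "ig" entry is assumed for the sake of the example, and the function name strip_suppressed_script is made up here. It also ignores extlang subtags and grandfathered tags.

```python
# Hypothetical stand-in for Suppress-Script data from the language subtag
# registry. The "ig" entry is an assumption made for this example.
SUPPRESS_SCRIPT = {"ig": "Latn", "en": "Latn", "ru": "Cyrl"}


def strip_suppressed_script(tag: str) -> str:
    """Drop the script subtag when it equals the language's suppressed script.

    Simplification: assumes the script subtag, if present, directly follows
    the primary language subtag (no extlang, no grandfathered tags).
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    suppressed = SUPPRESS_SCRIPT.get(language)

    # A script subtag is exactly four letters.
    has_script = (
        len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha()
    )
    if has_script and suppressed is not None:
        if subtags[1].lower() == suppressed.lower():
            del subtags[1]  # the script is redundant, so remove it
    return "-".join(subtags)


# The redundant tag and the plain tag now compare equal.
print(strip_suppressed_script("ig-Latn-NG") == strip_suppressed_script("ig-NG"))
```

A script that is not the suppressed one (for instance a hypothetical "ig-Arab-NG") is left alone, since in that case the script subtag does carry information.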

. . .