suppress-script values for macrolanguage-encompassed languages

Doug Ewell doug at ewellic.org
Fri Dec 24 07:03:10 CET 2010


CE Whitehead <cewcathar at hotmail dot com> wrote:

> And both [ar] (the macro-language) and [arb] (standard Arabic) should 
> get a suppress-script [Arab], as should, in my opinion, Egyptian Arabic, 
> Moroccan Arabic, and North Levantine Arabic (however not being a 
> native speaker and having studied primarily the Standard with a bit of 
> Levantine Arabic I don't feel my opinion is definitive). But I am not 
> sure what script is most used online to write dialects which are not 
> traditionally written at all -- for which the most common uses would 
> be blogs, text, email, etc, also some Bible translations (which online 
> seem to be overwhelmingly in Latin script I think for convenience). 
> These may be overwhelmingly written in Arabic script but it's also 
> possible that a mix of scripts is used.

I'm sure nobody here disagrees about Standard Arabic itself, nor about 
many of the other well-known Arabics.

I thought this thread started with the premise that the Suppress-Script 
status of a macrolanguage and that of each of its encompassed languages 
were interdependent -- that a macrolanguage should not have S-S unless 
*all* of its encompassed languages share the same S-S.

That troubles me when we have a concept like "Arabic," with well over 
200 million speakers of all varieties of Arabic combined, where almost 
anyone familiar with Arabic would say that the writing system 
"overwhelmingly" used by the literate members of that group is the 
Arabic writing system -- and then we have concepts like Judeo-Yemeni 
Arabic, with 50,000 speakers, written in Hebrew, or Cypriot Spoken 
Arabic, with 1,300 speakers, written in we-don't-know-what, and we are 
prepared to say that because these encompassed languages don't qualify 
for an S-S of 'Arab', neither does the macrolanguage 'ar'.  That doesn't 
seem right.  I don't know how many exceptions like these exist, but a 
few hundred thousand out of 200 million still leaves an overwhelming 
majority in its wake.
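
To make the contrast concrete, here is a minimal sketch in Python that sets the strict "all encompassed languages must agree" rule beside a speaker-weighted view. The speaker counts are the rough figures quoted above. The function names are hypothetical, and treating every speaker outside the two named exceptions as an Arabic-script user is a simplifying assumption made only for illustration, not a claim about real usage.

```python
# Rough figures from this message; 200 million is used as a floor for
# "well over 200 million".
TOTAL_ARABIC_SPEAKERS = 200_000_000

# Encompassed languages named above: (speakers, script code or None if unknown).
EXCEPTIONS = {
    "Judeo-Yemeni Arabic": (50_000, "Hebr"),
    "Cypriot Spoken Arabic": (1_300, None),
}


def strict_rule(scripts):
    """Return the shared script only if every encompassed language has the
    same known script; otherwise return None (no Suppress-Script)."""
    distinct = set(scripts)
    if len(distinct) == 1 and None not in distinct:
        return next(iter(distinct))
    return None


def arab_share(total, exceptions):
    """Fraction of speakers left after removing the listed exceptions.
    ASSUMPTION: everyone not listed is counted as writing in Arabic script."""
    excluded = sum(count for count, _script in exceptions.values())
    return (total - excluded) / total


# The strict rule fails as soon as one encompassed language differs.
encompassed_scripts = ["Arab"] + [script for _n, script in EXCEPTIONS.values()]
print(strict_rule(encompassed_scripts))

# The speaker-weighted share barely moves.
print(arab_share(TOTAL_ARABIC_SPEAKERS, EXCEPTIONS))
```

Under these assumptions the strict rule yields no Suppress-Script for 'ar', while the speaker-weighted share stays well above 99.9 percent.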

> The goal I am assuming is to tag online content appropriately.

The goal is to tag all content appropriately.

>> 2.  The idea that we are now relying on individual research and 
>> "feedback from native bloggers and such" to determine Suppress-Script 
>> appropriateness.
>
> Suit yourself; I personally prefer to hear from the communities that 
> will use the subtags.  As for individual research -- Peter proposed 
> this and no one acted on it so I thought I'd go through and see which 
> were the best candidates for suppress-script, based on what I could 
> find and what I knew from previous experience/knowledge of 
> linguistics -- and so I am expressing my opinion on the 
> suppress-script requests Peter submitted; I assumed that everyone on 
> the list would offer an opinion. But I can't say what data the language 
> subtag reviewer will consider when he makes his decision.

Are bloggers considered a typical cross-section of language users in 
general?  Would they necessarily use the same script as other users who 
do not publish to a worldwide audience?  Professional linguists would 
have to address questions like this, as would any type of survey-taker.

>> I understood Suppress-Script to be for the obvious cases, to 
>> discourage people from adding patently pointless script subtags (as 
>> in "en-Latn" or "ru-Cyrl").
>
> Yes that's my understanding too -- suppress-script is to indicate what 
> script online content in a particular language is overwhelmingly 
> written in.

Not just online.

> Now what do you or anyone know about Konkani, Tamashek, or even Malay 
> varieties?  Because these were cases where I was unsure of the 
> appropriateness of suppress-script and I don't know much about any of 
> these.

I don't really think the question is what do I know.  I didn't express 
an opinion about Konkani or Tamashek or Malay.

I think if we are unsure *at any time* about the appropriateness of S-S, 
the right thing to do is leave it off.  RFC 5646, Section 3.1.9 says, 
"Many language subtag records do not have a 'Suppress-Script' field. 
The lack of a 'Suppress-Script' might indicate that the language is 
customarily written in more than one script or that the language is not 
customarily written at all.  It might also mean that sufficient 
information was not available when the record was created and thus 
remains a candidate for future registration."  Maybe that does mean 
waiting for the native bloggers to weigh in, but it also might mean just 
plain leaving it alone.
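
As a rough illustration of what leaving the field off means in practice, here is a minimal Python sketch of how a tagging tool might consult Suppress-Script. The table is a tiny hand-copied sample, not the registry itself, the function name is hypothetical, and the parsing is deliberately simplistic: it only looks at the subtag immediately after the language and ignores extlang subtags.

```python
# Hypothetical two-entry sample; a real tool would read the Language Subtag
# Registry instead of hard-coding values.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl"}


def drop_redundant_script(tag: str) -> str:
    """Remove the script subtag when it matches the language's
    Suppress-Script; leave the tag unchanged in every other case."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag
    language, candidate = subtags[0].lower(), subtags[1]
    # Script subtags are four letters long.
    is_script = len(candidate) == 4 and candidate.isalpha()
    suppressed = SUPPRESS_SCRIPT.get(language)
    if is_script and suppressed and candidate.lower() == suppressed.lower():
        return "-".join([subtags[0]] + subtags[2:])
    # No Suppress-Script on record: nothing is assumed, nothing is removed.
    return tag


for example in ("en-Latn", "ru-Cyrl-RU", "sr-Cyrl", "en"):
    print(example, "->", drop_redundant_script(example))
```

A language with no entry in the table, like the "sr" example in this sample, simply keeps whatever script subtag the tagger supplied, which is the harmless outcome of leaving the field off.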

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s




