suppress-script values for fil, mi, pes, prs, qu members
petercon at microsoft.com
Thu Oct 21 00:00:43 CEST 2010
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
>> pre-4646 habits of not using script subtags have persisted beyond
>> publication of 4646.
> This isn't just force of habit. RFC 5646, Section 4.1 says clearly, "A subtag
> SHOULD only be used when it adds useful distinguishing information to the tag."
> In many cases, perhaps even most, a tag that says "Huallaga Huánuco
> Quechua, written in the Latin script" might not add useful distinguishing
> information beyond one that says simply "Huallaga Huánuco Quechua."
But implementers may not know whether that's the case when there is no Suppress-Script (s-s) field to guide them.
> A search engine (in the general data-processing sense, not just in the
> Google/Bing/Yahoo! sense) already knows what characters are used in a
> given piece of text; it doesn't need to be told.
But if a search engine is reporting back metadata about that content, and it somehow determines that the content is Huallaga Huánuco Quechua, how will it know whether to declare it as "qub" or "qub-Latn"? In the absence of s-s information, the only options for an implementer are:
a) collect and incorporate into your app private s-s information
b) always add a script subtag when there is no s-s information
c) always leave out a script subtag
Option a won't appeal to many developers, simply because the information is hard to find and collect. Option b means violating the recommendation in 4.1 in an unknown number of cases. Option c means underspecifying the information in an unknown number of cases in which a language is written in more than one script.
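To make the trade-off concrete, here is a rough sketch of how a tagger might behave under options b and c. Everything in it is an assumption for illustration: the function name make_tag, the policy argument, and the two-entry SUPPRESS_SCRIPT table, which stands in for data that would really be loaded from the registry. Option a amounts to adding privately collected entries to that table.

```python
# Illustrative excerpt only; a real application would load Suppress-Script
# values from the IANA Language Subtag Registry. "qub" is deliberately
# absent, to model a language with no s-s information.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl"}


def make_tag(language, detected_script, policy="b"):
    """Build a tag from a language subtag and the script detected in the text.

    policy "b": always add the script when there is no s-s information.
    policy "c": always leave the script out when there is no s-s information.
    (Hypothetical helper, not part of any library.)
    """
    suppressed = SUPPRESS_SCRIPT.get(language)
    if suppressed is not None:
        # s-s information exists: omit the script only when it matches.
        if detected_script == suppressed:
            return language
        return f"{language}-{detected_script}"
    # No s-s information: fall back on the chosen policy.
    if policy == "b":
        return f"{language}-{detected_script}"
    if policy == "c":
        return language
    raise ValueError(f"unknown policy: {policy!r}")


print(make_tag("qub", "Latn", policy="b"))
print(make_tag("qub", "Latn", policy="c"))
print(make_tag("en", "Latn"))
```

With this table, option b tags the Quechua content as "qub-Latn" and option c as "qub", while "en" comes out without a script subtag under either policy because the s-s value settles the question.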
Now, given that we can't expect s-s information in the registry ever to be comprehensive and accurate, this problem will always remain for some cases. The option that strikes me as most manageable is b: over-specifying is inefficient, but it doesn't lead to other problems apart from the issue of comparing ll(-CC) and ll-Ssss(-CC) tags, and it avoids the ambiguity that option c leads to. Both the redundant script subtags and the difficulty of comparing tags with and without script subtags can be mitigated by adding more s-s information to the registry as it becomes available. It's not perfect, because the s-s information can never be complete; but it's less imperfect than options a or c.
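As a rough illustration of the comparison issue, the sketch below drops a script subtag only when it equals the known s-s value for the language, and then compares the results. It assumes tags of the simple form ll(-Ssss)(-CC) and ignores extlang, variant, extension, and private-use subtags; the names strip_suppressed_script and same_tag and the small SUPPRESS_SCRIPT table are hypothetical stand-ins, not registry or library APIs.

```python
# Illustrative excerpt only; real values would come from the registry.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl"}


def strip_suppressed_script(tag):
    """Remove the script subtag if it matches the language's s-s value.

    Assumes the simplified shape ll(-Ssss)(-CC); anything after the
    optional script subtag is passed through unchanged.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    rest = subtags[1:]
    # A four-letter alphabetic subtag right after the language is a script.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        if rest[0].title() == SUPPRESS_SCRIPT.get(language):
            rest = rest[1:]
    return "-".join([language] + rest)


def same_tag(a, b):
    """Compare two tags case-insensitively after dropping redundant scripts."""
    return strip_suppressed_script(a).lower() == strip_suppressed_script(b).lower()


print(same_tag("en-Latn-US", "en-US"))
print(same_tag("qub-Latn", "qub"))
```

In this sketch "en-Latn-US" and "en-US" compare as equal because the table knows Latn is redundant for "en", whereas "qub-Latn" and "qub" do not, since nothing says the script is redundant. Each s-s entry added to the table makes one more such pair comparable.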