Updated! LANGUAGE TAG REGISTRATION FORM : es-americas

Wed, 4 Sep 2002 22:12:50 -0500

On 09/04/2002 04:01:17 PM Michael Everson wrote:

>So what you are saying, Peter, is that you want a blanket group code
>to cover more than one language.

No, that's not what I'm saying (and for the record, I'm not requesting
this; I'm just trying to make sure this request is clearly understood, and
that *some* solution is obtained for the real problem that exists).

>This RFC specifies *languages*,

There's no escaping the fact that this RFC specifies a number of different
types of entities that are all somehow related to language, but not all are
specifically languages. This is quite easy to see:

- the RFC permits the use of "alg" as a tag, but it is clearly not a
language

- the RFC permits the use of "sgn" as a tag, but it is clearly not a
language

- the RFC permits the use of "und" as a tag, but it is clearly not a
language

(turning to less pathological examples)

- the RFC permits the use of "zh" as a tag, yet it is difficult to claim
that this represents a language given that the RFC also permits the use of
"zh-guoyu", "zh-gan", "zh-hakka", etc. as tags that each represent
languages

- the RFC permits the use of "en-US" as a tag, yet nobody claims that this
represents a distinct language (rather, it represents either an
orthography, a sub-language variant, or both)

- the RFC permits the use of "de-1901", yet nobody claims that this
represents a distinct language (rather, it represents an orthography)

- the RFC would permit (once registered, or perhaps freely sanctioned given
publication of ISO 15924 and a revised RFC -- and I believe I'm right in
thinking that you wouldn't object) the use of a tag such as "zh-Hant" or
"zh-Hans", which would distinguish writing systems rather than languages

The proposed tag is intended to represent just one more kind of
language-related category.
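
To make the point concrete, here is a minimal sketch (mine, for illustration
only, not part of the RFC or of the registration request). It assumes the RFC
3066 tag syntax of a primary subtag of 1-8 letters followed by hyphen-separated
subtags of 1-8 letters or digits. The names TAG_SYNTAX, KIND_OF_ENTITY,
is_well_formed and kind_of_entity are hypothetical, and the category labels
simply restate the characterisations given in the list above; they are not
official terminology.

```python
import re

# Assumed RFC 3066 syntax: 1-8 letters, then zero or more "-subtag" parts,
# each subtag being 1-8 letters or digits.
TAG_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

# Hypothetical labels restating the examples in the list above.
# Keys are lower-cased because tags are compared case-insensitively.
KIND_OF_ENTITY = {
    "alg": "collection of languages (Algonquian)",
    "sgn": "collection of languages (sign languages)",
    "und": "undetermined (no language identified)",
    "zh": "grouping that covers several languages",
    "en-us": "orthography and/or sub-language variant",
    "de-1901": "orthography",
    "zh-hant": "writing system",
    "zh-hans": "writing system",
    "es-americas": "domain-constrained sub-language variant",
}


def is_well_formed(tag: str) -> bool:
    """Return True if the tag matches the assumed RFC 3066 syntax."""
    return TAG_SYNTAX.fullmatch(tag) is not None


def kind_of_entity(tag: str) -> str:
    """Return the illustrative category label for a tag, if one is listed."""
    return KIND_OF_ENTITY.get(tag.lower(), "not listed in this sketch")


if __name__ == "__main__":
    for tag in ("alg", "zh", "en-US", "de-1901", "zh-Hant", "es-americas"):
        print(tag, is_well_formed(tag), kind_of_entity(tag), sep=" | ")
```

Every tag in the table is well-formed under that syntax, yet the tags plainly
do not all denote the same kind of thing.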

>even to a lot of granularity
>(Scouse!) but I do not think that the entity described in
>"es-americas" is an actual entity.

In terms of my ontological model paper, it is an instance of the category
type I referred to there (grasping for a label) as "domain-specific data
set" -- in the context of this discussion, a label such as
"domain-constrained sub-language variant" may be better. It is a
characterisation that can be applied to a selection of data that has
been created by constraining language usage so as to make the content
appropriate for use throughout some domain. It is not an entity in the
sense that it corresponds to some ostensively identifiable speaker
community -- nobody actually speaks es-americas (though in principle it
would be easy to imagine a protocol droid that spoke it). But the proposed
tag isn't intended to suggest that there is some corresponding speaker
community. Rather, it's intended to express something about the nature of
the language variety represented in a given information object, and there
are real scenarios for which nothing else captures what it is that users
want to say about content. Thus, there *is* an entity in a Platonic sense
-- an abstract entity that people do want to describe.

(To try an analogy, it's kind of like the UCS character U+17B4: there isn't
any such entity in reality, but there is an abstract, inferred entity that
some users have indicated they would find useful to be able to represent
explicitly.)

>Saying that there "might" be some
>spell-check differences between Europe and the Americas does not
>convince -- especially as I do not think that there are any. There
>may be vocabulary differences, but there are just as many of those
>between the American varieties as there are between them and Spain.

That may be true of unconstrained texts, but not of the kind of content for
which this is intended, as John has already illustrated. There is content
which is acceptable to audiences throughout the Americas, but not
necessarily in Spain (in practice, very often not), and there are companies
paying significant money for the development of content with specifically
those properties. They are looking for a tag, which they need in order to
make use of and maintain their investment.

>Is text-to-speech required? Apple implemented es-MX, which works in
>Mexico and in the southwestern US, though it might be less
>appropriate on the east-coast of the US.

I've no idea whether text-to-speech is a related requirement, but it
really is completely irrelevant. Presumably, one should be able to take
es-americas content and be able to legitimately apply speech synthesis
using processes tailored for any phonological dialect anywhere in the
Americas, according to a given user's preference.

>I learned Mexican Spanish when I lived in Arizona, and I meet
>European Spaniards regularly enough in Ireland. And Colombians. And
>Argentinians. And Cubans. They all speak differently from one another.

So you have mentioned more than once before. Once again, that is not
relevant for the intended usage of this proposed tag.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>