Registration of el-Latn language tag

Thu Sep 29 17:04:28 CEST 2005

Tex Texin scripsit:

> I understand the tradeoffs of a generative mechanism vs a registry.
> I don't understand why we are doing both.

The registry provides a single source of subtags as input to the
generative mechanism, rather than having to root around to find the
current state of five different international standards (ISO 639-1,
ISO 639-2, ISO 3166-1, ISO 15924, and UNSD M.49) on a variety of more
or less reliable, more or less complete web pages.  In the case of ISO
3166-1, it is also necessary to know which code elements were *formerly*
in use in order to impose stability on that labile standard.

(It also replaces the existing IANA registry, providing a list of
grandfathered tags that don't fit into the generative system and a
historical record of tag registrations that now can be generated.)
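
As a rough illustration of the registry feeding the generative mechanism, here is a minimal sketch in Python.  The REGISTRY contents are a tiny hypothetical excerpt, the function name is made up for this example, and the grammar is cut down to language[-script][-region]; variants, extensions, private-use subtags, and grandfathered tags are all left out.

```python
# Hypothetical, heavily abridged excerpt of a subtag registry.
# The real registry is far larger; these entries are for illustration only.
REGISTRY = {
    "language": {"el", "en", "zh", "haw", "nv"},
    "script": {"Latn", "Grek"},
    "region": {"GR", "GB", "FR", "DK"},
}


def is_generable(tag):
    """Return True if tag has the shape language[-script][-region]
    and every subtag appears in REGISTRY (simplified grammar)."""
    parts = tag.split("-")
    # The first subtag must be a registered language subtag.
    if parts[0].lower() not in REGISTRY["language"]:
        return False
    rest = parts[1:]
    # An optional script subtag comes next, if present.
    if rest and rest[0].title() in REGISTRY["script"]:
        rest = rest[1:]
    # An optional region subtag follows the script.
    if rest and rest[0].upper() in REGISTRY["region"]:
        rest = rest[1:]
    # Anything left over is unknown or out of order.
    return not rest


for tag in ("el-Latn", "haw-FR", "el-GR-Latn"):
    print(tag, is_generable(tag))
```

Note that such a check only says whether a tag can be generated from registered subtags; it accepts haw-FR just as readily as el-Latn, and makes no judgment about whether the combination is sensible.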

> What is a possible argument for not encoding any language-latn? 

Right now we are in a transitional period: this list is acting as if RFC
3066bis were already in effect, and registering anything that could be
freely generated by RFC 3066bis, provided it is not obviously stupid.
As has been shown at length, RFC 3066 can also generate stupid things
like haw-FR and nv-DK.

Since el-Latn can be generated by RFC 3066bis and is not obviously stupid,
there is no reason not to register it now.  Similarly, if someone requested
zh-Latn or ja-Latn, we should also process those.

> Personally, I disagree with your en-GB example. From a linguistic
> standpoint, maybe it is english from the UK.
> But most people would be quite happy to have their en-GB spell checker
> reject most of it.

I don't think anyone disagrees with that.  The notion of a spelling-checker
for Elizabethan English is unlikely: Shakespeare's name appears in some
20-odd spellings, and he himself never uses the most common of these,
"Shakespeare".

> If language tags are to be used on the Web, and supported by office tools,
> and to be recognizable by most users (given suitable expanded names and not
> subtags), they should have meanings that typical users can relate to.

I don't know what "relate to" means.  Most anglophones have probably never
heard of Vanuatu, but it has a proper country code nonetheless and appears
in lists of countries.  Similarly, Bislama is the common language of Vanuatu
and should have its proper language tag.  If most of us have not heard of
it, it doesn't matter; the people who *use* the language and need the tag
certainly have!

> As I have said many times, I understand the need of linguists, and we have
> SIL and more detailed standards with many more entries for their use.

The Ethnologue is not more *detailed* than ISO 639-2; it's more
comprehensive.  There's a difference.  If you have a document in a
language for which 639-2 doesn't code, you have two choices: use a
collective code like 'nai' (North American Indian languages) or use
no code at all.  Linguasphere OTOH (the basis for the draft ISO 639-6)
is more detailed.

> But we should have a clear set (or subset) of tags that most users can work
> with and get what they expect, and where it is unclear, we should be able to
> give them a definition. And there should be a reasonable precision to the
> definition.

I have no clue what this might mean.  What is the "definition" of English?
French?  Swahili?  Sotho?

> I should be able to examine some text and determine its tag. Looking at your
> example could you determine it was en-GB and not something else, perhaps not
> even in the english family?

*No you shouldn't*.  It's absurd to think that you can look at just any
text and figure out what language it's in.  Yes, there are heuristics,
but they are extremely fallible.  What language a text is in is
best discovered by asking the author.

> We are no longer defining anything that is of use to typical users, 

It's of use -- indeed, necessary -- to the users it's of use to.  If we
took that approach to country tagging, we'd invoke the 80/20 rule and
provide tags for China, India, the U.S., Indonesia, Brazil, Pakistan,
Bangladesh, Russia, Nigeria, Japan, Mexico, the Philippines, Vietnam,
Germany, and Other.

-- 
John Cowan       http://www.ccil.org/~cowan        <jcowan at reutershealth.com>
        You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
        You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
                Clear all so!  `Tis a Jute.... (Finnegans Wake 16.5)