Language attributes - what are they?

Peter Constable petercon at microsoft.com
Sat Jan 1 20:55:18 CET 2005


> From: ietf-languages-bounces at alvestrand.no
> [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Tex Texin


> However, I also believe that a language tag implies the sort order used
> within the content it represents.

I certainly have no problem saying that one can make probabilistic
inferences about things like sort order or perhaps date formats from a
language tag (probabilistic implying that confidence in the inference
may be variable; e.g. you may not feel very confident about inferring
Spanish modern sort order from "es").
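
To make the "es" case concrete, here is a minimal Python sketch of how
Spanish modern and traditional sort orders can disagree on the same word
list. The alphabets, the sort_key helper, and the sample words are all
hypothetical and written only for illustration; they are not taken from
any standard or library, and real software would rely on proper collation
data rather than hand-built tables.

```python
# Toy collation sketch. The alphabets below are hand-written for illustration;
# real software should rely on CLDR/ICU collation data instead.
MODERN = list("abcdefghijklmnñopqrstuvwxyz")
# Traditional Spanish order treats "ch" and "ll" as single letters that sort
# after "c" and "l" respectively.
TRADITIONAL = (
    list("abc") + ["ch"] + list("defghijkl") + ["ll"] + list("mnñopqrstuvwxyz")
)


def sort_key(word, alphabet):
    """Return a list of ranks, matching the longest collation unit first."""
    rank = {unit: i for i, unit in enumerate(alphabet)}
    units = sorted(alphabet, key=len, reverse=True)
    key, pos, text = [], 0, word.lower()
    while pos < len(text):
        for unit in units:
            if text.startswith(unit, pos):
                key.append(rank[unit])
                pos += len(unit)
                break
        else:
            key.append(len(alphabet))  # unknown characters sort last
            pos += 1
    return key


words = ["dama", "cuna", "chico", "luz", "llama"]
modern_sorted = sorted(words, key=lambda w: sort_key(w, MODERN))
traditional_sorted = sorted(words, key=lambda w: sort_key(w, TRADITIONAL))
print(modern_sorted)       # ['chico', 'cuna', 'dama', 'llama', 'luz']
print(traditional_sorted)  # ['cuna', 'chico', 'dama', 'luz', 'llama']
```

The same five words, all plausibly tagged "es", come out in two different
orders, which is why an inference drawn from "es" alone can only be held
with limited confidence.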

What I'm concerned about, though, is whether it is appropriate to add
explicit qualifiers to a language tag to indicate sort order when the
statistical inference would be wrong.

For instance, if you see "en" on text, you will probably guess that it
uses the common English writing system, and in the vast majority of
cases you'd be correct. But if I create a document that has English in
phonetic transcription, I'd want to tag that content using a tag meaning
"English in phonetic transcription" (something like "en-Lipa") so that
you wouldn't make the wrong inference. 

Now, turning to sorting, if you see "en" on text that contains a list,
"able, apple, baker, boggle,..." you will probably guess that it uses
English sort order, and in the vast majority of cases you'd be correct. 

The questions now are these:

- Are there common enough usage scenarios in which there may be a
question about the sort order of static content that we should have a
general-purpose scheme to indicate sort order as a metadata attribute on
content?

- If so, should that scheme be incorporated as part of our language-tag
scheme?

I have doubts regarding the first question; but even if we were to
answer that in the affirmative, I'd have even bigger doubts regarding
the second.


That is the issue I am focusing on, and the basis on which I say sort
order is out of scope for language tags. Again, I have no problem with
making probabilistic inferences about sort order from a language tag.



> A language tag labeling a document or more generally text content should
> imply the language and all language attributes that the AUTHOR uses to
> create the content.

Agreed, wrt *author*. As for "all language attributes", that depends on
what is meant.



> I would like to avoid if possible discussion of locales, and focus on
> what the language identifier entails.
> I think that implies not discussing user interface, and requires some
> presumption of pre-built content and the choice of language tag to be
> used to label that content.

I think that's an appropriate set of constraints.

 
> Wouldn't it be surprising for a non-swedish sort order to be used with
> content that was labeled as Swedish?
> (Regardless of who the content is given to...)

Certainly it would be surprising. Now, I'm not sure what inference you
intend to be drawn from this rhetorical question:

a) "Of course this is unexpected; therefore we are generally safe in
inferring Swedish sort order when the content is labeled as Swedish." I
have no qualms with that.

b) "Yes, it would be unexpected; therefore we need a way to indicate the
exception using some metadata attribute."  My questions for this
situation, then, are those I gave above: do we really need a scheme to
indicate sort order in such situations, and if so should it be part of
language tags? The conclusion I've come to is no, at least to the second
part.



Peter Constable

