language tag structure

Peter Constable petercon at microsoft.com
Mon Jan 17 20:19:06 CET 2005


> From: JFC (Jefsey) Morfin [mailto:jefsey at jefsey.com]

> >I don't see the relevance of menus: we're not tagging genres of
> >documents.
> >Nor do I see the relevance of barcodes or RFIDs -- I suppose these
> >technologies could be used to encode linguistic texts, but even so we
> >want to describe the linguistic variety of the text, not the encoding
> >format.
> 
> The script _is_ an encoding format. Together with founts, etc.

Representation of linguistic expressions in terms of the characters of
some script is one thing; representation of characters in terms of some
radio-transmission protocol is quite another. Language tags should
distinguish between different written representations for a linguistic
expression, but not between radio-transmission protocols. 

The term "encoding" is *not* conventionally used to refer to the script
used for writing texts, and is certainly not what I was referring to
when I said we don't want to describe encoding format.


> >If you think people are going to tag content to distinguish down to
> >the level of individual lexical innovations, I'd say you're dreaming.
> 
> Great. I just documented you a case where people need and want that
> tagging. 

You haven't documented a need. You have asserted there is a need. You
have not given any usage scenario describing how distinctions of the
kind you're suggesting would be used.


> OK. I understand that you see no other flaw in my approach than
> finding a correct wording for an extended script concept.

!! You cannot obtain consensus by putting words in others' mouths. See
no flaw? I just finished objecting to several things that you say you
want to have encompassed by "script". It is a flaw, for instance, to mix
a data category like file or encoding format in with a data category for
linguistic attributes.

Also, I agree with John that we are not documenting language
authorities, and that companies such as Microsoft are not language
authorities. Perhaps your intent is something other than what "language
authority" suggests to us in English -- an agency that has been
identified at a societal or governmental level as having authority to
define policies, conventions and best practices regarding language
definition and usage.

On the other hand, if your intent is simply to describe a conventional
usage defined within some domain -- e.g. spelling and lexical
conventions for French used within ISO -- then I think that's an
acceptable thing to include in a language tag. But I would not refer to
that as "language authority"; it's a domain of usage.


> We also have a lot of hieroglyphs being accepted in the day to day life.
> How do you want the "I [heart] NY" or the smileys to be read by a web
> service, a scanner, etc. if you do not document the forms.

An icon of a heart is not part of the written form of English; it is an
icon with a metaphoric meaning. (And in the case of that particular
metaphor, one that spans many cultures and languages; e.g. one could
also have written "J'[heart] Paris".) The use of icons within text is
not generally conventionalized (though the icons themselves may be) --
there are not established conventions that can be given identities by
which (say) an iconic image of a telephone handset can be inserted
within text to mean "call by telephone", or images of an automobile can
be inserted within text to mean "automobile" (or "sedan" or "sports-car"
or "SUV"). We might want to have a subtag along the lines of "rebus" to
tell us that the text contains some icons in lieu of words; e.g.
"en-rebus" for "I [heart] NY"; but I can't see making finer distinctions
than that unless there are identifiable conventions.

Note that a subtag like "rebus" would not represent anything new that
could not be accommodated by RFC 3066bis.
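
To make that concrete, here is a minimal sketch in Python of why a tag like "en-rebus" would fit the existing syntax. It assumes a simplified subset of the draft grammar (language, optional script, optional region, then variants of 5 to 8 alphanumerics or a digit plus 3 alphanumerics); the pattern and the parse_tag helper are illustrative names, not part of any specification, and "rebus" is a hypothetical subtag that would still need to be registered.

```python
import re

# Simplified subset of the language-tag syntax (an assumption, not the full
# grammar): 2-3 letter language, optional 4-letter script, optional region
# (2 letters or 3 digits), then zero or more variant subtags.
LANGTAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def parse_tag(tag):
    """Split a tag into subtags; return None if it does not fit the subset."""
    match = LANGTAG.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    # Turn "-rebus-foobar" into ["rebus", "foobar"]; no variants gives [].
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    return parts


# "rebus" is hypothetical: it is only syntactically a valid variant here.
print(parse_tag("en-rebus"))
```

This checks shape only. It does not consult a registry, so it says nothing about whether a given subtag is actually registered or meaningful.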


Peter Constable


