Here's what I have to say about that.

Jon Hanna jon at spin.ie
Mon May 26 18:16:03 CEST 2003


> > Where we don't have
> > consensus is on how to proceed.
>
> As Martin Duerst indicated, we're close enough.

Of course, it's not like my continuing objections make for some sort of veto
(and if anyone should have that right I'm the last person, I'm pretty much
the lUser wannabe on this list). Still, if nothing else arguing against me
will make you more sure that you're doing the right thing.

> > The majority opinion seems to favour altering the use of 3066 (some
> > point to registrations like de-1901 as evidence that this isn't
> > really altering 3066 at all).
>
> I'd go further: sanctioned use of tags like "en-US" vs. "en-GB", where the
> most significant difference is one of spelling, establishes that this isn't
> altering the intent of 3066 at all.

I still think it's the sidewalks, diapers and cookies and more bizarre terms
that I can't remember (because my mental lexicon has nowhere to put them)
that confuse us more than the American fondness for the letter zed (which
they call zee). Subtler differences (we have frontiers, so do Americans, but
they also have "the" frontier) are where dialect distinctions are
particularly worth encoding.

> > I maintain that script is orthogonal to language.
>
> Language and country are just as much if not more orthogonal, but we
> regularly combine those in a single tag.

"en-IE" tells you nothing about where I am, though you could make reasonable
inferences about where I learnt the English language. Whether I was in
Dublin, Ireland or Dublin, Ohio my language would remain that indicated by
"en-IE". Indeed the form of English I use comes from Northern Ireland which
(political differences about whether it should be or not aside) is part of
the UK, but where Hiberno-English differs from British English I generally
use the Hiberno-English form, so the -IE is not strictly a matter of "my"
country at all.

> > I further foresee a need for other information about stuff that goes
> > hand-in-hand with the concept of "locale" (a problematic word, but
> > I'll forgo spending 10 paragraphs debating what it means, and for
> > now define it as "a representation of conventions used when
> > rendering data for human consumption, or when parsing human-readable
> > documents").
>
> Which has been discussed here and elsewhere, and I think there is complete
> agreement that there are many things of this sort that we do *not* want
> RFC 3066 extended to accommodate.

Yes, dreadful notion. Unfortunately in the absence of better mechanisms 3066
is and will continue to be used to guess at this. I'd rather we give people
a way of doing so that doesn't abuse the rest of 3066. Still that's a
different matter and perhaps we should fork the discussion if we want to go
into it.

> > My back-of-an-envelope strawman is to define a new locale specifier
> > in which the locale of this email is "en-IE.latn".
>
> > 2. While not backward-compatible with 3066, 3066 is forwards
> > compatible with it.
>
> I gather you're suggesting that the first portion in your proposed tag up to
> "." is taken from RFC3066. I'm curious: what is the practical difference
> between "en-IE" and "en" -- what should my software do differently?

It should pronounce "HTTP" as haitch-tee-tee-pee instead of
aitch-tee-tee-pee :)
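
More seriously, one concrete difference already follows from the prefix
rule RFC 3066 gives for language ranges: a range matches a tag if it equals
the tag, or equals a prefix of it where the next character is "-". Under
that rule alone, a request for "en" is satisfied by "en-IE" content, but a
request for "en-IE" is not satisfied by plain "en". A minimal Python sketch
(the function name is hypothetical, and real software may add fallback
behaviour on top of this):

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Sketch of RFC 3066-style language-range matching."""
    # Tags are compared case-insensitively.
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True  # the wildcard range matches any tag
    # Exact match, or the range is a prefix followed immediately by "-".
    return t == r or t.startswith(r + "-")


print(range_matches("en", "en-IE"))   # True
print(range_matches("en-IE", "en"))   # False
print(range_matches("en", "eng"))     # False: "en" is not a subtag prefix of "eng"
```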

> - A tag such as "az-AZ.latn" is not all that different from "az-AZ-latn".
>
> - The mechanisms you wanted for easily determining language, script, and
> language&script that would work on a tag such as "en-IE.latn" are only
> trivially different from mechanisms that would make such determination
> from tags such as "en-IE-latn" or "en-latn-IE".

Well, I would be happier with en-IE-latn.
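
The quoted point, that "az-AZ.latn", "az-AZ-latn" and "az-latn-AZ" need
only trivially different parsing, can be illustrated with a small Python
sketch. It assumes, for illustration only, that any 4-letter alphabetic
subtag is a script code and any 2-letter one is a country code wherever it
appears; RFC 3066 gives 4-letter subtags no such meaning, and other subtags
are simply ignored here. `parse_tag` and `TagParts` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class TagParts(NamedTuple):
    language: str
    region: Optional[str]
    script: Optional[str]


def parse_tag(tag: str) -> TagParts:
    # Treat the strawman "." as just another separator alongside "-".
    subtags = re.split(r"[-.]", tag)
    language = subtags[0].lower()
    region: Optional[str] = None
    script: Optional[str] = None
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            script = sub.lower()   # assumed script code, e.g. "latn"
        elif len(sub) == 2 and sub.isalpha():
            region = sub.upper()   # ISO 3166 country code, e.g. "IE"
        # Any other subtag is ignored in this sketch.
    return TagParts(language, region, script)


print(parse_tag("en-IE.latn"))
# TagParts(language='en', region='IE', script='latn')
print(parse_tag("az-AZ.latn") == parse_tag("az-latn-AZ"))  # True
```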

> - One of Peter Edberg's suggestions for extending RFC 3066 did include a
> distinct delimiter in the syntax to separate the purely-language portion
> from the written-expression portion, and it seems clear that such
> conventions could be handled as an extension to RFC 3066 rather than
> creating a new, distinct system (and, for reasons I gave above,
> should be).

IIRC I said I like this version, and would drop my remaining objections in
favour of it. If I didn't I certainly intended to.

> - Some have argued here that the sequencing of elements within a tag
> should put script before country, i.e. "en-latn-IE" (or if you contend the
> need for distinct delimiters, "en.latn.IE" or some such) since differences
> in orthographic system / script are far more important than are
> differences in spelling conventions or vocabulary and other dialect
> distinctions.

I would argue in favour of en-IE-latn rather than en-latn-IE precisely
because script differences are more important than dialect. Script
differences are sometimes more important than language! For example if you
are deciding on font glyphs to use for graphical output you can be largely
language-agnostic, but must know the script. In this case you could parse
from the right and ignore what is to the left of "latn", just as in cases
where you cared only about the spoken language you would parse from the left
and ignore everything after "en" or "en-IE" depending on what degree of
precision you needed.
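
Reading an en-IE-latn style tag from either end might look like the Python
sketch below. It assumes that a trailing 4-letter alphabetic subtag is a
script code, which is a convention of this proposal rather than anything
RFC 3066 defines; the function names are hypothetical.

```python
from typing import Optional


def _is_script(subtag: str) -> bool:
    # Assumption: a trailing 4-letter alphabetic subtag is a script code.
    return len(subtag) == 4 and subtag.isalpha()


def script_of(tag: str) -> Optional[str]:
    """Glyph/font selection: read from the right, ignore everything before the script."""
    last = tag.split("-")[-1]
    return last.lower() if _is_script(last) else None


def spoken_language(tag: str, precision: int = 1) -> str:
    """Language-only processing: read from the left, keep `precision` subtags."""
    subtags = tag.split("-")
    if len(subtags) > 1 and _is_script(subtags[-1]):
        subtags = subtags[:-1]  # drop the script before truncating
    return "-".join(subtags[:precision])


print(script_of("en-IE-latn"))           # latn
print(spoken_language("en-IE-latn"))     # en
print(spoken_language("en-IE-latn", 2))  # en-IE
```

Putting the script last means neither reader has to know how many dialect
subtags sit in the middle.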

> > 4. Allows for the separation of responsibility - the management of
> > language tags would not necessarily be done by the same people as
> > the management of script tags, or any other features added by the
> > extensibility mechanism provided. This is of importance both as a
> > matter of scalability and also because some people might simply only
> > find some of those matters to be interesting.
>
> On the contrary, I think we'd probably find largely the same group of
> people involved in both.

No doubt, but within that I would be surprised if we didn't see some who
were more "script people" and some who were more "language". If the further
part of my proposal - to allow other "locale" information to be defined -
were also part of this then this would be even more the case.



More information about the Ietf-languages mailing list