The role of country codes.

John Cowan cowan at mercury.ccil.org
Thu May 29 09:55:15 CEST 2003


Disclaimer: not all RFC 3066 tags in this email are legal (yet).

Jon Hanna scripsit:

> I have failed
> to convince that the primary differences between en-US and en-IE aren't
> spelling,

I'm convinced.  I'm just not sure what a computer can do about such
differences as exist.  Speech classification, while important in theory,
is still a very marginal use of RFC 3066 tags.

> I still maintain that however, especially those examples of each
> of those dialects that are furthest from "received" en), I can't think of a
> single spelling difference between en-IE and en-GB,

The vowel in "f*ck".  :-)

> though my British
> colleague queries one of us about some Hibernicism nearly every day. I have
> a hard time understanding en-JM despite it generally using Old World
> spellings.

There is a continuum in Jamaica between en-jm (or even en-uk) and
cpe-jam; I have not much trouble with the former, and a great deal
of trouble with the latter.

> I had little difficulty understanding the Scottish dialect in
> _Trainspotters_ which I hear was released with subtitles in the US.

I didn't see the movie, but I suppose the characters are speaking Scots
(sco) rather than en-uk-scotland.  Some anglophones can passively
understand Scots, others not.

> Vocabulary and syntax often varies strongly with those language-differences
> currently identified by reference to the country in which they occur (there
> are syntax differences in "stronger" forms of Hiberno-English as well which
> borrow from Irish, however while I might respond to an invitation to join a
> colleague in a quick pint with "yeah, I've a bit of a thirst on me" it isn't
> syntax I would normally use in an email).

Well, I am of Irish origin, but I hardly imagine that *any* native speaker
of en-* would really fail to understand that, even if they could not
produce it in their own dialects.

> In current usage the country codes identify both the orthographic and other
> differences, and it works well because they are pretty much where both of
> these differences should be with respect to the primary subtag. With the
> introduction of script information into language codes the double-duty of
> the country codes no longer works well. The obvious priority is to place the
> differences in vocabulary and syntax before the script information and the
> orthographic differences after, I don't think this translates well to any
> suggested encoding.

It does not.  (Or in en-us, "Right.")  I think the burden of persuasion
is on you to show that the needs *of IT* (remembering that RFCs are
information technology standards) are advanced by such a coding system.

-- 
John Cowan      jcowan at reutershealth.com        http://www.ccil.org/~cowan
        Is it not written, "That which is written, is written"?

