The role of country codes.
John Cowan
cowan at mercury.ccil.org
Thu May 29 09:55:15 CEST 2003
Disclaimer: not all RFC 3066 tags in this email are legal (yet).
Jon Hanna scripsit:
> I have failed
> to convince that the primary differences between en-US and en-IE aren't
> spelling,
I'm convinced. I'm just not sure what a computer can do about such
differences as exist. Speech classification, while important in theory,
is still a very marginal use of RFC 3066 tags.
> I still maintain that, however (especially for those examples of each
> of those dialects that are furthest from "received" en). I can't think of a
> single spelling difference between en-IE and en-GB,
The vowel in "f*ck". :-)
> though my British
> colleague queries one of us about some Hibernicism nearly every day. I have
> a hard time understanding en-JM despite it generally using Old World
> spellings.
There is a continuum in Jamaica between en-jm (or even en-uk) and
cpe-jam; I have not much trouble with the former, and a great deal
of trouble with the latter.
> I had little difficulty understanding the Scottish dialect in
> _Trainspotting_, which I hear was released with subtitles in the US.
I didn't see the movie, but I suppose the characters are speaking Scots
(sco) rather than en-uk-scotland. Some anglophones can passively
understand Scots, others not.
> Vocabulary and syntax often vary strongly with those language differences
> currently identified by reference to the country in which they occur (there
> are syntax differences in "stronger" forms of Hiberno-English as well, which
> borrow from Irish; however, while I might respond to an invitation to join a
> colleague in a quick pint with "yeah, I've a bit of a thirst on me", it isn't
> syntax I would normally use in an email).
Well, I am of Irish origin, but I hardly imagine that *any* native speaker
of en-* would really fail to understand that, even if they could not
produce it in their own dialects.
> In current usage the country codes identify both the orthographic and other
> differences, and this works well because they are pretty much where both of
> these differences should be with respect to the primary subtag. With the
> introduction of script information into language codes, the double duty of
> the country codes no longer works well. The obvious priority is to place the
> differences in vocabulary and syntax before the script information and the
> orthographic differences after it; I don't think this translates well to any
> suggested encoding.
It does not. (Or in en-us, "Right.") I think the burden of persuasion
is on you to show that the needs *of IT* (remembering that RFCs are
information technology standards) are advanced by such a coding system.
--
John Cowan jcowan at reutershealth.com http://www.ccil.org/~cowan
Is it not written, "That which is written, is written"?