<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
FONT-SIZE: 10pt;
FONT-FAMILY:Tahoma
}
</style>
</head>
<body class='hmmessage'>
OK, but of course many people's dialects are mixed or multiple:<BR><A href="http://www4.uwm.edu/FLL/linguistics/dialect/maps.html">http://www4.uwm.edu/FLL/linguistics/dialect/maps.html</A><BR>
<BR>
(mine is, which yields multiple pronunciations of some words that may or may not be register-dependent)<BR>
<BR>
But yes, sure, the more subtags the better because storage is getting better; we can store all this information. But sometimes it's impossible to identify a particular dialect.<BR>
<BR>
--C. E. Whitehead<BR>
<A href="mailto:cewcathar@hotmail.com">cewcathar@hotmail.com</A><BR>
<BR>
<BR><BR>> <BR>> On 3 Jan 2008, at 19:33, Karen Broome wrote:<BR>> <BR>> > Is it simply up to the user to decide whether to use regional or <BR>> > variant tagging? Or should some guidelines be written to indicate a <BR>> > preference for variant tagging over regional tagging if both exist?<BR>> <BR>> <BR>> I'd like to second the call for some guidelines to be widely <BR>> disseminated. I am a web developer and would like to see all of the <BR>> web tagged (correctly!) with language data.<BR>> <BR>> My own opinion is that using country codes to define dialects is <BR>> flawed. When borders change, Czechoslovakia splits in two, Germany <BR>> reunifies, etc., then all the old country codes become obsolete even <BR>> though linguistically nothing has changed. When populations are <BR>> displaced they take their language with them.<BR>> <BR>> I feel that all dialects should have their own subtags, not just the <BR>> ones that partisan individuals propose. As a great example, there's a <BR>> subtag for en-scouse but not one for yorkshire, geordie or brummie, <BR>> because the guy that submitted the scouse request has a vested <BR>> interest in his own dialect, and nobody has bothered to register the <BR>> others. The distinction between en-US and en-GB is mainly an <BR>> orthographic one. I say this because en-US represents a cluster of <BR>> dialects and accents, with a unified orthography, and en-GB represents <BR>> a cluster of accents and dialects (some overlapping with en-US), but a <BR>> different orthography. Thus en-GB/US is pretty useless to people who <BR>> are tagging audio data, but quite useful to those tagging written data.<BR>> I believe that having a subtag registered is at present too difficult <BR>> (requirement for dictionaries!? what if it's mostly just an accent <BR>> with only phonemic changes relative to surrounding accents). 
A <BR>> relaxation of the barriers would lead to more de facto recognised <BR>> dialects being available to choose from.<BR>> <BR>> As an example, things like the supposedly "British English" speech <BR>> synthesizer voices on my computer (which the OS processes using the <BR>> tag "en_GB" from the voice's property list) sound nothing like most of <BR>> the accents of the United Kingdom; they would be better marked as <BR>> "en-received" or similar.<BR>> <BR>> Consider, if you will, a speech synthesizer trying to render a website <BR>> with the following:<BR>> &lt;dialog&gt;<BR>> &lt;dt&gt;George Bush<BR>> &lt;dd lang="en-US-cowboy"&gt;Now that's what I call a stonkin' good supper!<BR>> &lt;dt&gt;British Ambassador<BR>> &lt;dd lang="en-GB-received"&gt;Yes, indeed sir. That would appear to be the <BR>> case.<BR>> &lt;/dialog&gt;<BR>> <BR>> If the synth has available half a dozen male voices variously described <BR>> as "en-US" and "en-GB", it would probably not render the dialogue <BR>> close to the author's intentions, but if those voice descriptions <BR>> could be "en-general", "en-cowboy", "en-drawl", "en-received", <BR>> "en-westcountry" and "en-estuary", then the synth would have far more <BR>> freedom to select an appropriate voice to use.<BR>> <BR>> I'm sure we can all agree on commonly recognised dialects for English, <BR>> as it is a first language for many people on this list, and familiar <BR>> for many others. For other languages compiling a list might involve <BR>> asking a scholar for suggestions.<BR>> <BR>> <BR>> Footnote:<BR>> It occurred to me while writing this that perhaps a good solution <BR>> would be to use country codes for written content that uses the <BR>> national orthography, and dialect tags when transcribing spoken <BR>> content or for audio data. You would only combine the two if you were <BR>> transcribing the speech of someone with that dialect into the <BR>> orthography of a country (maybe not the country of the speaker).<BR>> <BR>> - Nicholas Shanks.<BR><BR></body>
</html>