Progressing beyond borders - making subtags inclusive

John Cowan cowan at ccil.org
Fri Jan 4 20:59:42 CET 2008


Nicholas Shanks scripsit:

> I feel that all dialects should have their own subtags, not just the  
> ones that partizan individuals propose. As a great example, there's a  
> subtag for en-scouse but not one for yorkshire, geordie or brummie,  
> because the guy that submitted the scouse request has a vested  
> interest in his own dialect,

Actually not:  I speak pure old classical East Coast American, just what
you'd expect from someone born almost fifty years ago just west of the
New York City dialect border.  There was good documentation for en-scouse,
so I registered it.

> en-US represents a cluster of dialects and accents, with a unified
> orthography, and en-GB represents a cluster of accents and dialects
> (some overlapping with en-US), but a different orthography. Thus
> en-GB/US is pretty useless to people who are tagging audio data,
> but quite useful to those tagging written data.

Although of course there are many individual isoglosses that cross the
Atlantic, I think you greatly overstate the case.  I don't think there
are any U.K. varieties that even a half-trained person could mistake for
American ones, and certainly not vice versa.  The closest pair is probably
Eastern New England and East Anglia, and even that is not very close.

> I believe that having a subtag registered is at present too difficult  
> (requirement for dictionaries!? what if it's mostly just an accent  
> with only phonemic changes relative to surrounding accents).   

Dictionaries are not required; they are just an example of the kind
of documentation that's acceptable.  We need to be sure, when we are
registering a tag, that it is not substantially identical with some
existing tag, that's all, so there must exist documentation of the
variety being tagged.

> As an example, things like the supposedly "British English" speech  
> synthesizer voices on my computer (which the OS processes using the  
> tag "en_GB" from the voice's property list) sound nothing like most of  
> the accents of the United Kingdom,

I doubt the en-US accent is much like that of the vast majority of U.S.
speakers nowadays either.

> they would be better marked as "en-received" or similar.

Register it, then.

> The synth has available half a dozen male voices variously described  
> as "en-US" and "en-GB" it would probably not render the dialogue  
> closely to the author's intentions, but if those voice descriptions  
> could be "en-general", "en-cowboy", "en-drawl", "en-received",
> "en-westcountry" and "en-estuary", then the synth would have far more
> freedom to select an appropriate voice to use.

If the document was tagged "en-US-dixie", then a synth with an "en-US"
voice (probably so-called General American) would at least get it
approximately right (assuming, that is, that it doesn't speak Hawkingese,
which sounds American to the English and pseudo-Swedish to the Americans).
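
The fallback described above can be sketched roughly as follows. This is
only an illustration in the spirit of RFC 4647 "lookup" matching, which
truncates subtags from the right until something matches. The function
name lookup_voice and the voice list are hypothetical, and "dixie" is
just the example subtag used above, not something assumed to be in the
registry.

```python
def lookup_voice(tag, voices, default=None):
    """Return the best-matching voice tag by truncating subtags from the right."""
    # Language tags compare case-insensitively; remember the original spelling.
    available = {v.lower(): v for v in voices}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        # No match: drop the rightmost subtag and try the shorter tag.
        subtags.pop()
        # RFC 4647 lookup also drops a single-character subtag left at the
        # end (such as a private-use "x") rather than matching on it.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# Hypothetical set of voices installed on the synth.
voices = ["en-US", "en-GB"]
print(lookup_voice("en-US-dixie", voices))  # falls back to "en-US"
```

A tag with no region, such as "en-scouse", would find nothing in this
particular voice list and return the default, so a real synth would
probably also want a plain "en" voice to fall back on.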

> I'm sure we can all agree on commonly recognised dialects for English,  

I wish I shared your certitude.

-- 
Mark Twain on Cecil Rhodes:                    John Cowan
I admire him, I freely admit it,               http://www.ccil.org/~cowan
and when his time comes I shall                cowan at ccil.org
buy a piece of the rope for a keepsake.

