Progressing beyond borders-making subtags inclusive

Fri Jan 4 21:28:33 CET 2008

Hi -

> From: "Nicholas Shanks" <contact at nickshanks.com>
> To: <ietf-languages at iana.org>
> Sent: Friday, January 04, 2008 7:45 AM
> Subject: Progressing beyond borders-making subtags inclusive
>
> On 3 Jan 2008, at 19:33, Karen Broome wrote:
>
> > Is it simply up to the user to decide whether to use regional or  
> > variant tagging? Or should some guidelines be written to indicate a  
> > preference for variant tagging over regional tagging if both exist?
>
>
> I'd like to second the call for some guidelines to be widely  
> disseminated. I am a web developer and would like to see all of the  
> web tagged (correctly!) with language data.
>
> My own opinion is that using country codes to define dialects is  
> flawed. When borders change, Czechoslovakia splits in two, Germany  
> reunifies, etc, then all the old country codes become obsolete even  
> though linguistically nothing has changed. When populations are  
> displaced they take their language with them.

The discussion, preferably with specific proposals to remedy the problem,
belongs on ltru at ietf.org, not ietf-languages at iana.org.

...
> The distinction between en-US and en-GB is mainly an  
> orthographic one.

There are also grammatical and lexical differences.
Trivial examples: verb agreement with collective nouns, and
the meaning of the verb "to table".

> I say this because en-US represents a cluster of  
> dialects and accents, with a unified orthography, and en-GB represents  
> a cluster of accents and dialects (some overlapping with en-US),

Could you give an example of such an overlap?  The divergence in
pronunciation was already marked in the 1700s.

> but a different orthography. Thus en-GB/US is pretty useless to people who  
> are tagging audio data, but quite useful to those tagging written data.

This is not a problem.

> I believe that having a subtag registered is at present too difficult  
> (requirement for dictionaries!? what if it's mostly just an accent  
> with only phonemic changes relative to surrounding accents). A  
> relaxation of the barriers would lead to more de facto recognised  
> dialects being available to choose from.

I'm not able to figure out what you're trying to say here.

> As an example, things like the supposedly "British English" speech  
> synthesizer voices on my computer (which the OS processes using the  
> tag "en_GB" from the voice's property list) sound nothing like most of  
> the accents of the United Kingdom, they would be better marked as  
> "en-received" or similar.

This is not a tagging problem.  It's a complaint about a speech
synthesizer, and could be made for any language not tagged right
down to the level of some person's idiolect.

...
> I'm sure we can all agree on commonly recognised dialects for English,

I'd be surprised.  The "cowboy" dialects spoken by my relatives in
South Dakota differ from what my relatives in Wyoming speak, and
neither sounds much like Bush-speak.  With variation seemingly on
the rise in US English, compiling an agreed list might be harder
than you think.

> as it is a first language for many people on this list, and familiar  
> for many others. For other languages compiling a list might involve  
> asking a scholar for suggestions.

That's not how ietf-languages at iana.org is supposed to work.
Rather, someone (anyone) who has a need of a subtag for a
particular dialect submits a registration request, the request is
discussed, and the Language Subtag Reviewer decides whether
to accept the registration.  

> It occurred to me while writing this that perhaps a good solution  
> would be to use country codes for written content that uses the  
> national orthography, and dialect tags when transcribing spoken  
> content or for audio data. You would only combine the two if you were  
> transcribing the speech of someone with that dialect into the  
> orthography of a country (maybe not the country of the speaker).

Interesting idea.  Discussion of such a proposal belongs on ltru at ietf.org,
not here.

Randy