Region subtags and orthographic variants (was: Re: registration requests re Portuguese)

Doug Ewell doug at
Wed Apr 15 21:30:52 CEST 2015

Yury <yury dot tarasievich at gmail dot com> wrote:

>> My objections have been to the continued blanket assertions that a
>> region subtag is *always* "unnecessary" or "extraneous" with a variant
>> subtag for orthography. "pt-PT-ao1990" is the right tag if the
>> dialectical conventions of Portugal are also part of the equation.
> Any linguistic background for *this* assertion, then? The dialectical
> conventions of anything are (a) never confined to the borders of any
> ISO region sufficiently non-isolated (like, not a lonely island or
> mountain/river valley in Equat. Africa or New Guinea), (b) are
> typically occupying much smaller area than 'country-level' ISO regions
> -- like an area between several main watercourses.
> So you would want something like '-adamscounty' or '-rivervalley' for
> situations like dialectal variation of standard orthography.
> American usus of 'color vs. colour' kind does not comprise a dialect
> /per se/, that's why en_US is sufficient.
> Oh, and by the way, what's the definition of 'dialect' which is
> actually referred to here and in rfc5646?

First, I apologize if you feel you're not being heard. I'm sure it's a
fault in my understanding.

So, to answer your last question first, and work upward from there:

"Dialect" generally means a language variety associated with a
particular region or social class. It explicitly does not imply that the
variety is sub-standard (a common man-on-the-street definition). It's a
delicate linguistic term and I note with relief that RFC 5646 doesn't
try to define it. Someone like SIL would be a suitable resource.

When I've used "dialect" in this thread, I've meant it as shorthand for
"any sort of regional variation." I apologize for my lack of precision here.

"Color" vs. "colour" is one type of difference I'm trying to express
here. The Otis elevator examples earlier this week fall into this
category too. Most of these differences don't constitute real
"dialects." Most regional differences found in English are much less
significant than those found in other major languages; it's been said
that English doesn't have any true "dialects" until you get to the
pidgins and creoles.

But if there were a true dialect, or even a noticeable regional
difference of some sort, specific to Adams County, Colorado (where
Thornton is) or River Valley Village (a neighborhood of Thornton), then
no, a region subtag wouldn't be precise enough. This would also be true
for content featuring "y'all" or "wicked awesome," which are associated
with somewhat larger areas of the United States. In those cases, we'd
need to register a variant subtag. That's not because region subtags
can't denote true dialects, but because of the granularity of these
areas, none of which is a country-sized region.

Region subtags serve the same general purpose as variant subtags, but
specifically for varieties that are commonly associated with a
country-sized (or larger) region. They're often explained in pairs (or
triples, or more) to contrast usage in different countries. A few
well-known contrasting pairs are:

en-US vs. en-GB
fr-FR vs. fr-CA
pt-PT vs. pt-BR

This model has been in use for decades, long before there was a BCP 47
or even an RFC 1766. It's well understood that the model is not perfect,
because language usage does not follow political borders perfectly and
we do not live in the world of Henry Higgins. If this model had never
existed, BCP 47 would have used variants to denote all regional
varieties of this sort, not just the ones more narrowly associated with
Texas or Boston or Thornton.

So getting back to my assertion about "pt-PT-ao1990", then:

1. Political borders notwithstanding, there are well-known,
well-understood differences between Portuguese as typically used in
Portugal and Portuguese as typically used in Brazil. "Tu" vs. "você" is
an example. Content that exhibits one variety or the other might be
tagged as "pt-PT" or "pt-BR".

2. There are differences between the Portuguese orthography laid down in
AO1990 and previous standards, such as the 1945 reform. (One thing
everyone can agree on!) Content that exhibits one variety or the other
might be tagged as "pt-ao1990" or "pt-colb1945".

3. Note carefully that (1) and (2) are independent of each other.

4. If someone wanted to tag content that used "tu" instead of "você",
and exhibited other preferences commonly associated with European
Portuguese instead of South American Portuguese, and ALSO wanted to
specify the 1990 orthography, then the tag "pt-PT-ao1990" would express
that, and would not be redundant.

5. But I agree that the "-PT" in "pt-PT-ao1990" would be inappropriate
if its intended meaning were "this speaker happens to live in Portugal"
or "the Portuguese government happens to have ratified the 1990 accord."
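
To make (3) and (4) concrete, here is a minimal Python sketch that splits
a tag into its language, region, and variant subtags. It is only an
illustration, not a conforming BCP 47 parser: it assumes tags of the shape
language[-region][-variant...], ignores script, extension, and private-use
subtags, and checks nothing against the registry. The names Tag and
split_tag are made up for this example.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Tag(NamedTuple):
    """Hypothetical container for the three subtag kinds discussed here."""
    language: str
    region: Optional[str]
    variants: Tuple[str, ...]


# Simplified subtag shapes (assumption: only these three kinds appear).
_LANGUAGE = re.compile(r"[a-z]{2,3}")
_REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
_VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")


def split_tag(tag: str) -> Tag:
    """Split a tag of the form language[-region][-variant...]."""
    parts = tag.lower().split("-")
    if not _LANGUAGE.fullmatch(parts[0]):
        raise ValueError(f"unsupported language subtag in {tag!r}")
    rest = parts[1:]

    # The region, if present, comes right after the language.
    region = None
    if rest and _REGION.fullmatch(rest[0]):
        region = rest[0].upper()
        rest = rest[1:]

    # Everything left must look like a variant subtag.
    for subtag in rest:
        if not _VARIANT.fullmatch(subtag):
            raise ValueError(f"unsupported subtag {subtag!r} in {tag!r}")
    return Tag(parts[0], region, tuple(rest))


if __name__ == "__main__":
    for example in ("pt-PT", "pt-BR", "pt-ao1990", "pt-colb1945", "pt-PT-ao1990"):
        print(example, split_tag(example))
```

In this sketch the region and the variants are separate fields: "pt-ao1990"
has a variant and no region, "pt-PT" has a region and no variant, and
"pt-PT-ao1990" carries both, so neither one implies the other.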

Does all of this make sense? Any of it?

Doug Ewell | | Thornton, CO 🇺🇸
