Preferred Values for Irregular Tags

Mark Davis ☕ mark at
Wed Jan 20 19:36:43 CET 2010

The grandfathered tags behave differently from everything else. All the
other tags are productive: you can combine them in different ways with
expected results. The grandfathered tags are atomic; you can't combine one
of them with, say, a region. Moreover, for the regular tags you can write
APIs that deal with their structure, returning the base language code,
script code, etc. The uniformity of program APIs is of extreme importance
when you are dealing with massive amounts of program code.

Of course we could parse en-GB-oed. But it doesn't fit into the regular ABNF
production rules, and so doesn't work well in APIs.
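
To make the contrast concrete, here is a minimal sketch in Python of the
kind of structural API meant above. It is an illustration only, not the code
anyone actually runs: the names LangTag and parse_regular are made up for
this example, and the pattern is a simplified subset of the regular syntax
(language, script, region, variants) that leaves out extlang, extensions and
private use.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LangTag(NamedTuple):
    """Hypothetical structured view of a regular language tag."""
    language: str
    script: Optional[str]
    region: Optional[str]
    variants: Tuple[str, ...]


# Simplified pattern: 2-3 letter language, optional 4-letter script,
# optional region (2 letters or 3 digits), then zero or more variants
# (5-8 alphanumerics, or a digit followed by 3 alphanumerics).
_REGULAR = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)"
)


def parse_regular(tag: str) -> Optional[LangTag]:
    """Return the parts of a regular tag, or None if it does not fit."""
    m = _REGULAR.fullmatch(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    variants = tuple(v.lower() for v in m.group("variants").split("-") if v)
    return LangTag(
        language=m.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
        variants=variants,
    )


print(parse_regular("zh-Hant-TW"))  # structured result with script and region
print(parse_regular("en-GB-oed"))   # None: "oed" is too short to be a variant
```

With this sketch, every regular tag yields the same four fields, while
en-GB-oed falls through because its last subtag matches no production. A
caller would need a separate special case for it.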

Out of the billions of possible language tags (without even counting
combinations using variants), there are *literally* only a handful of
grandfathered codes that cannot be correctly mapped to regular language
tags. If we can fix these few, then there is nothing standing in the way of
everyone being able to use all language tags effectively.

That is, for existing data, we (and others like us) would convert tags like
en-GB-oed on input to regular tags; then the information is still
accessible. Otherwise our only choices are to dump the data or map to the
'closest' code.
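
The conversion on input described above could look roughly like the
following Python sketch. The function name convert_on_input and both tables
are assumptions made for this illustration. The three PREFERRED entries are
grandfathered tags that do have regular replacements in the registry; the
CLOSEST entry is a hypothetical, lossy fallback standing in for the "map to
the 'closest' code" option, and en-GB-oed is placed there only to mirror the
situation discussed in this message.

```python
from typing import Dict, Tuple

# Grandfathered tags that have an exact regular replacement.
PREFERRED: Dict[str, str] = {
    "i-klingon": "tlh",
    "i-lux": "lb",
    "i-navajo": "nv",
}

# Hypothetical lossy fallback for a tag with no exact replacement:
# the Oxford-spelling information is dropped.
CLOSEST: Dict[str, str] = {
    "en-gb-oed": "en-GB",
}


def convert_on_input(
    tag: str,
    preferred: Dict[str, str] = PREFERRED,
    closest: Dict[str, str] = CLOSEST,
) -> Tuple[str, str]:
    """Return (converted tag, kind); kind is 'exact', 'lossy' or 'unchanged'."""
    key = tag.lower()  # language tags are case-insensitive
    if key in preferred:
        return preferred[key], "exact"
    if key in closest:
        return closest[key], "lossy"
    return tag, "unchanged"


for t in ("i-klingon", "en-GB-oed", "fr-CA"):
    print(t, "->", convert_on_input(t))
```

Once a tag has an exact replacement, it only needs to move from the second
table to the first, and the information in existing data is preserved
rather than approximated.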


On Wed, Jan 20, 2010 at 10:23, Michael Everson <everson at> wrote:

> On 20 Jan 2010, at 01:58, Mark Davis ☕ wrote:
> > Why do this? Well, at Google we convert anything that has an
> > irregular format to a regular format.
> Which means what? Your programmers aren't able to identify and parse
> the string "en-GB-oed"? Guess what, Mark... that has been in use since
> 2003. There's data out there in it.
> Please explain what it is that you are up to.
> Michael Everson *
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at