New file format uses ISO 639-2 alpha-3 plus TLDs, ignores BCP 47

Doug Ewell doug at
Tue Jul 12 20:29:51 CEST 2016

Today on the IETF "Daily Dose" page, I noticed a draft for Matroska
(draft-lhomme-cellar-matroska-00), a proposed new multimedia container
format based on Extensible Binary Meta Language (EBML), also in draft
form. Both are individual submissions, associated with the cellar WG.

In this 220-page draft, I found the following:

> 6.2.1.  Language Codes
> Language codes can be either the 3 letters bibliographic ISO-639-2
> [13] form (like "fre" for french), or such a language code followed
> by a dash and a country code for specialities in languages (like
> "fre-ca" for Canadian French).  Country codes are the same as used
> for internet domains [14].

I wonder if there was some particular reason for choosing this hybrid
approach instead of simply referencing BCP 47 and applying constraints
as desired (e.g. "no variants or extlangs"). Among other benefits, using
BCP 47 would provide access to many more languages (more than 8000
instead of 485), avoid the GB/UK problem (the ISO 3166-1 code for the
United Kingdom is GB, but its Internet domain is .uk), and promote the
objective of IETF specifications referencing other IETF specifications.
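
To make the mismatch concrete, here is a minimal Python sketch of what a
consumer would have to do to turn a Matroska-style code into a BCP 47
tag. The function name and both lookup tables are hypothetical and cover
only a handful of codes for illustration; a real converter would need
the full ISO 639-2 list and the IANA Language Subtag Registry.

```python
# Illustrative subset only (an assumption for this sketch):
# ISO 639-2 bibliographic codes mapped to BCP 47 primary language subtags.
ISO639_2B_TO_BCP47 = {
    "fre": "fr",
    "ger": "de",
    "chi": "zh",
    "eng": "en",
}

# Country-code TLDs that differ from the ISO 3166-1 alpha-2 code that
# BCP 47 expects as a region subtag. Only the GB/UK case is listed here.
TLD_TO_REGION = {"uk": "GB"}


def matroska_to_bcp47(code: str) -> str:
    """Convert a code such as "fre-ca" to a BCP 47 tag such as "fr-CA"."""
    language, _, country = code.lower().partition("-")
    if language not in ISO639_2B_TO_BCP47:
        # Refuse to guess rather than emit a possibly invalid tag.
        raise ValueError(f"no mapping for language code {language!r} in this sketch")
    tag = ISO639_2B_TO_BCP47[language]
    if country:
        # Most ccTLDs match the ISO 3166-1 code once uppercased.
        region = TLD_TO_REGION.get(country, country.upper())
        tag = f"{tag}-{region}"
    return tag


print(matroska_to_bcp47("fre-ca"))
print(matroska_to_bcp47("eng-uk"))
```

Under these assumptions the two calls yield "fr-CA" and "en-GB". The
second case is the GB/UK problem in miniature: a file tagged "eng-uk"
needs a special-case table before it lines up with anything that speaks
BCP 47.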

What makes this all a bit strange is that EBML itself, written by two of
the three authors of Matroska, does specify the use of RFC 5646 for
<documentation> sub-elements.

Doug Ewell | Thornton, CO, US |
