Alemanic & Swiss German
dewell at adelphia.net
Wed Dec 6 15:58:11 CET 2006
Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:
> The Google presentation gives you a reality check; only 15% of the
> content is tagged and often incorrectly. This means to me that the
> codes are not understood / adhered to. My problem is that in the
> Wikimedia Foundation we do not tag our content correctly and as it is
> on the Internet, we need to know what the IANA language tags and I
> have already matched 209 out of 250 WMF project codes.
I have tried to explain this at least twice already:
There is a revision of RFC 4646 underway that will incorporate ISO 639-3
code elements. It is likely to be approved some time early in 2007. At
that time you can use ISO 639-3 code elements and be conformant with the
IETF specification as well.
There is no "conflict" between ISO 639-3 and the "IANA language tags"
except that the latter cannot reference a draft standard.
> For WiktionaryZ we have as a first step created language portals for
> almost all languages that are in ISO-639-3. As a resource it is on the
> Internet and we do where applicable use the same conventions to
> indicate locales scripts etcetera. We can and want to add the other
> applicable codes that exist like MARC and IANA language subtags.
Since you're interested in being able to tag your content conformantly,
why don't you join the LTRU Working Group and follow the progress of RFC
>> 1. The "backward compatibility" of RFC 4646 and 4646bis is a major
>> design goal. There are many systems that use RFC 3066 tags and
>> breaking compatibility with it, in particular by replacing 2-letter
>> subtags such as "nl" with the 3 letter-equivalent "nld", is a
> This seems to be an excellent design goal from an engineering point of
Well, it is the Internet *Engineering* Task Force.
> From a marketing point of view, I would say that it is a unmitigated
> disaster. With only 15% of the Internet content tagged at all and much
> of it wrong it looks like you are building on top on quicksand.
I do not follow your line of reasoning that a marketing "disaster"
implies a defect in the engineering.
> I have already spend a lot of time trying to understand it and the way
> it works seems Byzantine to me.
Please ask us questions about details of how to implement RFC 4646.
That is likely to give you better results and less frustration than
badmouthing the system.
> For me the goal of a new Standard should be to improve the 15% to
> better than 75% but this 75% being correctly used code.
Ask anyone who has worked with Internet standards -- hardware or
software -- and they will tell you they have little or no control over
whether their standards are implemented correctly. Only market forces
can do that. The IETF is explicitly an engineering group, not an
industry consortium or a marketing advocacy group.
That said, if you see any engineering flaws in RFC 4646 that you feel
contribute to low acceptance in the market, please state them.
> This means make we need the code to be credible and relevant.
How do you see RFC 4646 as not credible or not relevant? (Other than
not incorporating a draft standard, that is.)
> There is a need for a Standard that adds value; WiktionaryZ for
> instance is likely to get content in the bnx language. This language
> has no presence on the Internet yet and it is unlikely to be found by
> search engines. For this it is really relevant to have a proper code
> because this will facilitate the recognition of these rare materials.
> If anything, success is in the long tail where you can make easy
> converts. For this the ISO-639-3 though not ratified is useful and
> therefore it has a credibility that the RFC 4646 lacks.
When RFC 4646bis is published, you will be able to conform to both at
once. If you check the draft Registry for RFC 4646bis (page 114), you
will see that "bnx" and thousands of others are included. (Warning:
this is a large document, over 750,000 bytes.)
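For anyone who would rather check the draft Registry programmatically than scroll through it, here is a minimal sketch in Python. It assumes the record layout defined in RFC 4646 (records separated by "%%" lines, each made of "Field: value" lines). The sample text is an illustrative excerpt typed in for this example, not copied from the draft, and the names parse_records and find_language are hypothetical. Folded continuation lines and repeated Description fields are not handled.

```python
# Illustrative excerpt in the RFC 4646 registry record layout.
# Not copied from the draft Registry; fields such as "Added" are omitted.
SAMPLE_REGISTRY = """\
Type: language
Subtag: bnx
Description: Bangubangu
%%
Type: language
Subtag: nl
Description: Dutch
%%
Type: script
Subtag: Latn
Description: Latin
"""


def parse_records(text):
    """Split registry text into a list of {field: value} dictionaries."""
    records, current = [], {}
    for line in text.splitlines():
        if line.strip() == "%%":
            # "%%" marks the boundary between two records.
            if current:
                records.append(current)
            current = {}
        elif ":" in line:
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def find_language(records, subtag):
    """Return the language record with the given subtag, or None."""
    for record in records:
        if record.get("Type") == "language" and record.get("Subtag") == subtag:
            return record
    return None


records = parse_records(SAMPLE_REGISTRY)
print(find_language(records, "bnx"))
```

Running the same two functions over the full draft Registry text, saved locally, would answer the "is my code in there?" question without reading all of it.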
> * There is the obvious need for identifying languages like
> Bangubangu which is not met.
Is this the first time you have had to wait for a specification to be
> * There is the need to know how the ISO-639-3 codes will relate
> to the IANA language tags.
They will be primary language subtags, except for those ISO 639-3 codes
which are encompassed by an ISO 639-3 macrolanguage; those will be
"extended language subtags" and will follow the macrolanguage; thus
"qu-qxc" for Chincha Quechua. Did that make sense? It should if you
have read RFC 4646 and ISO 639-3 and understand how they work.
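The rule just described can be sketched in a few lines of Python. This is only an illustration of the draft behavior as stated in this message, not text from RFC 4646bis. The table MACROLANGUAGE_OF is a hypothetical two-entry excerpt, and the function name draft_language_tag is made up for the example.

```python
# Hypothetical excerpt: ISO 639-3 code -> subtag of the macrolanguage
# that encompasses it. A real table would be derived from ISO 639-3.
MACROLANGUAGE_OF = {
    "qxc": "qu",  # Chincha Quechua, encompassed by Quechua
    "cmn": "zh",  # Mandarin Chinese, encompassed by Chinese
}


def draft_language_tag(code):
    """Build the language part of a tag under the rule described above.

    Codes encompassed by a macrolanguage become extended language
    subtags that follow the macrolanguage subtag; all other codes are
    used directly as the primary language subtag.
    """
    code = code.lower()
    macro = MACROLANGUAGE_OF.get(code)
    if macro is None:
        return code
    return f"{macro}-{code}"


print(draft_language_tag("qxc"))  # qu-qxc
print(draft_language_tag("bnx"))  # bnx
```

Under this rule "bnx" stands alone because Bangubangu is not part of any macrolanguage, while "qxc" is carried after "qu".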
> Having it only addressed from a technical point of view does not make
> it work. We need to make it easy for people to tag the correct
> language to their content.
Tools will help with that. Market forces will help. We are the ones
designing the protocol and while some of us have also built tools, we
are not in charge of marketing them.
You might see some familiar corporate domain names on this list, such as
Microsoft and Google and Yahoo!, but the people on this list who work at
those companies are engineers trying to solve engineering problems.
Steve Ballmer is not on this list.
> Computers are clever, they can easily shift from one code to another.
Tagged content does not automatically get retagged when a code gets
changed; someone has to retag it. That is why compatibility is
> Well, I can remember discussions where people insist zh being a
> language while it clearly is not from a linguistic point of view.
ISO 639-3, which you hold in high regard, considers Chinese to be
simultaneously (a) a language and (b) an umbrella term for a group of
more specific languages. That is essentially how RFC 4646bis will treat
it. If you are not happy with this situation, you can use the ISO 639-3
individual codes directly (or use private-use tags, or invent your own
system). But you cannot continue to say it is not clear how RFC 4646bis
will handle this; I and others have explained it repeatedly.
Please join the LTRU Working Group and continue this discussion there.
This list is for discussion of proposals to register new subtags, not
for extended debates about the merits of the RFC 4646 system.
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14