Alemannic & Swiss German

Gerard Meijssen gerardm at wiktionaryz.org
Wed Dec 6 10:59:48 CET 2006


Hoi,
Please understand, I am speaking at this moment very much from outside
the IETF and have been clear that I speak from *my* perspective. The
Google presentation gives you a reality check; only 15% of the content
is tagged, and often incorrectly. This means to me that the codes are not
understood or adhered to. My problem is that in the Wikimedia Foundation
we do not tag our content correctly, and as it is on the Internet, we
need to know what the IANA language tags are. I have already matched 209
out of 250 WMF project codes.

For WiktionaryZ we have, as a first step, created language portals for
almost all languages that are in ISO-639-3. As a resource it is on the
Internet, and where applicable we use the same conventions to indicate
locales, scripts, etcetera. We can and want to add the other applicable
codes that exist, like MARC and IANA language subtags.

Doug Ewell wrote:
> Gerard Meijssen <gerardm at wiktionaryz dot org> wrote:
>
>> From my perspective the RFC 4646 that is seemingly inevitable is 
>> problematic because it only addresses on how it wants to be backwards 
>> compatible and is willing to sacrifice the easy understanding that a 
>> single list would bring with a hybrid system. A hybrid system where 
>> it is unclear to me how they want to link it to the ISO-639-3 content 
>> with the argument that they are not willing to address it until the 
>> standard is standard. From my perspective, when the ISO-639-3 is 
>> finally ratified, this list of how to link needs to be there in a 
>> finished form. By not having it at the time of ratification, it makes 
>> the IANA codes less than credible. By not having a period where these 
>> codes can be discussed, you will not have buy in.
>
> I need help understanding this:
>
> 1.  The "backward compatibility" of RFC 4646 and 4646bis is a major 
> design goal.  There are many systems that use RFC 3066 tags and 
> breaking compatibility with it, in particular by replacing 2-letter 
> subtags such as "nl" with the 3-letter equivalent "nld", is a 
> non-starter.
This seems to be an excellent design goal from an engineering point of
view. From a marketing point of view, I would say that it is an
unmitigated disaster. With only 15% of the Internet content tagged at
all, and much of it wrong, it looks like you are building on top of
quicksand. I have already spent a lot of time trying to understand it
and the way it works seems Byzantine to me.

For me the goal of a new Standard should be to raise that 15% to better
than 75%, with this 75% being correctly used codes. This means we need to
make the codes credible and relevant. There is a need for a Standard
that adds value; WiktionaryZ for instance is likely to get content in
the bnx language. This language has no presence on the Internet yet and
it is unlikely to be found by search engines. For this it is really
relevant to have a proper code because this will facilitate the
recognition of these rare materials. If anything, success is in the long
tail where you can make easy converts. For this the ISO-639-3, though not
ratified, is useful and therefore it has a credibility that the RFC 4646
lacks.

http://www.ethnologue.com/show_language.asp?code=bnx
>
> 2.  No RFC that references a draft standard or RFC can be approved and 
> published.  The referenced standard must be an official standard.  
> That is one of the rules of the game.  ISO 639-3 is not yet an 
> approved, official standard, therefore draft-4646bis is not yet 
> eligible to become an RFC.  But we have certainly "addressed" ISO 
> 639-3 and discussed it and its code elements in detail.  I don't see 
> what the objection is.
There are two elements here.

    * There is the obvious need for identifying languages like
      Bangubangu which is not met.
    * There is the need to know how the ISO-639-3 codes will relate to
      the IANA language tags.

Having it only addressed from a technical point of view does not make it 
work. We need to make it easy for people to tag the correct language to 
their content.
>
> 3.  It should be very clear, from reading draft-4646bis, how it 
> intends to incorporate the ISO 639-3 code elements.
It should be clear, I agree. I am afraid that it is not.
>
>> Yes, you cannot have it both ways. :(  In a presentation of Google it 
>> was suggested that the coding of content with language codes is so 
>> unreliable that it is practically useless. This seems to suggest to 
>> me that good marketing for the codes and clear benefits for using 
>> correct codes is needed. To me this lack of the effectiveness of 
>> these codes and the lack of good marketing makes the whole argument 
>> for backwards compatibility increasingly weak.
>
> Many people are not using RFC 3066 correctly, therefore we should 
> abandon backward compatibility and punish those who are using it 
> correctly?
Computers are clever; they can easily shift from one code to another.
The people who currently use RFC 3066 correctly will continue to be
recognised for the quality tagging that they do. The point though is
that they are a small minority; a fraction of 15%. Is it not much better
to make sure that we get to at least 75% correctly tagged content?
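
To illustrate what I mean by shifting from one code to another, here is
a minimal sketch in Python. The table holds only a handful of well-known
2-letter / 3-letter pairs; a real one would be loaded from the published
ISO 639 tables. The function names are my own invention for this example
and are not part of any standard or library.

```python
# Illustrative subset only; a complete table would come from the ISO 639 code lists.
TWO_TO_THREE = {"nl": "nld", "de": "deu", "en": "eng", "fr": "fra"}
THREE_TO_TWO = {three: two for two, three in TWO_TO_THREE.items()}


def to_three_letter(code):
    """Return the 3-letter code for a 2- or 3-letter code, or None if unknown."""
    code = code.lower()
    if len(code) == 2:
        return TWO_TO_THREE.get(code)
    if len(code) == 3:
        return code
    return None


def to_shortest(code):
    """Return the 2-letter code when one exists, otherwise the code unchanged."""
    code = code.lower()
    return THREE_TO_TWO.get(code, code)
```

A language such as bnx, which has no 2-letter code, simply passes
through to_shortest() unchanged, while "nld" comes back as "nl".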
>
>> Making nationalistic issues the primary argument for what makes a 
>> language ignores that many languages are spoken in many countries 
>> which refutes the argument that it is for the single countries 
>> involved to be the sole judge to decide on such languages.
>
> Is there a commentary on RFC 4646 or RFC 4646bis here?
Well, I can remember discussions where people insist on zh being a
language, while it clearly is not one from a linguistic point of view.

Thanks,
   Gerard

