deprecating www as language code

Doug Ewell doug at ewellic.org
Fri Apr 8 23:57:36 CEST 2011


Casey Brown <lists at caseybrown dot org> wrote:

> I'm not exactly sure why this request turned into an attack on
> Wikipedia.  The mention of Wikipedia was just an example case to
> illustrate the possible issues.[0]

I'm not attacking anyone or anything.  It's a known fact that several
Wikipedias are referenced through non-standard language codes.  Try
http://simple.wikipedia.org/wiki/Main_Page for one example; there is no
BCP 47 language subtag called 'simple'.

Melinda didn't say so, but I got the impression that the request to
change code element 'www' originated with someone from Wikipedia.

> If you want to keep it around just for the sake of keeping it around,
> that seems pretty stupid.  ISO is supposed to make things easier
> through standardization, but there's definitely some degree of common
> sense involved.  As someone else mentioned here, when someone sees
> "www", are they going to think Wawa or are they going to think
> "worldwide web"?  A Wawa speaker probably wouldn't even assume the
> code were "www".

Deprecated subtags (in ISO 639, code elements) are "kept around" not
because it's fun to amass useless clutter, but because it's a bad idea
to invalidate existing tagged data.  Someone, somewhere, might have used
ISO 639 or BCP 47 'www' to identify some Wawa data, or to offer users an
option to select Wawa data.  Pulling the rug out from under these users
by invalidating their legitimate use of the standard would not only be
evil for the Wawa users, but would discourage everyone else from being
able to trust the stability of the standard.

The problem is not in someone seeing 'www' in a vacuum and thinking
"Wawa" versus "World Wide Web."  The problem, as Kent said, is in
creating a Web architecture where one of the subdomains *might* be a
language code, as in "en.wikipedia.org", or it *might* be something else
entirely, as in "www.wikipedia.org".  That is symbol overloading, and it
is not good engineering practice; it forces the architect to assume
there will never, ever be a language code of 'www', not a very safe
assumption with over 7,800 language codes.
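
To make the overloading concrete, here is a minimal Python sketch of a resolver for a scheme where the leftmost host label might be a language code or might be a service name. The function name and the set of service labels are assumptions made up for illustration; this is not a description of how Wikipedia actually routes requests.

```python
# Assumed set of non-language labels; a real site would have its own list.
SERVICE_LABELS = {"www", "ftp", "mail"}

def interpret_first_label(host):
    """Guess what the leftmost label of a host name means in an overloaded scheme."""
    label = host.lower().split(".")[0]
    if label in SERVICE_LABELS:
        # The service reading always wins, so a language whose code
        # equals a service label can never be addressed this way.
        return ("service", label)
    return ("language", label)

print(interpret_first_label("en.wikipedia.org"))   # ('language', 'en')
print(interpret_first_label("www.wikipedia.org"))  # ('service', 'www')
```

The second call shows the collision: nothing in the host name itself can tell the resolver that 'www' was meant as the Wawa code.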

Wawa speakers, like speakers of any other language, shouldn't be
expected to know the magic two- or three-letter code for their language.
This is a matter for engineers.  They are the ones who dropped the ball
here.

> We shouldn't be attacking websites for still supporting the use of
> "www" as equal to the "naked" URL, which has been around for the whole
> life of the internet.  I don't mind "blacklisting" ftp too, but it's
> not as necessary as "www".  www is really the only thing that would be
> a conflict with every single website that uses lang.foo.tld.

I don't care if they want to make "www.wikipedia.org" equivalent to
"wikipedia.org".  Many, probably most, Web sites are like that.  But
then they should not have created "fr.wikipedia.org" and
"de.wikipedia.org" and such, with "fr" and "de" taking, as it were, the
place of "www".  They could have used "www.fr.wikipedia.org" or
"wikipedia.org/fr/" and avoided the possible ambiguity which they are
now facing.
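
A rough Python sketch of the two alternative layouts, where the language code has a position of its own and never competes with "www". The helper names are hypothetical, the path variant expects a full URL with a scheme, and the host variant assumes a two-label base domain such as "wikipedia.org".

```python
from urllib.parse import urlsplit

def language_from_path(url):
    """Layout 'wikipedia.org/fr/': the first path segment is the language code."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else None

def language_from_second_label(host):
    """Layout 'www.fr.wikipedia.org': the label after 'www' is the language code."""
    labels = host.lower().split(".")
    if len(labels) >= 4 and labels[0] == "www":
        return labels[1]
    return None

# In either layout a language code of 'www' is just another value.
print(language_from_path("http://wikipedia.org/www/"))
print(language_from_second_label("www.www.wikipedia.org"))
```

Because the code is read from a dedicated slot, the resolver needs no list of labels that must never become language codes.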

Or, since they already make up codes when it suits them, they could use
"wawa.wikipedia.org" for the Wawa version and be done with it.

> [0] ...and not that it matters, but the language codes don't always
> match for historical reasons, from before we used ISO or from before
> ISO might have had codes for those languages.  We're actually planning
> on moving most of the wikis soon to match ISO, but it's a pretty
> labor-intensive process so it's not something that's done every day.

"We" implies that you are from Wikipedia, which I don't think was stated
before.

BTW, wikis that were created before ISO 639-3 existed could still have
followed the BCP 47 architecture and syntax.  It wasn't necessary to
invent "simple" and "roa-tara" and "zh-classical" when conformant
private-use tags could have been used instead.
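
As an illustration of the syntax involved, here is a small Python sketch that checks two things from BCP 47: no subtag may be longer than eight characters, and a private-use sequence is introduced by the singleton 'x'. The regular expression covers only a simplified subset of the grammar (a two- or three-letter language subtag followed directly by private-use subtags), and tags such as "en-x-simple" are hypothetical examples, not tags anyone has proposed.

```python
import re

# Simplified subset of BCP 47: <2-3 letter language>-x-<1..8 alphanumerics>...
LANG_PLUS_PRIVATE_USE = re.compile(r"[a-z]{2,3}-x(-[a-z0-9]{1,8})+", re.IGNORECASE)

def subtag_lengths_ok(tag):
    """Every subtag must be 1 to 8 ASCII letters or digits."""
    return all(1 <= len(s) <= 8 and s.isascii() and s.isalnum()
               for s in tag.split("-"))

def is_lang_plus_private_use(tag):
    """True for tags shaped like 'en-x-simple' in this simplified subset."""
    return LANG_PLUS_PRIVATE_USE.fullmatch(tag) is not None

print(subtag_lengths_ok("zh-classical"))         # False: 'classical' has 9 characters
print(is_lang_plus_private_use("en-x-simple"))   # True
print(is_lang_plus_private_use("roa-x-tara"))    # True
```

Passing these checks only means a tag is shaped correctly; it says nothing about whether the language subtag is actually registered.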

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell




More information about the Ietf-languages mailing list