[Ltru] Re: "mis" update review request

Peter Constable petercon at microsoft.com
Sat Apr 21 01:34:17 CEST 2007


From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis

>> I don't expect to see .cxx, .h, etc. files tagged with language tags any time soon
>
> Well, every file available on the web, like
> http://www.cs.duke.edu/csed/tapestry/win/date.h (chosen at random) gets
> some language tag when processed at Google (I can't say what MSN,
> Yahoo, and other search engines do). So right under your nose millions of
> pages of source code are getting tagged, all the time. We are faced with the
> practical problem of what the best thing to do is according to the standard.

Well, I suppose you have to decide: when an English-speaking user searches for "date", how likely is it that the content of a file such as date.h will be of interest to them?


All I've said all along is that the only appropriate language tag in ISO 639 for programming languages (once you have determined that the content is in a programming language) is "zxx". That applies to content such as "#ifndef _DATE_H", not to content such as "// a class for manipulating dates". Deciding how Google should tag a file like date.h is an entirely different question. Absent any context, there is no right or wrong answer about how to tag a file like date.h *as a whole*: clearly there is about as much English content in that file as there is C (or whatever programming language it is). In a given application context, such as how Google should tag that file as a whole, there may be good reasons to tag it one way or another. But that is a question to be worked out in the context of that application; IMO it is not a question that should be answered in BCP 47.
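
To make the distinction concrete, here is a minimal sketch in Python of tagging at the level of individual lines rather than the whole file. It is an illustration only: the function names are hypothetical, it recognizes only `//` line comments, and it simply assumes that comment text is English rather than detecting the language.

```python
from collections import Counter
from typing import Optional


def tag_line(line: str) -> Optional[str]:
    """Return a language tag for one source line, or None for a blank line."""
    stripped = line.strip()
    if not stripped:
        return None
    # Assumption: a "//" comment holds natural-language text, taken to be English.
    if stripped.startswith("//"):
        return "en"
    # Anything else is treated as programming-language content.
    return "zxx"


def tag_counts(source: str) -> Counter:
    """Count how many non-blank lines receive each tag."""
    tags = (tag_line(line) for line in source.splitlines())
    return Counter(tag for tag in tags if tag is not None)


sample = """// a class for manipulating dates
#ifndef _DATE_H
#define _DATE_H
// see the header for usage
#endif
"""

counts = tag_counts(sample)
print(counts)
```

The sketch deliberately stops at per-line counts. How to collapse them into a single tag for the file is the application-specific decision discussed above, so it is left out.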

If you want to add text suggesting that different things may be done in different application scenarios, including tagging programming-code content with English comments or string literals as "en" when that provides the attribute most useful for implementation and user experience, I have no problem with that. But a *general* statement to the effect that using the IETF language tag/ISO 639 ID "en" for such programming code is appropriate, or that the appropriate tag for your example 9 would be "und", is, IMO, not something that should appear in BCP 47.


Peter



