[Ltru] Re: "mis" update review request

Fri Apr 20 22:44:06 CEST 2007

I say your programming code example is a boundary case in the sense that I don’t expect to see .cxx, .h, etc. files tagged with language tags any time soon, and I don’t expect to see a book on programming concepts tagged as anything other than en, no matter how many pages of source code samples it has.

(Granted, in an XML representation of that book there may be a question as to how individual elements should be tagged, but it’s not clear to me in that scenario what difference it really makes whether you have <code sample xml:lang="en"> or <code sample xml:lang="zxx"> or <code sample xml:lang="und"> or <code sample xml:lang=""> or whatever.)

Peter

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis
Sent: Friday, April 20, 2007 8:59 AM
To: Peter Constable
Cc: ietf-languages at alvestrand.no; ltru at lists.ietf.org
Subject: Re: [Ltru] Re: "mis" update review request

I don't think the programming language fragment is really a boundary condition. Most source code nowadays is not just random hex; there is typically, not exceptionally, some real linguistic content. I would agree with you that a hex dump of a compiled program, such as perhaps you used for your example, is sensible to tag as zxx, but based on the wording of the standards, I don't think we can expect zxx to apply to typical source code. Yet, while there may be some embedded English, we don't want to call it "en" either.

It looks to me like the best choice currently would be "und"; as I said, I think it might be useful to have a special tag for this just because it is a reasonably common case that is otherwise difficult to categorize. An alternative would be to explicitly broaden the description of "zxx" to be "no linguistic content, or programming source code". That would be a compatible change to 4646bis, since it is a broadening.

Mark
On 4/20/07, Peter Constable <petercon at microsoft.com> wrote:

From: Mark Davis [mailto:mark.davis at icu-project.org]

> As in example #9 of http://docs.google.com/Doc?id=dfqr8rd5_11g425c9 ,

> to think that the following contains "no linguistic content" is bizarre.

> It obviously contains linguistic content.

if (linguisticContent == null) { throw new Exception(""); }

You could say the same of this:

MZ

   ÿÿ  ¸       @                                   à
º
 ´            Í!¸LÍ!This program cannot be run in DOS mode.

$       Tbï›

È
È
È7ÅïÈ
È7ÅüÈ
È7ÅúÈ
È
€ÈÉ
È7ÅìÈ3
È7ÅýÈ
È7ÅùÈ
ÈRich
È

We could probably come up with all kinds of boundary cases for which there is no "right" answer. I don't know what use it would be.

Peter


--
Mark