[Ltru] Re: "mis" update review request

Peter Constable petercon at microsoft.com
Sat Apr 14 01:06:04 CEST 2007


For the imperfect-knowledge scenario you describe where the process ends with an "I don't know" conclusion, und is the appropriate tag, not mis.


Peter

From: Mark Davis [mailto:mark.davis at icu-project.org]
Sent: Friday, April 13, 2007 1:05 PM
To: Randy Presuhn
Cc: ietf-languages at alvestrand.no; ltru at lists.ietf.org
Subject: Re: [Ltru] Re: "mis" update review request

I agree with you about the stability issue.

I think part of the problem in communicating about this is that people may have somewhat different usage scenarios in mind. If you think of tagging as something that a person does with content that they originate or have control over, then it is (probably) fairly straightforward for that person to tag as specifically as possible.

Another scenario is where you have incoming content, and you need to tag it for use by other components. This might be done, for example, in a search engine, where you fetch and process a page, and use that information later in doing searches. The tag serves to communicate language between the different components.

In that case, you have far from perfect information about the content: what you have is typically the result of some level of statistical analysis, plus other factors about the document. You need to tag with as much information as you have, *but no more*. It is in that case that you need tags that indicate some level of imperfect knowledge about the source, such as "I have no idea what this is", or "It looks like linguistic content, but I don't know which language", or "It doesn't look like linguistic content". (You may also have more detailed knowledge, for example that a document appears to have 70% English content (probability 95%) and 20% French content (probability 65%).)
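As a rough illustration of the "as much information as you have, but no more" rule, here is a minimal Python sketch. It assumes a hypothetical detector that reports whether the content looks linguistic, a best-guess language subtag, and a confidence value. The function name, its parameters, and the 0.9 threshold are all illustrative assumptions, not anything specified in this thread or in BCP 47. Mapping the "I don't know" outcomes to und follows Peter's reply at the top of this message; zxx is the ISO 639-2 code for "no linguistic content".

```python
from typing import Optional

# Hypothetical cut-off below which a guess is not trusted (an assumption).
MIN_CONFIDENCE = 0.9


def choose_tag(is_linguistic: Optional[bool],
               best_guess: Optional[str],
               confidence: float) -> str:
    """Return a language tag that claims no more than the detector knows.

    is_linguistic: True, False, or None when the detector cannot tell.
    best_guess:    a language subtag such as "en", or None.
    confidence:    detector confidence in best_guess, from 0.0 to 1.0.
    """
    if is_linguistic is False:
        # "It doesn't look like linguistic content"
        return "zxx"
    if is_linguistic and best_guess is not None and confidence >= MIN_CONFIDENCE:
        # Confident enough to name the language.
        return best_guess
    # "I have no idea what this is", or linguistic content in an
    # unidentified language: undetermined.
    return "und"


if __name__ == "__main__":
    print(choose_tag(None, None, 0.0))    # no idea what this is
    print(choose_tag(True, None, 0.0))    # linguistic, language unknown
    print(choose_tag(False, None, 0.0))   # not linguistic content
    print(choose_tag(True, "en", 0.95))   # confident guess
    print(choose_tag(True, "fr", 0.65))   # guess below the threshold
```

Note that "mis" never appears as an output here: none of the imperfect-knowledge outcomes above calls for it.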

Both scenarios are equally valid use cases for BCP 47 (in fact, as a percentage of data flow on the web, I'd wager strongly that the second scenario completely swamps the first).

Mark
On 4/13/07, Randy Presuhn <randy_presuhn at mindspring.com> wrote:
Hi -

The thing that bothers me about the comment "A collection of languages which
don't belong to any other collection" is that it isn't compatible with the
possibility that one or more of those languages might eventually be included
in another (possibly new) collection.  If that language were left in the mis
collection as well, the comment would be incorrect.  If the language were
removed from the mis collection, stability goes out the window.  Either way,
changing the comment wouldn't help the situation.

However, unlike some other collections, I find it very difficult to imagine
a case where "mis" would be useful in tagging data.  Without a clear use case
for "mis" in constructing a language tag, perhaps we could conclude that this
whole debate is really academic, and that no action is needed.

Randy






