[Ltru] Re: "mis" update review request

Mark Davis mark.davis at icu-project.org
Fri Apr 13 22:04:48 CEST 2007


I agree with you about the stability issue.

I think part of the problem in communicating about this is that people may
have somewhat different usage scenarios in mind. If you think of tagging as
something that a person does with content that they originate or have
control over, then it is (probably) fairly straightforward for that person
to tag as specifically as possible.

Another scenario is where you have incoming content, and you need to tag it
for use by other components. This might be done, for example, in a search
engine, where you fetch and process a page, and use that information later
in doing searches. The tag serves to communicate language between the
different components.

In that case, you have far from perfect information about the content: what
you have is typically the result of some level of statistical analysis, plus
other factors about the document. You need to tag with as much information
as you have, *but no more*. That is the case where you need tags that
indicate some level of imperfect knowledge about the source, such as "I have
no idea what this is", or "It looks like linguistic content, but I don't
know which language", or "It doesn't look like linguistic content". (You may
also have more detailed knowledge, for example that a document appears to
have 70% English content (probability 95%) and 20% French content
(probability 65%).)
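
To make that second scenario concrete, here is a minimal sketch in Python of
how a component might pick a tag from imperfect detector output. The
Detection type, the choose_tag function and the 0.9 threshold are all
hypothetical, not part of BCP 47 or of any real detector; the only registry
facts assumed are that "und" means undetermined and "zxx" means no linguistic
content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """Hypothetical output of a statistical language detector."""
    is_linguistic: Optional[bool]  # None means "could not tell"
    language: Optional[str]        # best-guess primary subtag, e.g. "en"
    confidence: float              # probability of the guess, 0.0 to 1.0


def choose_tag(d: Detection, threshold: float = 0.9) -> str:
    """Tag with as much information as we have, but no more."""
    if d.is_linguistic is False:
        # "It doesn't look like linguistic content."
        return "zxx"
    if d.is_linguistic and d.language is not None and d.confidence >= threshold:
        # Confident enough to name the language.
        return d.language
    # Either "no idea what this is" or "linguistic, but unsure which language".
    return "und"


print(choose_tag(Detection(True, "en", 0.95)))
print(choose_tag(Detection(True, "fr", 0.65)))
print(choose_tag(Detection(False, None, 0.0)))
```

With the example figures above, an English guess at probability 95% would be
tagged "en", while a French guess at 65% would fall back to "und" rather than
claim more than is actually known.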

Both scenarios are equally valid use cases for BCP 47 (in fact, as a
percentage of data flow on the web, I'd wager strongly that the second
scenario completely swamps the first).

Mark

On 4/13/07, Randy Presuhn <randy_presuhn at mindspring.com> wrote:
>
> Hi -
>
> The thing that bothers me about the comment "A collection of languages which
> don't belong to any other collection" is that it isn't compatible with the
> possibility that one or more of those languages might eventually be included
> in another (possibly new) collection.  If that language were left in the mis
> collection as well, the comment would be incorrect.  If the language were
> removed from the mis collection, stability goes out the window.  Either way,
> changing the comment wouldn't help the situation.
>
> However, unlike some other collections, I find it very difficult to imagine
> a case where "mis" would be useful in tagging data.  Without a clear use case
> for "mis" in constructing a language tag, perhaps we could conclude that this
> whole debate is really academic, and that no action is needed.
>
> Randy




