doug at ewellic.org
Fri Sep 11 04:23:28 CEST 2009
Michael Everson <everson at evertype dot com> wrote:
> There is no place for a "machine"; this is an authorship tag, not
> relevant to language identification.
That's exactly why John's proposal to make it an extension makes sense.
It conveys information that may be useful in a language tag, may be
relevant to the tagged content, but has nothing to do with language per
se.
Mark Davis <mark at macchiato dot com> wrote:
> I think it is cleaner, simpler, and much more likely to be used if we
> just have an additional variant tag, like "mactrans".
and Peter Constable <petercon at microsoft dot com> wrote:
> But what are the use scenarios? If the key scenarios are simply
> providing an indicator to processes that might want to filter out MT
> content, then an extension with all its additional machinery is
> overkill; a single subtag "machxlat" is certainly sufficient.
I think our criterion in choosing a tagging mechanism should be which
mechanism fits best, not which is easiest for us to implement, and not
a guess about which one end users are more likely to adopt. For
example, we added script subtags because we thought they
would be more appropriate for representing scripts than region subtags
(cf. "zh-TW" vs. "zh-CN"). We didn't scuttle the idea because users
might have a hard time using script subtags, or because they would be
overkill since most languages are not written in multiple scripts.
I thought Debbie's colleague(s) expressed the use case rather clearly.
They want to be able to distinguish between Welsh written by a human and
Welsh generated by a machine, and filter out the latter for one reason
or another. "Written by a human" and "generated by a machine" are not
attributes of Welsh per se, and not even really attributes of the text,
which is what I would expect a variant to represent. They are
Section 2.2.6, "Extension Subtags," says:
"Extensions provide a mechanism for extending language tags for use in
various applications. They are intended to identify information that is
commonly used in association with languages or language tags but that is
not part of language identification."
If this use case doesn't fit that description, then the goal is either
to not encode "machine-translated" at all, or to make sure the extension
mechanism is not used for anything, ever.
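To make the two candidate shapes concrete, here is a minimal Python sketch of how a filtering process might tell them apart. It is an illustration under stated assumptions, not a conformant parser: the singleton "m" is purely hypothetical (no such extension is registered), "mactrans" and "machxlat" are only the spellings floated in this thread, and the names split_tag and is_machine_translated are invented for the example. The subtag classification is deliberately simplified and does no registry validation.

```python
def split_tag(tag):
    """Split a language tag into its variants and extensions (simplified)."""
    subtags = tag.lower().split("-")
    variants = []
    extensions = {}
    singleton = None  # current extension or private-use singleton, if any
    for subtag in subtags[1:]:
        if singleton == "x":
            # Everything after "x" is private use, even one-letter subtags.
            extensions["x"].append(subtag)
        elif len(subtag) == 1:
            # A one-character subtag starts a new extension sequence.
            singleton = subtag
            extensions[singleton] = []
        elif singleton is not None:
            extensions[singleton].append(subtag)
        elif len(subtag) >= 5 or (len(subtag) == 4 and subtag[0].isdigit()):
            # Variants are 5 to 8 characters, or 4 starting with a digit.
            variants.append(subtag)
        # Shorter subtags (extlang, script, region) are ignored here.
    return {"language": subtags[0], "variants": variants, "extensions": extensions}


# Hypothetical markers: the variant spellings come from this thread,
# and the "m" singleton is an assumption made only for this sketch.
MT_VARIANTS = {"mactrans", "machxlat"}
MT_SINGLETON = "m"


def is_machine_translated(tag):
    """Return True if the tag carries either hypothetical MT marker."""
    parts = split_tag(tag)
    if MT_VARIANTS & set(parts["variants"]):
        return True
    return MT_SINGLETON in parts["extensions"]
```

Either shape is equally easy to filter on. The difference is where the information lives: as a variant it sits among the subtags that identify the language, while as an extension it sits in the part set aside for information that is "not part of language identification."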
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s