Machine Translation

Felix Sasaki felix.sasaki at fh-potsdam.de
Sat Sep 12 13:07:44 CEST 2009


2009/9/12 Kent Karlsson <kent.karlsson14 at comhem.se>

>  Not that I'm necessarily arguing that a language tag extension should be
> defined for this (in particular I'm not arguing for using BLEU scores), but
> I'd just like to clarify a few things.
>
>
> On 2009-09-12 09.06, "Felix Sasaki" <felix.sasaki at fh-potsdam.de> wrote:
>
> Of course one could define several machine translation related extensions,
>
> I think so far, just one extension related to this has been alluded to,
> variously using "-t-" or "-m-" in examples as singletons for it (in messages
> earlier in this thread).
>
> but at some point the suitability of language tags is really in question.
> E.g. how would you represent a bleu score as an extension? The point is that
> proper metadata for evaluation of machine translation soon goes beyond a
> simple "can be represented as a closed set of strings" pattern.
>
>
> The RFC describing the extension can allow for numbers to be expressed in
> the extension, as long as each number is expressed as one or more strings
> of ASCII letters and digits, each of length 2-8. The RFC need not enumerate
> them. In particular, subtags that occur in an extension have no relation to
> subtags that occur in the LSR.
>


I am aware of that. I was not aware that there is no need to have a closed
set of strings for extensions, thanks for the clarification.
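
To make the constraint above concrete, here is a minimal sketch in Python of how a score between 0 and 1 could be packed into extension subtags. The singleton "m", the "bleu" subtag, the fixed-point scaling and the function names are all assumptions for illustration only; nothing in this thread or in RFC 5646 defines them. Only the rule that each extension subtag is 2 to 8 ASCII letters or digits comes from RFC 5646, section 2.2.6.

```python
import re

# RFC 5646, section 2.2.6: every extension subtag is 2 to 8 ASCII
# letters or digits.
EXT_SUBTAG = re.compile(r"[A-Za-z0-9]{2,8}")


def score_to_subtag(score: float, digits: int = 4) -> str:
    """Encode a score in [0, 1] as a zero-padded integer string.

    Hypothetical scheme: a decimal point is not allowed in a subtag,
    so the score is scaled by 10**digits and rounded.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    subtag = f"{round(score * 10 ** digits):0{digits}d}"
    if not EXT_SUBTAG.fullmatch(subtag):
        raise ValueError(f"{subtag!r} is not a valid extension subtag")
    return subtag


def tag_with_score(base_tag: str, score: float, singleton: str = "m") -> str:
    """Append a hypothetical '<singleton>-bleu-<score>' extension to a tag."""
    return "-".join([base_tag, singleton, "bleu", score_to_subtag(score)])


print(tag_with_score("de", 0.25))
```

Even in this small form the sketch shows the problem: the scaling factor, and which system and test set the score refers to, would all have to be agreed on outside the tag.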



>
> However, I'm not arguing for actually doing this.
>


I am relieved :)

Felix



> So far there is no consensus that an extension for "translation status" or
> similar should be defined.
>
>         /kent k
>
>
> Felix
>
> 2009/9/11 Kent Karlsson <kent.karlsson14 at comhem.se>
>
> What you say is correct for a (single) variant subtag, as initially
> suggested, but extension subtags work differently. See
> http://tools.ietf.org/search/rfc5646#section-2.2.6. Data like that you
> refer to can be put in the part that follows the extension "singleton".
>
> Note also that section 2.2.6 starts:
>
> "Extensions provide a mechanism for extending language tags for use in
>    various applications.  They are intended to identify information that
>    is commonly used in association with languages or language tags but
>    that is not part of language identification."
>
>         /kent k
>
>
>
> On 2009-09-11 18.35, "Felix Sasaki" <felix.sasaki at fh-potsdam.de> wrote:
>
> I would agree with Yves Savourel that for translation tool developers, this
> kind of information is better provided via a different field. Other
> practical information which one could not easily pack into a broad data
> category "machine translation" (to use Peter's terminology), and also not
> easily into the "language tag" field, would be: name of the system that
> generated the translation (maybe several were used ...), quality of the
> input, quality rating of the system (e.g. BLEU score). IMO these
> fine-grained differences are necessary for making use of this kind of
> metadata, and I don't see a clear use case for a broad "machine translated"
> subtag.
>
> Felix
>
> 2009/9/11 Kent Karlsson <kent.karlsson14 at comhem.se>
>
>
> On 2009-09-11 17.32, "Peter Constable" <petercon at microsoft.com> wrote:
>
> > From: ietf-languages-bounces at alvestrand.no
> > [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Felix Sasaki
> >
> >> There is a difference in the case of XLIFF. If the extension subtag is just
> >> similar, but not identical to MT related information in other technologies
> >> like XLIFF, you will end up with a mess of *values*. This is IMO different
> >> from the script subtag case: Here you have the same values, but different
> >> *occurrences*
> >
> > Expressed with different terminology: you end up with a mess of data
> > categories; in the script subtag case, you have a single data category with
> > many values.
>
> I don't think that should be a major issue. XLIFF, and other formats having
> separate attributes for this, could simply have that attribute take
> priority, even to the extent that "language extensions", in particular one
> that overlaps with an attribute, can be completely ignored in those
> formats.
>
>         /kent k
>
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages