Fwd: draft-davis-t-langtag-ext

Pete Resnick presnick at qualcomm.com
Thu Jul 7 01:49:07 CEST 2011

Most of the people on the ietf-languages list are probably on the 
ltru at ietf.org list as well, but I wanted to confirm that everyone got a 
chance to review this before it proceeded to the IESG. Please have a 
look at the ltru archive 
<http://www.ietf.org/mail-archive/web/ltru/current/maillist.html> and 
send any comments to the ltru at ietf.org list since that's where 
discussion seems to be taking place.



-------- Original Message --------
Subject: 	[Ltru] draft-davis-t-langtag-ext
Date: 	Wed, 22 Jun 2011 15:00:47 -0700
From: 	Mark Davis <mark at macchiato.com>
To: 	Martin J. Dürst <duerst at it.aoyama.ac.jp>
CC: 	LTRU Working Group <ltru at ietf.org>, <court at infiauto.com>

A new draft posted at 

Martin, we tried to address your concerns; please take a look and let us 
know what you think.

/--- Il meglio è l'inimico del bene ---/

On Tue, Jun 21, 2011 at 09:00, Mark Davis <mark at macchiato.com> wrote:

    Those are good issues; thanks for raising them and starting the
    discussion. Comments below.

    /--- Il meglio è l'inimico del bene ---/

    On Mon, Jun 20, 2011 at 23:39, "Martin J. Dürst"
    <duerst at it.aoyama.ac.jp> wrote:

        Hello Mark, others,

        Overall comment:
        The idea to reuse language tags to indicate
        transliteration/transcription source, and to add some additional
        tags to distinguish methods seems to be reasonable and sound.

        The description of the structure of the allowed subtags and of
        the responsibility split between IETF (this draft) and UTC (UTS
        35) looks quite messy to me, and should be cleaned up. I'd
        personally prefer that UTS 35 (or whatever else on the Unicode
        side) only define the <mechanism> part (after the m0 subtag).

    That would be my preference as well (can't speak for my coauthors).

    We patterned it this way following what ended up being accepted for
    the -u- extension. That is, the spec is in UTS35, but there is a
    summary here. But of course, there are many ways to do it. And maybe
    this summary is too detailed, at least for the mechanism part, and
    we could just have it in UTS35.

    We considered a number of alternatives:

        * We could define everything after -t- to be the source
          language, and everything after -m- to be the mechanism. But
          that burns 2 extension letters instead of just one.
        * We also considered having everything in the -u extension, for
          which we already have the structure set up. However, that
          would force us to have artificial source subtags like 'en0'
          instead of 'en', because the -u- extension wouldn't allow the
          2-letter subtags (it already defines a use for them).
        * We could also have -t- be just the source, and define the
          mechanism in -u-, also easy. But we felt it would be better to
          have everything under one extension.
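
To make the single-extension layout concrete, here is a minimal sketch of how a tag could be split under that approach: everything before the 't' singleton is the target, the subtags after it form the source language, and the subtags after 'm0' name the mechanism. The function name and the sample tags are illustrative assumptions, not taken from the draft. The sketch does no BCP 47 validation and does not handle other extensions (such as -u-) that follow -t-.

```python
def split_t_extension(tag):
    """Split a tag into (target, source, mechanism) subtag lists.

    Simplified sketch: no BCP 47 validation; the 't' singleton and the
    'm0' key are found by plain string comparison on lowercased subtags.
    """
    subtags = tag.lower().split("-")
    # A singleton can only appear after the first (language) subtag.
    if "t" not in subtags[1:]:
        return subtags, [], []
    t_pos = subtags.index("t", 1)
    target, rest = subtags[:t_pos], subtags[t_pos + 1:]
    if "m0" in rest:
        m_pos = rest.index("m0")
        # Source language comes before 'm0', mechanism after it.
        return target, rest[:m_pos], rest[m_pos + 1:]
    return target, rest, []


# Hypothetical tag: Japanese content transformed from Italian.
print(split_t_extension("ja-t-it"))  # (['ja'], ['it'], [])
```

Because the source subtags sit directly after -t-, a plain 'it' or 'en' can be reused as-is, which is the advantage over the -u- alternative described above.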

        Detailed comments:

        "In addition, it may also be important to
           specify a particular specification for the transformation.":
        Too much 'spec' in one sentence.


        "For example, if one is transcribing the names of Italian or Russian
           cities on a map for Japanese users, each name will need to be
           transliterated into katakana using rules appropriate for the
           language and target languages.": "source languages and target


        BCP47 required information: The first three paragraphs should
        move to the introduction.

    Other authors, what do you think?

        "followed by a sequence of subtags that would form a language
        tag": Here and in general: Don't use 'would'.

    Grammatically, it is that the sequence of subtags *would* form a
    language tag if they *were* separated out. They are not actually
    a language tag, because they occur in the middle of another language
    tag. How would you like that to be phrased?

           The structure of 't' subtags is determined by the Unicode CLDR
           Technical Committee, in accordance with the policies and
           in http://www.unicode.org/consortium/tc-procedures.html, and
           to the Unicode Consortium Policies on

        The following paragraph is also difficult to understand. I
        wouldn't know exactly what falls on what side. I think one major
        reason is that we are treading new ground here: it's the first
        time we have a singleton definition that both allows reuse of
        language tags (with a few restrictions) and intends to
        define its own extensions.

    These were both patterned after what was used for the -u- extension.
    We can take a look at them to try to clarify.

           Changes that can be made by successive versions of LDML [UTS35]
           by the Unicode Consortium without requiring a new RFC include the
           allocation of new subtags for use after the 't' extension.  A new
           RFC would be required for material changes to an existing 't'
           subtag, or an incompatible change to the overall syntactic
           structure of the 't' extension; however, such a change would be
           contrary to the
           of the Unicode Consortium, and thus is not anticipated.

        2.1 Summary: There seems to be quite some overlap between this
        and the part of section 2 before the 2.1 heading.

        One question I would have as a linguistic researcher is: How
        much effort and time is involved in getting a 'mechanism'
        approved? If such 'mechanisms' are e.g. rejected with arguments
        like "if we accept it, then everybody has to implement it" or
        so, then I would see that as a problem.

    Good point. I'll propose some text.

        So much for the moment.

        Regards,   Martin.

        On 2011/06/18 6:07, Mark Davis wrote:

            Yoshito, Addison, and I had had an action for a while now from
            the CLDR committee to submit a draft for an extension. Rather
            than go through all the problems in the Falk draft, we put
            together an alternative approach, leveraging the work we
            already did for the -u- extension.

            It just got posted at

            Courtney, I think this provides a superset of the functionality
            that you are interested in. Perhaps you can read it over, and
            we can add you as an author of the next version of this draft
            instead of having the two

            *--- Il meglio è l'inimico del bene ---*

            On Wed, Jun 15, 2011 at 10:50, Randy Presuhn
            <randy_presuhn at mindspring.com> wrote:

                Hi -

                I started out with an off-list response, but I figure this is
                something worth sending to the list.

                Off-list, a contributor asked:


                    I'd love to see your input. I'd like to make sure I
                    all the concerns. Is there any way you could forward
                    this to the list?

                My response:

                Sorry, already deleted.  As I recall, the main concerns were

                  (1) there already *is* support for identifying
                      (remember German?)
                  (2) the I-D seems to assume that transliterations always
                      result in "Latin" (previous discussion on LTRU included
                      to Cyrillic and Hangul, among others)
                  (3) the "original orthography" is irrelevant for the
                      systems I've been able to think of.  (At the same time,
                      some transliteration systems are quite "lossy" and some
                      don't do "round trip" very well.)  Consider also the
                      transliteration of
                      which was originally in audio form...
                  (4) The draft doesn't clearly distinguish "orthography" from
                      This may be because the boundary between the two can be
                      fuzzy, but that is an issue that should be addressed.
                  (5) How this fits in with *transcription* systems (e.g. IPA)
                      should be addressed.  The boundary gets fuzzy with
                      orthographies that are
                      to phonemic representations of the language.
                      (e.g., Pinyin for
                  (6) The proposed singleton usage appears broken and
                Or something like that.  I may have forgotten something here,
                or, in the process of reconstruction, thought of something I
                missed the first time.

