[Ltru] draft-davis-t-langtag-ext

Sat Jul 9 09:31:43 CEST 2011

I am, but that has been inactive for so long I wasn’t paying attention to it.

Peter

From: mark.edward.davis at gmail.com [mailto:mark.edward.davis at gmail.com] On Behalf Of Mark Davis ☕
Sent: Thursday, July 07, 2011 12:58 PM
To: Peter Constable
Cc: ltru at ietf.org; ietf-languages at alvestrand.no
Subject: Re: [Ltru] draft-davis-t-langtag-ext

It's still in discussion, so feedback is welcome. (I'd thought you were on LTRU...)

Mark
— Il meglio è l’inimico del bene —

On Thu, Jul 7, 2011 at 08:29, Peter Constable <petercon at microsoft.com> wrote:
I was not aware of the discussion on LTRU. When will it be reviewed by IESG? What is the action being requested of IESG / what’s the status of this draft?

Peter

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Pete Resnick
Sent: Wednesday, July 06, 2011 4:49 PM
To: ietf-languages at alvestrand.no
Subject: Fwd: draft-davis-t-langtag-ext

Most of the people on the ietf-languages list are probably on the ltru at ietf.org list as well, but I wanted to confirm that everyone got a chance to review this before it proceeded to the IESG. Please have a look at the ltru archive <http://www.ietf.org/mail-archive/web/ltru/current/maillist.html> and send any comments to the ltru at ietf.org list, since that's where discussion seems to be taking place.

Thanks.

pr

-------- Original Message --------
Subject: [Ltru] draft-davis-t-langtag-ext
Date: Wed, 22 Jun 2011 15:00:47 -0700
From: Mark Davis ☕ <mark at macchiato.com>
To: Martin J. Dürst <duerst at it.aoyama.ac.jp>
CC: LTRU Working Group <ltru at ietf.org>, <court at infiauto.com>

A new draft posted at http://tools.ietf.org/html/draft-davis-t-langtag-ext-01

Martin, we tried to address your concerns; please take a look and let us know what you think.

Mark
— Il meglio è l’inimico del bene —
On Tue, Jun 21, 2011 at 09:00, Mark Davis ☕ <mark at macchiato.com> wrote:
Those are good issues; thanks for raising them and starting the discussion. Comments below.

________________________________
Mark
— Il meglio è l’inimico del bene —

On Mon, Jun 20, 2011 at 23:39, "Martin J. Dürst" <duerst at it.aoyama.ac.jp> wrote:
Hello Mark, others,

Overall comment:
The idea to reuse language tags to indicate transliteration/transcription source, and to add some additional tags to distinguish methods seems to be reasonable and sound.

The description of the structure of the allowed subtags and of the responsibility split between IETF (this draft) and UTC (UTS 35) looks quite messy to me, and should be cleaned up. I'd personally prefer that UTS 35 (or whatever else on the Unicode side) only define the <mechanism> part (after the m0 subtag).

That would be my preference as well (can't speak for my coauthors).

We patterned it this way following what ended up being accepted for the -u- extension. That is, the spec is in UTS35, but there is a summary here. But of course, there are many ways to do it. And maybe this summary is too detailed, at least for the mechanism part, and we could just have it in UTS35.

We considered a number of alternatives:

  *   We could define everything after -t- to be the source language, and everything after -m- to be the mechanism. But that burns 2 extension letters instead of just one.
  *   We also considered having everything in the -u- extension, for which we already have the structure set up. However, that would force us to have artificial source subtags like 'en0' instead of 'en', because the -u- extension wouldn't allow the 2-letter subtags (it already defines a use for them).
  *   We could also have -t- be just the source, and define the mechanism in -u-, also easy. But we felt it would be better to have everything under one extension.
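To make the single-extension layout concrete, here is a minimal Python sketch of how such a tag could be split into the target part, the source language part, and the mechanism fields. The example tags and the helper name split_t_extension are illustrative assumptions, not text from the draft. The field-key shape (one letter followed by one digit) is assumed from the m0 subtag mentioned above, and no validation against any registry is attempted.

```python
import re

# Assumption: a field key is one letter followed by one digit (e.g. "m0").
FIELD_KEY = re.compile(r"^[a-z][0-9]$")


def split_t_extension(tag):
    """Split a tag into (target, source, fields) around the 't' singleton.

    Simplified sketch: subtags are lowercased, nothing is validated,
    and any later singleton (e.g. 'u' or 'x') ends the 't' extension.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return tag.lower(), None, {}
    t_index = subtags.index("t")
    target = "-".join(subtags[:t_index])
    source_parts = []
    fields = {}
    current_key = None
    for subtag in subtags[t_index + 1:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if FIELD_KEY.match(subtag):
            # A field key opens a new field; its values follow it.
            current_key = subtag
            fields[current_key] = []
        elif current_key is None:
            # Before any field key, subtags belong to the source language.
            source_parts.append(subtag)
        else:
            fields[current_key].append(subtag)
    source = "-".join(source_parts) or None
    return target, source, fields
```

Under these assumptions, split_t_extension("ja-Kana-t-it") yields a target of "ja-kana" and a source of "it" with no fields. The source keeps its ordinary 2-letter subtag, which is the property the -u- alternative would have lost.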

Detailed comments:

"In addition, it may also be important to
  specify a particular specification for the transformation.": Too much 'spec' in one sentence.

ok

"For example, if one is transcribing the names of Italian or Russian
  cities on a map for Japanese users, each name will need to be
  transliterated into katakana using rules appropriate for the source
  language and target languages.": "source languages and target language"?

yes

BCP47 required information: The first three paragraphs should move to the introduction.

Other authors, what do you think?

"followed by a sequence of subtags that would form a language tag": Here and in general: Don't use 'would'.

Grammatically, the point is that the sequence of subtags *would* form a language tag if they *were* separated out. They are not actually a language tag, because they occur in the middle of another language tag. How would you like that to be phrased?

>>>>
  The structure of 't' subtags is determined by the Unicode CLDR
  Technical Committee, in accordance with the policies and procedures
  in http://www.unicode.org/consortium/tc-procedures.html, and subject
  to the Unicode Consortium Policies on
  http://www.unicode.org/policies/policies.html.
>>>>

The following paragraph is also difficult to understand. I wouldn't know exactly what falls on what side. I think one major reason is that we are treading new ground here: it's the first time we have a singleton definition that allows reuse of language tags (with a few restrictions) as well as intends to define its own extensions.

These were both patterned after what was used for the -u- extension. We can take a look at them to try to clarify.

>>>>
  Changes that can be made by successive versions of LDML [UTS35] by
  the Unicode Consortium without requiring a new RFC include the
  allocation of new subtags for use after the 't' extension.  A new RFC
  would be required for material changes to an existing 't' subtag, or
  an incompatible change to the overall syntactic structure of the 't'
  extension; however, such a change would be contrary to the policies
  of the Unicode Consortium, and thus is not anticipated.
>>>>

2.1 Summary: There seems to be quite some overlap between this and the part of section 2 before the 2.1 heading.

One question I would have as a linguistic researcher is: How much effort and time is involved in getting a 'mechanism' approved? If such 'mechanisms' are e.g. rejected with arguments like "if we accept it, then everybody has to implement it" or so, then I would see that as a problem.

Good point. I'll propose some text.

So much for the moment.

Regards,   Martin.

On 2011/06/18 6:07, Mark Davis ☕ wrote:
Yoshito, Addison, and I had had an action for a while now from the CLDR
committee to submit a draft for an extension. Rather than go through all
the problems in the Falk draft, we put together an alternative approach,
leveraging the work we already did for the -u- extension.

It just got posted at
http://tools.ietf.org/html/draft-davis-t-langtag-ext-00

Courtney, I think this provides a superset of the functionality that you are
interested in. Perhaps you can read it over, and we can add you as an author
of the next version of this draft instead of having the two competing
proposals.

Mark

*— Il meglio è l’inimico del bene —*

On Wed, Jun 15, 2011 at 10:50, Randy Presuhn <randy_presuhn at mindspring.com> wrote:
Hi -

I started out with an off-list response, but I figure this is
something worth sending to the list.

Off-list, a contributor asked:

...
I'd love to see your input. I'd like to make sure I understand
all the concerns. Is there any way you could forward this to the list?

My response:

Sorry, already deleted.  As I recall, the main concerns were

 (1) there already *is* support for identifying orthographies
     (remember German?)
 (2) the I-D seems to assume that transliterations always result
     in "Latin" (previous discussion on LTRU included transliterations
     to Cyrillic and Hangul, among others)
 (3) the "original orthography" is irrelevant for the transliteration
     systems I've been able to think of.  (At the same time, some
     transliteration systems are quite "lossy" and some don't do
     "round trip" very well.)  Consider also the transliteration of
     material which was originally in audio form...
 (4) The draft doesn't clearly distinguish "orthography" from
     "transliteration".  This may be because the boundary between the
     two can be fuzzy, but even that is an issue that should be
     addressed.
 (5) How this fits in with *transcription* systems (e.g. IPA) should be
     addressed.  The boundary gets fuzzy with orthographies that are
     equivalent to phonemic representations of the language (e.g.,
     Pinyin for Mandarin).
 (6) The proposed singleton usage appears broken and unnecessary.

Or something like that.  I may have forgotten something here, or, in the
process of reconstruction, thought of something I missed the first time.

Randy

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru
