Fwd: draft-davis-t-langtag-ext
Pete Resnick
presnick at qualcomm.com
Thu Jul 7 01:49:07 CEST 2011
Most of the people on the ietf-languages list are probably on the
ltru at ietf.org list as well, but I wanted to confirm that everyone got a
chance to review this before it proceeded to the IESG. Please have a
look at the ltru archive
<http://www.ietf.org/mail-archive/web/ltru/current/maillist.html> and
send any comments to the ltru at ietf.org list since that's where
discussion seems to be taking place.
Thanks.
pr
-------- Original Message --------
Subject: [Ltru] draft-davis-t-langtag-ext
Date: Wed, 22 Jun 2011 15:00:47 -0700
From: Mark Davis <mark at macchiato.com>
To: Martin J. Dürst <duerst at it.aoyama.ac.jp>
CC: LTRU Working Group <ltru at ietf.org>, <court at infiauto.com>
A new draft posted at
http://tools.ietf.org/html/draft-davis-t-langtag-ext-01
Martin, we tried to address your concerns; please take a look and let us
know what you think.
Mark
/--- Il meglio è l'inimico del bene ---/
On Tue, Jun 21, 2011 at 09:00, Mark Davis <mark at macchiato.com> wrote:
Those are good issues; thanks for raising them and starting the
discussion. Comments below.
------------------------------------------------------------------------
Mark
/--- Il meglio è l'inimico del bene ---/
On Mon, Jun 20, 2011 at 23:39, "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:
Hello Mark, others,
Overall comment:
The idea to reuse language tags to indicate
transliteration/transcription source, and to add some additional
tags to distinguish methods seems to be reasonable and sound.
The description of the structure of the allowed subtags and of
the responsibility split between IETF (this draft) and UTC (UTS
35) looks quite messy to me, and should be cleaned up. I'd
personally prefer that UTS 35 (or whatever else on the Unicode
side) only define the <mechanism> part (after the m0 subtag).
That would be my preference as well (can't speak for my coauthors).
We patterned it this way following what ended up being accepted for
the -u- extension. That is, the spec is in UTS35, but there is a
summary here. But of course, there are many ways to do it. And maybe
this summary is too detailed, at least for the mechanism part, and
we could just have it in UTS35.
We considered a number of alternatives:
* We could define everything after -t- to be the source
language, and everything after -m- to be the mechanism. But
that burns 2 extension letters, not just one.
* We also considered having everything in the -u- extension, for
which we already have the structure set up. However, that
would force us to have artificial source subtags like 'en0'
instead of 'en', because the -u- extension wouldn't allow the
2-letter subtags (it already defines a use for them).
* We could also have -t- be just the source, and define the
mechanism in -u-, also easy. But we felt it would be better to
have everything under one extension.
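To illustrate the single-extension layout, here is a minimal Python sketch that splits a tag into its target, source, and mechanism parts. It assumes only what is described in this thread: the subtags after the 't' singleton form the source language tag, and the subtags after 'm0' name the mechanism. The function name split_t_extension and the sample tag are hypothetical, chosen for illustration; this is not a full BCP 47 parser and does not validate subtags.

```python
def split_t_extension(tag):
    """Split a tag into (target, source, mechanism) lists of subtags.

    Assumed layout (from the discussion above, not a normative grammar):
    target subtags, then 't', then the source language subtags,
    then optionally 'm0' followed by the mechanism subtags.
    Subtags are lowercased because language tags are case-insensitive.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        # No 't' extension present: the whole tag is the target.
        return subtags, [], []

    t_index = subtags.index("t")
    target = subtags[:t_index]
    rest = subtags[t_index + 1:]

    # Any other one-character subtag starts a different extension
    # (for example 'u'), so the 't' extension ends there.
    for i, subtag in enumerate(rest):
        if len(subtag) == 1:
            rest = rest[:i]
            break

    if "m0" in rest:
        m_index = rest.index("m0")
        return target, rest[:m_index], rest[m_index + 1:]
    return target, rest, []


# Hypothetical example tag: Cyrillic text transformed from Latin script.
print(split_t_extension("und-Cyrl-t-und-latn-m0-ungegn-2007"))
```

Note that a 2-letter source subtag such as 'it' passes through unchanged, which is the case the -u- alternative could not accommodate without artificial subtags like 'en0'.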
Detailed comments:
"In addition, it may also be important to
specify a particular specification for the transformation.":
Too much 'spec' in one sentence.
ok
"For example, if one is transcribing the names of Italian or Russian
cities on a map for Japanese users, each name will need to be
transliterated into katakana using rules appropriate for the source
language and target languages.": "source languages and target
language"?
yes
BCP47 required information: The first three paragraphs should
move to the introduction.
Other authors, what do you think?
"followed by a sequence of subtags that would form a language
tag": Here and in general: Don't use 'would'.
Grammatically, the sequence of subtags *would* form a language tag
if they *were* separated out. They are not actually a language tag,
because they occur in the middle of another language tag. How would
you like that to be phrased?
>>>>
The structure of 't' subtags is determined by the Unicode CLDR
Technical Committee, in accordance with the policies and procedures
in http://www.unicode.org/consortium/tc-procedures.html, and subject
to the Unicode Consortium Policies on
http://www.unicode.org/policies/policies.html.
>>>>
The following paragraph is also difficult to understand. I
wouldn't know exactly what falls on what side. I think one major
reason is that we are treading new ground here: it's the first
time we have a singleton definition that both allows reuse of
language tags (with a few restrictions) and intends to define
its own extensions.
These were both patterned after what was used for the -u- extension.
We can take a look at them to try to clarify.
>>>>
Changes that can be made by successive versions of LDML [UTS35] by
the Unicode Consortium without requiring a new RFC include the
allocation of new subtags for use after the 't' extension. A new RFC
would be required for material changes to an existing 't' subtag, or
an incompatible change to the overall syntactic structure of the 't'
extension; however, such a change would be contrary to the policies
of the Unicode Consortium, and thus is not anticipated.
>>>>
2.1 Summary: There seems to be quite some overlap between this
section and the part of section 2 before the 2.1 heading.
One question I would have as a linguistic researcher is: How
much effort and time is involved in getting a 'mechanism'
approved? If such 'mechanisms' are e.g. rejected with arguments
like "if we accept it, then everybody has to implement it" or
so, then I would see that as a problem.
Good point. I'll propose some text.
So much for the moment.
Regards, Martin.
On 2011/06/18 6:07, Mark Davis wrote:
Yoshito, Addison, and I had had an action for a while now from the
CLDR committee to submit a draft for an extension. Rather than go
through all the problems in the Falk draft, we put together an
alternative approach, leveraging the work we already did for the
-u- extension.
It just got posted at
http://tools.ietf.org/html/draft-davis-t-langtag-ext-00
Courtney, I think this provides a superset of the functionality that
you are interested in. Perhaps you can read it over, and we can add
you as an author of the next version of this draft instead of having
the two competing proposals.
Mark
*--- Il meglio è l'inimico del bene ---*
On Wed, Jun 15, 2011 at 10:50, Randy Presuhn
<randy_presuhn at mindspring.com> wrote:
Hi -
I started out with an off-list response, but I figure this is
something worth sending to the list.
Off-list, a contributor asked:
...
I'd love to see your input. I'd like to make sure I understand
all the concerns. Is there any way you could forward this to the list?
My response:
Sorry, already deleted. As I recall, the main concerns were
(1) there already *is* support for identifying orthographies
(remember German?)
(2) the I-D seems to assume that transliterations always result
in "Latin" (previous discussion on LTRU included transliterations
to Cyrillic and Hangul, among others)
(3) the "original orthography" is irrelevant for the transliteration
systems I've been able to think of. (At the same time, some
transliteration systems are quite "lossy" and some don't do
"round trip" very well.) Consider also the transliteration of
material which was originally in audio form...
(4) The draft doesn't clearly distinguish "orthography" from
"transliteration". This may be because the boundary between the two
can be fuzzy, but even that is an issue that should be addressed.
(5) How this fits in with *transcription* systems (e.g. IPA) should be
addressed. The boundary gets fuzzy with orthographies that are
equivalent to phonemic representations of the language.
(e.g., Pinyin for Mandarin)
(6) The proposed singleton usage appears broken and unnecessary.
Or something like that. I may have forgotten something here, or, in
the process of reconstruction, thought of something I missed the
first time.
Randy