Peter Constable petercon at microsoft.com
Sat Jul 9 09:31:06 CEST 2011

I think this needs more thought. 

On the one hand, a transcription or transliteration conceptually can be considered just an orthographic convention for writing a language (albeit with characteristics or subject to rules not generally applicable to orthographies in the narrower sense), and the currently-available mechanisms are adequate for capturing orthographic distinctions.

On the other hand, this proposal would allow for reference to a source that is distinct from the primary language subtag, but only in relation to written form. In speech applications, however, the equivalent may also be necessary; for example, one might create a speech recognizer tailored specifically to second-language speakers who are first-language speakers of a particular language (e.g. a recognizer for "English spoken with a Chinese accent"). Because this proposal refers only to transcription and transliteration, it would appear not to allow for such scenarios, even though the requirements might be very similar. I think consideration should be given to whether a single solution that encompasses both speech and written scenarios would be more appropriate.


-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of John Cowan
Sent: Thursday, July 07, 2011 9:37 AM
To: Michael Everson
Cc: ietf-languages
Subject: Re: draft-davis-t-langtag-ext

Michael Everson scripsit:

> Can you summarize what this is about, Pete?

In a word, the proposal is for a -t- subtag which will allow one to state the source language and script in the case of transcription and transliteration.  Thus:

   | Language Tag        | Description                                 |
   | ja-t-it             | The content is Japanese, transformed from   |
   |                     | Italian.                                    |
   | ja-Kana-t-it        | The content is Japanese Katakana,           |
   |                     | transformed from Italian.                   |
   | und-Latn-t-und-cyrl | The content is in the Latin script,         |
   |                     | language undetermined, transformed from the |
   |                     | Cyrillic script, language undetermined.     |

What follows the -t- is itself a valid language tag, with no embedded single-letter subtags and no private-use subtags.
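
To make the structure concrete, here is a minimal sketch in Python of splitting such a tag into its content part and its source part. The function name split_t_tag is made up for illustration, and this is not a full language-tag parser: it assumes only what is described above, namely that the source tag runs from the -t- up to the next single-letter subtag (or the end of the tag), and it does no validation of the subtags themselves.

```python
def split_t_tag(tag):
    """Split a tag into (content_tag, source_tag) around the first 't' singleton.

    Hypothetical helper for illustration only. Returns (tag, None) when
    there is no 't' singleton or nothing follows it.
    """
    subtags = tag.split("-")
    for i, sub in enumerate(subtags):
        if i == 0:
            continue  # the first subtag is the primary language, never a singleton
        if sub.lower() == "x":
            break  # private use: everything after 'x' is opaque, so stop looking
        if sub.lower() == "t":
            source = []
            for s in subtags[i + 1:]:
                if len(s) == 1:
                    break  # another single-letter subtag ends the source tag
                source.append(s)
            return "-".join(subtags[:i]), "-".join(source) or None
    return tag, None


for example in ("ja-t-it", "ja-Kana-t-it", "und-Latn-t-und-cyrl"):
    print(example, "->", split_t_tag(example))
```

Applied to the three tags in the table, this separates ja from it, ja-Kana from it, and und-Latn from und-cyrl.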

When I'm stuck in something boring              John Cowan
where reading would be impossible or            (who loves Asimov too)
rude, I often set up math problems for          cowan at ccil.org
myself and solve them as a way to pass          http://www.ccil.org/~cowan
the time.      --John Jenkins
