Ltru Digest, Vol 44, Issue 15

Fri Oct 3 09:39:01 CEST 2008

Dear All,

0-Let me suggest that we adopt a precise, uniform and recognized terminology when discussing transformations between languages and/or scripts.
So I reproduce below eight definitions given by the UNGEGN (United Nations Group of Experts on Geographical Names) in its Manual M 85, "Glossary of Terms for the Standardization of Geographical Names", published by the UN Department of Economic and Social Affairs, Statistics Division, in May 2002.
All definitions are given in the six official UN languages; I reproduce only the English version.

1-TRANSFORMATION [, names]: [In toponymy,] general term covering the TRANSLATION, TRANSCRIPTION and TRANSLITERATION [of toponyms]. The two latter terms constitute CONVERSION.

2-TRANSLATION: (a) The process of expressing meaning, presented in a source LANGUAGE, in the words of a target LANGUAGE.
                           (b) A result of this process. [In toponymy it is sometimes applied only to the generic element of a name.]

3-CONVERSION: [In toponymy,] the process of transferring the phonological and/or morphological elements of a particular LANGUAGE to another, or from one SCRIPT to another. Conversion is effected by either TRANSCRIPTION or TRANSLITERATION.

4-TRANSCRIPTION: (a) A method of phonetic names CONVERSION between different LANGUAGES, in which the sounds of a source LANGUAGE are recorded in terms of a specific target LANGUAGE and its particular SCRIPT, normally without recourse to additional diacritics.
                              (b) A result of this process.
 TRANSCRIPTION is not normally a reversible process. Retranscription (e.g. by computer) might result in a form differing from the original.
 However, pinyin romanization of Chinese, although a CONVERSION between SCRIPTS, is phonetic and non-reversible, and is therefore also regarded as TRANSCRIPTION, not as TRANSLITERATION.

5-TRANSLITERATION: (a) A method of names CONVERSION between different alphabetic SCRIPTS or syllabic SCRIPTS, in which each character or di-, tri- or tetragraph of the source SCRIPT is represented in the target SCRIPT in principle by one character or di-, tri- or tetragraph, or a diacritic or a combination of these. TRANSLITERATION, as distinct from TRANSCRIPTION, aims at (but does not necessarily achieve) complete reversibility, and must be accompanied by a transliteration key.
                                  (b) A result of this process.
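
The role of the transliteration key in definition 5 can be illustrated with a minimal Python sketch. The key below is a toy table invented for this example (it is not a published romanization system), and the function names are hypothetical. Because the same key is used in both directions, the conversion is reversible for this table; a key containing both "s", "h" and the digraph "sh" would no longer be, which is why the definition says transliteration only aims at reversibility.

```python
# Hypothetical transliteration key: each source character maps to one
# target character or digraph. Toy table, not an official system.
KEY = {"а": "a", "к": "k", "м": "m", "о": "o", "ш": "sh", "ж": "zh"}
REVERSE_KEY = {latin: cyrillic for cyrillic, latin in KEY.items()}


def transliterate(text: str) -> str:
    """Convert source-script text character by character using KEY."""
    return "".join(KEY[ch] for ch in text)


def retransliterate(text: str) -> str:
    """Convert back to the source script using the same key."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest unit first so that digraphs are read as one unit.
        for size in (2, 1):
            chunk = text[i:i + size]
            if chunk in REVERSE_KEY:
                out.append(REVERSE_KEY[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no key entry for {text[i]!r}")
    return "".join(out)


print(transliterate("кошка"))  # koshka
```

A transcription, by contrast, records sounds rather than characters, so no such key-based inverse exists in general.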

6-ROMANIZATION: CONVERSION from non-Roman into Roman SCRIPT.

7-LANGUAGE:  [In the context of this glossary,] a means of verbal communication used by a large community, including the words, their pronunciation and the method of combining them.

8-SCRIPT: A set of graphic symbols employed in writing or printing a particular LANGUAGE, differing from another set not only by typeface or font.
Groups of different SCRIPTS form writing systems.
Examples: Roman, Greek, Cyrillic, Korean, Thai, Arabic and Hebrew SCRIPTS belong to the alphabetic writing system (but the latter two are defective, i.e. mainly consonant SCRIPTS); Amharic, Japanese Kana and Inuktitut (Eskimo) SCRIPTS belong to the syllabic writing system; Chinese Han and Japanese Kanji SCRIPTS belong to the logographic writing system.

Best regards,
Gérard LANG

-----Original Message-----
From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On Behalf Of ltru-request at ietf.org
Sent: Friday, October 3, 2008 08:02
To: ltru at ietf.org
Subject: Ltru Digest, Vol 44, Issue 15

Send Ltru mailing list submissions to
	ltru at ietf.org

To subscribe or unsubscribe via the World Wide Web, visit
	https://www.ietf.org/mailman/listinfo/ltru
or, via email, send a message with subject or body 'help' to
	ltru-request at ietf.org

You can reach the person managing the list at
	ltru-owner at ietf.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of Ltru digest..."

Today's Topics:

   1. Re: Too fine-grained? (Leif Halvard Silli)
   2. Re: Uniqueness of variant subtags (John Cowan)
   3. Re: Uniqueness of variant subtags (Peter Constable)
   4. Re: Uniqueness of variant subtags (Mark Davis)
   5. Re: Uniqueness of variant subtags (Leif Halvard Silli)
   6. Re: Uniqueness of variant subtags (Peter Constable)

----------------------------------------------------------------------

Message: 1
Date: Fri, 03 Oct 2008 06:49:09 +0200
From: Leif Halvard Silli <lhs at malform.no>
Subject: Re: [Ltru] Too fine-grained?
To: Doug Ewell <doug at ewellic.org>
Cc: LTRU Working Group <ltru at ietf.org>
Message-ID: <48E5A445.6010003 at malform.no>
Content-Type: text/plain; charset=UTF-8; format=flowed

Doug Ewell 2008-10-03 04.31:
   [...]

> The finer-grained our language tagging solutions become, such as 
> "no-1907" versus "no-1917", the less likely it is that all of these 
> use cases will be met.

The disadvantages are also the advantages: You don't want "nn-1907" to be spell-checked like current "nn", so it may be fine if the computer does not try to spell-check "nn-1917".
(Perhaps the alternative would be to "un-tag" texts following the 1917 norm, or to give them an incorrect tag, in order to avoid such spell-checking.)

OTOH, a variant can sometimes take the place of a new language subtag, and thus keep things together rather than split them. E.g. I understand that some would prefer Resian to be registered as its own language. So this hints that variant subtags can also simplify and join rather than complicate and split.

   [...]

> So before we assume that we'll eventually have lots of 1917's and 
> 1959's and 1994's colliding with each other, and need to make sure all 
> of them can coexist, we might want to think about whether these are 
> distinctions that need to be made in language tags.

The alternative will often be x-private subtags, I think. And they are even harder to get computers or people to handle.
--
leif halvard silli

------------------------------

Message: 2
Date: Fri, 3 Oct 2008 01:33:06 -0400
From: John Cowan <cowan at ccil.org>
Subject: Re: [Ltru] Uniqueness of variant subtags
To: Leif Halvard Silli <lhs at malform.no>
Cc: LTRU Working Group <ltru at ietf.org>,	Kent Karlsson
	<kent.karlsson14 at comhem.se>
Message-ID: <20081003053306.GM31839 at mercury.ccil.org>
Content-Type: text/plain; charset=us-ascii

Leif Halvard Silli scripsit:

> It is interesting to compare the prefixes allowed for "1996",
> 
> 	Prefix: de
> 
> to those allowed for "1994":
> 
> 	Prefix: sl-rozaj
> 	Prefix: sl-rozaj-biske
> 	Prefix: sl-rozaj-njiva
> 	Prefix: sl-rozaj-osojs
> 	Prefix: sl-rozaj-solba

Not *allowed*, but *recommended*.  The first means that we say you SHOULD use 'de'
(and implicitly, SHOULD NOT use any other primary language subtag) with '1996'.
The second group means that you SHOULD use 'sl', AND you SHOULD use 'rozaj', AND you SHOULD use either one of the four variant tags or no variant tag, before you use '1994'.

> Although "sl-rozaj-lipaw" is not listed, since "sl-rozaj" is listed, 
> how can we know that "sl-rozaj-lipaw-1994" is not permitted?

But it *is* permitted.  The Prefix: on a subtag is only what's recommended.
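
The distinction between "recommended" and "permitted" can be sketched in Python. This is only an illustration under stated assumptions: the function name is hypothetical, the prefix list is copied from the message quoted above, and a simple "the tag begins with the prefix" comparison is assumed as the matching rule. The function reports whether a tag follows a recommended prefix; it never rejects a tag.

```python
def follows_recommended_prefix(tag: str, variant: str, prefixes: list[str]) -> bool:
    """Return True if the subtags before `variant` begin with any listed prefix.

    A False result only means the combination is not the recommended one;
    the tag itself may still be permitted.
    """
    subtags = tag.lower().split("-")
    if variant.lower() not in subtags:
        return False
    before = subtags[:subtags.index(variant.lower())]
    for prefix in prefixes:
        parts = prefix.lower().split("-")
        # Assumed matching rule: the tag starts with the prefix's subtags.
        if before[:len(parts)] == parts:
            return True
    return False


# Prefix fields as quoted in the message above.
PREFIXES_1994 = [
    "sl-rozaj",
    "sl-rozaj-biske",
    "sl-rozaj-njiva",
    "sl-rozaj-osojs",
    "sl-rozaj-solba",
]

print(follows_recommended_prefix("sl-rozaj-biske-1994", "1994", PREFIXES_1994))  # True
print(follows_recommended_prefix("en-1996", "1996", ["de"]))  # False
```

A tool built on this would at most warn on a False result, since Prefix is a SHOULD-level recommendation rather than a validity rule.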

-- 
John Cowan   cowan at ccil.org  http://www.ccil.org/~cowan
Most languages are dramatically underdescribed, and at least one is dramatically overdescribed.  Still other languages are simultaneously overdescribed and underdescribed.  Welsh pertains to the third category.
        --Alan King

------------------------------

Message: 3
Date: Thu, 2 Oct 2008 22:53:27 -0700
From: Peter Constable <petercon at microsoft.com>
Subject: Re: [Ltru] Uniqueness of variant subtags
To: LTRU Working Group <ltru at ietf.org>
Message-ID:
	<DDB6DE6E9D27DD478AE6D1BBBB835795633D71B9AC at NA-EXMSG-C117.redmond.corp.microsoft.com>

Content-Type: text/plain; charset="utf-8"

You lot are generating a lot of mail on a last call topic. I hope that we are moving toward stability of the draft, not trying to rush in new features at the last minute.

Martin wrote:

> I hope we can restrict this to a very short time.
> Everybody, please comment quickly...

I agree with those who say we should not allow two records having the same subtag and the same type.

I agree with the sentiment that a subtag should have a consistent semantic. Years are relatively easy cases as the concept of 'point in time at which some significant change in conventions occurred' is pretty straightforward. (Someone mentioned the possibility of two changes happening for a given language in a given year -- that just means that, in that case, more than a four-digit subtag is needed.) Beyond that, there'd be a need to be wary of the kind of fuzziness Randy fears. A case like "northern" is a clear example of a problematic case: there is a trivial sense in which "northern" would always have a consistent semantic, but that would be a vague trap that would snare us unless there were some way to make exactly clear what specific semantic is meant for each collocation.

This reminds me of issues in lexicography, in which a word form may correspond to distinct lexemes (distinct meanings), or in which a single lexeme can have multiple senses: in either case, a hierarchical organization of information is typically used to show how information is related. So, instead of

> Type: variant
> Subtag: 1901
> Description: German orthography of 1901
> Description: Uzbek transliteration of 1901
> Prefix: de
> Prefix: uz
> Comments: Good luck figuring this out.

something along the lines of

Type: variant
Subtag: 1901
Sense:
   Description: German orthography of 1901
   Prefix: de
   Reference: ...
Sense:
   Description: Uzbek transliteration of 1901
   Prefix: uz
   Reference: ...

Some said that the first record above seems reasonably clear; I concede that -- in this case; but in the general case I think there are real risks of introducing ambiguity and lack of clarity into records.

But the kind of approach in the second record above would require a lot more change than I think should be introduced at this stage. I think it very important to limit ourselves to a useful change that we can get quick closure on. To that end, I suggest

- that we have a requirement on one record per subtag-type,
- that at this point we not block records like the first one above
- that there be a general requirement on a consistent semantic for all uses of a subtag
- that there be some general cautions about risks of confusion if the intent for each particular collocation is unclear
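
The first requirement in the list above (one record per subtag-type) lends itself to a mechanical check. The Python sketch below is a rough illustration, not the registry's actual tooling: it assumes records are separated by "%%" lines and that each field sits on a single "Name: value" line (folded continuation lines are not handled). The function names and the sample data are hypothetical; the Uzbek record is the made-up example from this thread.

```python
from collections import Counter


def parse_records(text: str) -> list[list[tuple[str, str]]]:
    """Split registry-style text into records of (field name, value) pairs."""
    records = []
    for chunk in text.split("%%"):
        fields = []
        for line in chunk.strip().splitlines():
            if not line.strip():
                continue
            name, _, value = line.partition(":")
            fields.append((name.strip(), value.strip()))
        if fields:
            records.append(fields)
    return records


def duplicate_keys(records: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """Return every (Type, Subtag) pair that appears in more than one record."""
    counts = Counter()
    for fields in records:
        first = {}
        for name, value in fields:
            first.setdefault(name, value)  # keep the first value per field
        counts[(first.get("Type"), first.get("Subtag"))] += 1
    return [key for key, n in counts.items() if n > 1]


SAMPLE = """\
Type: variant
Subtag: 1901
Description: German orthography of 1901
Prefix: de
%%
Type: variant
Subtag: 1901
Description: Uzbek transliteration of 1901
Prefix: uz
%%
Type: variant
Subtag: 1996
Description: German orthography of 1996
Prefix: de
"""

print(duplicate_keys(parse_records(SAMPLE)))  # [('variant', '1901')]
```

Under the proposal, the two 1901 records would have to be merged into a single record carrying both Description and both Prefix fields.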

And beyond that, that for now we leave it to the IETF-Language list and LSTR to do the right thing -- so that we in LTRU can do the right thing for us at this point, which is to close on this discussion and on the WGLC.

Peter

------------------------------

Message: 4
Date: Fri, 3 Oct 2008 07:59:24 +0200
From: "Mark Davis" <mark at macchiato.com>
Subject: Re: [Ltru] Uniqueness of variant subtags
To: "Peter Constable" <petercon at microsoft.com>
Cc: LTRU Working Group <ltru at ietf.org>
Message-ID:
	<30b660a20810022259r3cd384fcpb48c89c4926e7e60 at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I think we might be in rough agreement. Can you suggest changes in the wording in http://docs.google.com/Doc?id=dfqr8rd5_216dbfzfkf5?
Mark


------------------------------

Message: 5
Date: Fri, 03 Oct 2008 08:00:21 +0200
From: Leif Halvard Silli <lhs at malform.no>
Subject: Re: [Ltru] Uniqueness of variant subtags
To: John Cowan <cowan at ccil.org>
Cc: LTRU Working Group <ltru at ietf.org>,	Kent Karlsson
	<kent.karlsson14 at comhem.se>
Message-ID: <48E5B4F5.2090309 at malform.no>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

John Cowan 2008-10-03 07.33:

> Leif Halvard Silli scripsit:
> 
>> It is interesting to compare the prefixes allowed for "1996",
>> 
>> 	Prefix: de

    [ snip ]

> Not *allowed*, but *recommended*.  The first means that we say you SHOULD use 'de'
> (and implicitly, SHOULD NOT use any other primary language subtag) with '1996'.

But the question remains why the registry does not say which region subtags "1996" is recommended for use with.
--
leif halvard silli

------------------------------

Message: 6
Date: Thu, 2 Oct 2008 23:02:01 -0700
From: Peter Constable <petercon at microsoft.com>
Subject: Re: [Ltru] Uniqueness of variant subtags
To: Randy Presuhn <randy_presuhn at mindspring.com>, LTRU Working Group
	<ltru at ietf.org>
Message-ID:
	<DDB6DE6E9D27DD478AE6D1BBBB835795633D71B9B5 at NA-EXMSG-C117.redmond.corp.microsoft.com>

Content-Type: text/plain; charset="us-ascii"

+1

-----Original Message-----
From: ltru-bounces at ietf.org [mailto:ltru-bounces at ietf.org] On Behalf Of Randy Presuhn
Sent: Thursday, October 02, 2008 10:37 AM
To: LTRU Working Group
Subject: Re: [Ltru] Uniqueness of variant subtags

Hi -

As a technical contributor...

I'm deeply concerned that this thread is leading to a seriously over-specified, over-engineered solution.  There are several different concerns that folks appear to be trying to address, and I'm afraid that the net effect of the proposals might be worse than the envisioned problems.

(1) making "institution" mnemonics part of the formal structure logically leads to the conclusion that there should be a registry of institutions.  I really don't want to go there!

(2) the "same meaning" vs "same semantics" debate gets to the heart of why I so strongly opposed using the same variant tag for the Hanyu Pinyin orthography of Mandarin and a pinyin-like way of writing Tibetan.  Our job is to define procedures to keep the registry running smoothly so that language tagging needs can be met.  How does each interpretation affect the operation of the review process and registry?  How would each impact the developer communities of products that employ language tags?  Would either approach make any difference to the people using those languages?

(3) With respect to the "year" business, at some point we have to stop gilding this lily with more and more advice and recognize that as more and more variant subtags are registered, it will become increasingly likely that the character string used for the variant will end up having little or no mnemonic value.  Tough.
It's unrealistic at this late stage in the process to come up with an architecture for the construction of variant subtags.  Doing so will only increase the amount of time spent arguing about the "right" string to use for a particular variant.  THEY ARE JUST SUBTAGS, and we should *NOT* do anything to foster the perception that they are anything more.

Randy

_______________________________________________
Ltru mailing list
Ltru at ietf.org
https://www.ietf.org/mailman/listinfo/ltru

------------------------------


End of Ltru Digest, Vol 44, Issue 15
************************************