baku1926

Thu May 3 08:09:17 CEST 2007

Reshat Sabiq (Reşat) <tatar dot iqtelif dot i18n at gmail dot com> 
wrote:

> I don't find the Comments entry that was submitted to be well worded. 
> I can agree w/ the 2nd sentence, for the most part, for brevity, but 
> the first one doesn't look right.

I apologize if Reşat or others are dissatisfied with the Comments field 
for this subtag.  I was largely responsible for trying to reduce the 
length and 2-dimensional structure from Reşat's proposed:

> Comments: Denotes alphabet/orthography used in Turkic 
> republics/regions of the former USSR in late 1920s, and throughout 
> 1930s, which aimed at representing equivalent phonemes in a unified 
> fashion. Other names:
> a. New Turkic Alphabet
> b. Birlәşdirilmiş Jeni Türk Әlifbasь (in Azeri)
> c. Jaŋalif (in Qazan Tatar: abbreviation of "New Alphabet").

I had hoped the shorter version Michael and I agreed on was mostly 
equivalent to this.  I can see I'm fighting a losing battle in trying to 
keep the Comments fields down to a clause or two, as had been the 
tradition for registered subtags in the past.  (See, for instance, the 
Comments field attached to the 'boont' subtag, which makes no attempt to 
document the history and circumstances of the invention of Boontling.)

Now that we are sending the registration form as well as the new record 
to IANA and to the list (did anyone notice we did that for 'tarask'?), 
there will be greater public review and I probably won't be able to do 
anything about the length of comments, but I also hope the archiving of 
the forms will encourage proposers to put lengthy explanations and 
bibliographic references there and not in the Registry.

> 1) It mentions only 1930s, which could mislead or confuse some people.

I had not thought the difference between "in late 1920s, and throughout 
1930s" and "in the 1930s" would be substantial enough to cause genuine 
confusion.

> 2) It has a semicolon after Jeni which breaks a single correct name 
> into two incorrect names, if i understand ;'s role as a delimiter in 
> this thing.

I'm pretty sure the extra semicolon was a typo.

> 3) I also suggest changing ';' as a punctuation sign in names list to 
> ','.
>
> I'd appreciate feedback on possibility of changing the Comments from:
> Latin orthography used in the Soviet Union in the 1930s for writing 
> Turkic languages. Also called New Turkic Alphabet; 
> Birl&#x4D9;&#x15F;dirilmi&#x15F; Jeni; T&#xFC;rk &#x4D8;lifbas&#x44C;; 
> or Ja&#x14B;alif.
>
> to:
> Denotes alphabet used in Turkic republics/regions of the former USSR 
> in late 1920s, and throughout 1930s, which aspired to represent 
> equivalent phonemes in a unified fashion. Also known as: New Turkic 
> Alphabet, Birl&#x4D9;&#x15F;dirilmi&#x15F; Jeni T&#xFC;rk 
> &#x4D8;lifbas&#x44C;, Ja&#x14B;alif.

(Reşat later changed the spelling of 'Türk' to 'Tyrk' for reasons 
that are not clear to me.)

I am not opposed to these changes, especially the one involving the 
semicolon which incorrectly breaks up one of the names of the 
orthography.  Reşat's proposed change is not dramatically longer than 
the one currently registered.  I would suggest that we review this one 
carefully, register the Right Thing, and then put it to rest, and not 
let ourselves get into a pattern of micro-analyzing every Comments field 
that appears in future registrations.

> 4) Lastly, I believe there is no dispute about the following being 
> true for this subtag, and yet it is not so indicated, as i suggested 
> in 
> http://www.alvestrand.no/pipermail/ietf-languages/2007-April/006397.html:
> Suppress-Script: Latn

Section 3.1 of RFC 4646 states clearly:

"The field 'Suppress-Script' MUST only appear in records whose 'Type' 
field-value is 'language'."

Suppress-Script values are not added to variant subtags.  If you feel 
this should be changed, please join the LTRU Working Group list (link 
available at bottom of this message) and discuss it there.  This group 
is not empowered to change RFC 4646.
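
For anyone who wants to check a record against that rule mechanically, 
here is a minimal sketch in Python.  It assumes a record has already been 
parsed into a dict of field names and values; the function name and the 
dict layout are my own illustration and are not defined by RFC 4646.

```python
def suppress_script_allowed(record):
    """Check the rule quoted above from Section 3.1 of RFC 4646.

    `record` is a hypothetical dict mapping field names to values.
    Suppress-Script may appear only when Type is 'language'.
    """
    if "Suppress-Script" not in record:
        return True  # nothing to check
    return record.get("Type") == "language"


# A variant record carrying Suppress-Script breaks the rule.
variant = {"Type": "variant", "Subtag": "baku1926", "Suppress-Script": "Latn"}

# A language record carrying it is fine.
language = {"Type": "language", "Subtag": "en", "Suppress-Script": "Latn"}

print(suppress_script_allowed(variant))
print(suppress_script_allowed(language))
```

This only covers the one constraint under discussion; it is not a full 
record validator.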

Michael Everson <everson at evertype dot com> replied:

> I am not really very happy about tinkering so soon after registration. 
> But if we do change it I would like to get rid of the illegible &xxxx; 
> notation. If the registry entries are to be in HTML, they should be so 
> normatively, with charset tagging so that they display properly. If 
> they are not tagged, then ASCII fallback should be used so the strings 
> are legible. As it is I can only guess, or drag out the Unicode book 
> and look them up. That's not legibility.

RFC 4646 specifies that non-ASCII characters be represented using these 
ugly hex NCRs.  There are pros and cons to using UTF-8 for the Registry, 
and even though I am a huge fan of UTF-8, there are valid points to be 
made on both sides (e.g. many e-mail systems, even today, mangle UTF-8). 
Again, this list is not empowered to disregard or overturn what RFC 4646 
says.  We have had the debate in LTRU probably three or four times now, 
and the hex NCRs appear likely to stay.
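
In the meantime, the NCRs are easy to convert by machine rather than by 
looking them up in the Unicode book.  The following is a minimal sketch in 
Python using only the standard library; the function names are my own, and 
it handles only the hexadecimal "&#x...;" form seen in the Comments field 
above, not decimal NCRs or named entities.

```python
import re

# Matches hexadecimal numeric character references such as "&#x4D9;".
_HEX_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_hex_ncrs(text):
    """Replace each hex NCR with the character it denotes."""
    return _HEX_NCR.sub(lambda m: chr(int(m.group(1), 16)), text)


def encode_hex_ncrs(text):
    """Replace each non-ASCII character with a hex NCR."""
    return "".join(c if ord(c) < 128 else "&#x%X;" % ord(c) for c in text)


registered = "Birl&#x4D9;&#x15F;dirilmi&#x15F; Jeni T&#xFC;rk &#x4D8;lifbas&#x44C;"
print(decode_hex_ncrs(registered))
print(encode_hex_ncrs("Ja\u014balif"))
```

Decoding and then re-encoding the registered string should give back the 
same ASCII text, which makes the conversion safe to use for display only.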

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages