Transliteration [Re: ADMIN: Please move (Re: What's cooking?)]

John Clews scripts20 at
Fri Oct 7 11:28:28 CEST 2005


I agree with you that

>>> I certainly hope that 3066bis is cooked now.... no matter what else, it's
>>> time to drive a stake in the ground and say "here's the starting point
>>> for further work".... it's been long enough (langtags-00: December 2003).

Yes - let's see RFC 3066bis approved and published: stability is essential.

However, just as RFC 1766 was replaced by RFC 3066, which will be replaced
by RFC 3066bis, the world will move on after that too, and there will
almost certainly be an "RFC 3066bis-bis" in due course. The transliteration
issues raised certainly fall into that category.

I'm not a member of the LTRU list, but I certainly see what has been
raised by Tex Texin and others as valid matters for discussion.

I see your difficulty when you are trying to see RFC 3066bis "cooked" as
you put it.

I think this could be resolved by having discussions on these matters use
"Transliteration" as the first word of the Subject: field. Those who wanted
to ignore it could do so, and those who wanted to focus on it, without it
affecting RFC 3066bis, could also do so.

John Clews

> as sort-of-list-administrator:
> These are (mostly) LTRU issues, not language tag registration issues.
> Please take those to the LTRU list.
> (Yes, I know that you'll be told "these are closed issues and you don't
> have a good reason for reopening them". Yes, I'm going to follow my own
> advice and shut up here on them....)
>                      Harald
> --On Friday, October 07, 2005 01:56:32 -0700 Tex Texin <tex at> wrote:
>> Harald,
>> Thanks for this.
>> What I meant by my remark about rationalization of 3066bis continuing, is
>> that the answer to my question seemed to take as a premise that it should
>> result in a formulation that matches the 3066bis proposal and so sounded
>> contrived. (Sorry Peter. I doubt that was what you were intending or
>> doing, I was just saying there should be a more compelling response.)
>> Whether 3066bis is cooked or not, this group seems set on going forward
>> with it. We can all agree it has been a long haul.
>> The real test is whether the industry and standards that supported RFC
>> 1766/3066 will see 3066bis as a beneficial successor and go forward with
>> it. It also depends on when it becomes complete: the matching draft needs
>> to be there as well, along with the new registry and some sense of
>> versioning or stability for the registry. And potentially the relationship
>> to locales needs to be broached.
>> As for whether or not I would reject Pinyin over Wade-Giles, that depends
>> on the application. If it is me reading the document, I don't care. If my
>> text-to-voice reader, search, machine translation, or karaoke engines
>> don't support one or the other, perhaps I do need to reject it.
>> I am fine with the philosophy of letting people propose tags that they
>> need rather than wrestle with hypotheticals.
>> We have had a few people say that they saw a need during the thread. I am
>> just asking whether defining transliteration tags as variants is adequate,
>> as was proposed. Some of the discussion indicated (if I understood
>> correctly) that transliteration tags might in fact be better placed higher
>> up in the tag hierarchy. It could be argued that maybe that would be nice,
>> but they can go as variants and the hierarchy can be dealt with by the
>> matching algorithm. That's not an unfair response but adds complexity and
>> run-time costs.
>> Your comments on scripts suggested to me that you thought I might be
>> disagreeing with the addition of script tags.
>> We definitely needed at least some script tags. I am not a fan of where
>> they are placed because of its impact on backward compatibility and the
>> need for a complex matching algorithm. But of course we needed scripts
>> for at least some languages.
>> Others seem fine with the choices of 3066bis. I have a number of concerns
>> and have difficulty recommending support for it.
>> We should have either generativity or a registry. Having both seems
>> wasteful and makes the requirements for registering or rejecting tags
>> unclear or unnecessary. And it introduces versioning problems. (Which
>> version of the registry is your software on, and what happens when we
>> have a steady stream of "register the new tag I generated" requests
>> causing abundant and frequent version changes?)
>> We could have either position-based subtags or size-based (character
>> length) subtags. We have both which also seems wasteful.
>> For example, we could let script float in any position and simply agree
>> any 4-letter subtag is a script tag. (There is a potential backward
>> compatibility problem, but this one is less of a problem than locking
>> script in second place.)
>> We could have fixed some of the complexity of tags by adopting specific
>> separator characters, but we insisted on keeping hyphen for compatibility
>> and then broke compatibility anyway.
>> We talk about the importance of stability and yet we have a steady stream
>> of deprecations and renaming.
>> I keep asking myself whether we are really that much better off trying to
>> move our industry to this more complex scheme than if we had simply
>> registered the few tags we needed that indicate script in addition to
>> language, and for super-regions that make sense for our industry (such as
>> es-419). The new scheme generates all sorts of tags, many of which I
>> speculate are not needed, but which may cost me in table size, translation
>> costs for the large number of tags, testing costs, etc.
>> Sorry, I see a half-empty glass and the water also seems kind of mucky...
>> tex
>> Harald Tveit Alvestrand wrote:
>>> --On Wednesday, October 05, 2005 12:24:35 -0700 Tex Texin
>>> <tex at> wrote:
>>> > Guys, sorry to be the odd man out yet again, but we should first run
>>> > through all the use cases before deciding that transliteration can be
>>> > pushed down the stack. This argument sounds to me more like a
>>> > rationalization for continuing with 3066bis than to really address the
>>> > question.
>>> I certainly hope that 3066bis is cooked now.... no matter what else, it's
>>> time to drive a stake in the ground and say "here's the starting point
>>> for further work".... it's been long enough (langtags-00: December 2003).
>>> > Text to voice is important for accessibility. Identification of the
>>> > transliteration scheme would be a prominent requirement and perhaps
>>> > therefore ru-Latn is not sufficient and should not be recommended as
>>> > adequate.
>>> You raise an interesting point, which is actually pertinent to LTRU's
>>> remaining deliverable - the matching draft.
>>> When you have a specific document in a transliterated format that you
>>> want to read through text-to-speech, you need to know (or guess) what the
>>> transliteration scheme is. But in searching for documents, it's less
>>> obvious that you want to specify this information before knowing what's
>>> available; when reading up on Chinese history sitting at an ASCII
>>> terminal, would you reject a document transcribed into Wade-Giles if
>>> there's no Pinyin version available?
>>> But (this is a matter of 1766-era philosophy) one of the reasons why I
>>> designed 1766 the way it was designed was to give people the freedom to
>>> "put their money where their mouth is" - if they think a certain tag or
>>> tag combination is needed, let them go through the work of deciding
>>> exactly what they need, documenting that to this list and defending it -
>>> and then using it, and showing the world that usage will happen. Designs
>>> based on hypothetical needs are less likely to succeed than designs based
>>> on experience - and the experience with 1766/3066 led to the community
>>> deciding that generativity was good and script belonged in language tags
>>> - 3066bis.
>>> If someone thinks they (and not some abstract "someone") need
>>> transliteration identifiers in language tags, let them propose them. And
>>> then we can try them.
>>>                      Harald
>>> _______________________________________________
>>> Ietf-languages mailing list
>>> Ietf-languages at
>> --
>> -------------------------------------------------------------
>> Tex Texin   cell: +1 781 789 1898   mailto:Tex at
>> Xen Master                
>> XenCraft		  
>> Making e-Business Work Around the World
>> -------------------------------------------------------------
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at
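
For illustration only: the size-based alternative Tex describes in the quoted
message above (let script float in any position and agree that any 4-letter
subtag is a script tag) might be sketched roughly as follows. This is not the
3066bis syntax, which fixes script in second place. The function name
find_script and the handling of case are assumptions made for this sketch,
and it ignores private-use and grandfathered tags entirely.

```python
def find_script(tag):
    """Return the first four-letter alphabetic subtag found after the
    primary language subtag, title-cased (e.g. "Latn"), or None.

    Hypothetical helper: classifies subtags purely by length, wherever
    they appear, rather than by position.
    """
    subtags = tag.split("-")
    for sub in subtags[1:]:  # skip the primary language subtag
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            return sub.title()
    return None


# The script is found regardless of where it sits in the tag.
print(find_script("ru-Latn"))     # script in second position
print(find_script("zh-CN-hant"))  # script after the region
print(find_script("es-419"))      # no script subtag present
```

A length-only rule like this is simple to implement, but it carries the
backward-compatibility risk mentioned in the quoted text: any existing
four-letter subtag with another meaning would be misread as a script.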
