What's cooking?

Tex Texin tex at xencraft.com
Fri Oct 7 10:56:32 CEST 2005


Thanks for this. 
What I meant by my remark about a rationalization for continuing with 3066bis
is that the answer to my question seemed to take as a premise that it should
result in a formulation that matches the 3066bis proposal, and so it sounded
contrived. (Sorry, Peter. I doubt that was what you were intending or doing;
I was just saying there should be a more compelling response.)

Whether 3066bis is cooked or not, this group seems set on going forward with
it. We can all agree it has been a long haul. 

The real test is whether the industry and standards that supported RFC
1766/3066 will see 3066bis as a beneficial successor and go forward with it.
That also depends on when it becomes complete: the matching draft needs to be
there as well, along with the new registry and some sense of versioning or
stability for the registry. And potentially the relationship to locales
needs to be broached.

As for whether or not I would reject Pinyin over Wade-Giles, that depends on
the application. If it is me reading the document, I don't care. If my
text-to-voice reader, search, machine translation, or karaoke engines don't
support one or the other, perhaps I do need to reject it.

I am fine with the philosophy of letting people propose tags that they need
rather than wrestle with hypotheticals.

During the thread, a few people have said that they see a need. I am
just asking whether defining transliteration tags as variants is adequate,
as was proposed. Some of the discussion indicated (if I understood
correctly) that transliteration tags might in fact be better placed higher
up in the tag hierarchy. It could be argued that maybe that would be nice,
but they can go as variants and the hierarchy can be dealt with by the
matching algorithm. That's not an unfair response but adds complexity and
run-time costs.
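
To illustrate the kind of added work I mean, here is a rough sketch in Python.
It is not taken from the matching draft; the function names and the variant
subtags "pinyin" and "wadegile" are assumptions made up for the example. It
contrasts simple RFC 3066 style prefix matching with a matcher that has to
look through every subtag to find a transliteration variant sitting at the
end of the tag.

```python
def prefix_match(language_range, tag):
    """RFC 3066 style: the range matches if it equals the tag or is a
    prefix of the tag that ends at a hyphen boundary."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def match_requiring_subtags(language_range, tag):
    """Hypothetical alternative: the primary language must be equal, and
    every other subtag of the range must appear somewhere in the tag."""
    wanted = language_range.lower().split("-")
    have = tag.lower().split("-")
    return wanted[0] == have[0] and all(s in have[1:] for s in wanted[1:])


# The variant names below are illustrative only.
print(prefix_match("zh-pinyin", "zh-Latn-CN-pinyin"))
print(match_requiring_subtags("zh-pinyin", "zh-Latn-CN-pinyin"))
print(match_requiring_subtags("zh-pinyin", "zh-Latn-wadegile"))
```

With the transliteration scheme at the end of the tag, the prefix matcher
cannot select on it unless the request spells out every subtag that comes
before it, while the second matcher finds it at the price of scanning the
subtags on every comparison.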

Your comments on scripts suggested to me that you thought I might be
disagreeing with the addition of script tags.
We definitely needed at least some script tags. I am not a fan of where they
are placed because of the impact on backward compatibility and the need for
a complex matching algorithm. But of course we needed scripts for at least
some languages.

Others seem fine with the choices of 3066bis. I have a number of concerns
and have difficulty recommending support for it.

We should have either generativity or a registry. Having both seems wasteful
and makes the requirements for registering or rejecting tags unclear or
unnecessary. And it introduces versioning problems. (Which version of the
registry is your software on, and what happens when we have a steady stream
of "register the new tag I generated" requests causing abundant and frequent
version changes?)

We could have either position-based subtags or size-based (character length)
subtags. We have both, which also seems wasteful.
For example, we could let script float in any position and simply agree any
4-letter subtag is a script tag. (There is a potential backward
compatibility problem, but this one is less of a problem than locking script
in second place.)
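
A minimal sketch of that size-based idea, in Python. This is the hypothetical
scheme described above, not what 3066bis specifies, and the function name and
the labels it returns are my own assumptions. The first subtag is taken as the
language; every later subtag is classified by its shape alone, wherever it
appears.

```python
def classify_subtags(tag):
    """Label each subtag after the first purely by its length and
    character class, ignoring its position in the tag."""
    subtags = tag.split("-")
    labels = [("language", subtags[0].lower())]
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            # Any 4-letter subtag is treated as a script.
            labels.append(("script", sub.title()))
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            # 2 letters or 3 digits is treated as a region (e.g. 419).
            labels.append(("region", sub.upper()))
        else:
            labels.append(("other", sub.lower()))
    return labels


print(classify_subtags("sr-Latn-CS"))
print(classify_subtags("sr-CS-Latn"))
```

Both orderings yield the same script and region, so a parser built this way
would not care whether script comes second or last. The backward
compatibility risk mentioned above is that any existing 4-letter subtag would
be read as a script.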

We could have fixed some of the complexity of tags by adopting specific
separator characters, but we insisted on keeping hyphen for compatibility
and then broke compatibility anyway.

We talk about the importance of stability and yet we have a steady stream of
deprecations and renaming.

I keep asking myself whether we are significantly better off trying to move
our industry to this more complex scheme than if we had simply registered
the few tags we needed: those that indicate script in addition to language,
and those for super-regions that make sense for our industry (such as
es-419). The new scheme generates all sorts of tags, many of which I
speculate are not needed but which may cost me in table size, translation
costs for the large number of tags, testing costs, etc.

Sorry, I see a half-empty glass and the water also seems kind of mucky...

Harald Tveit Alvestrand wrote:
> --On onsdag, oktober 05, 2005 12:24:35 -0700 Tex Texin <tex at yahoo-inc.com>
> wrote:
> > Guys, sorry to be the odd man out yet again, but we should first run
> > through all the use cases before deciding that transliteration can be
> > pushed down the stack. This argument sounds to me more like a
> > rationalization for continuing with 3066bis than to really address the
> > question.
> I certainly hope that 3066bis is cooked now.... no matter what else, it's
> time to drive a stake in the ground and say "here's the starting point for
> further work".... it's been long enough (langtags-00: December 2003).
> > Text to voice is important for accessibility. Identification of the
> > transliteration scheme would be a prominent requirement and perhaps
> > therefore ru-Latn is not sufficient and should not be recommended as
> > adequate.
> You raise an interesting point, which is actually pertinent to LTRU's
> remaining deliverable - the matching draft.
> When you have a specific document in a transliterated format that you want
> to read through text-to-speech, you need to know (or guess) what the
> transliteration scheme is. But in searching for documents, it's less
> obvious that you want to specify this information before knowing what's
> available; when reading up on Chinese history sitting at an ASCII terminal,
> would you reject a document transcribed into Wade-Giles if there's no
> Pinyin version available?
> But (this is a matter of 1766-era philosophy) one of the reasons why I
> designed 1766 the way it was designed was to give people the freedom to
> "put their money where their mouth is" - if they think a certain tag or tag
> combination is needed, let them go through the work of deciding exactly
> what they need, documenting that to this list and defending it - and then
> using it, and showing the world that usage will happen. Designs based on
> hypothetical needs are less likely to succeed than designs based on
> experience - and the experience with 1766/3066 led to the community
> deciding that generativity was good and script belonged in language tags -
> 3066bis.
> If someone thinks they (and not some abstract "someone") need
> transliteration identifiers in language tags, let them propose them. And
> then we can try them.
>                      Harald

Tex Texin   cell: +1 781 789 1898   mailto:Tex at XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
