Tatweel (and Lm and Tables Section 2.1 generally)

John C Klensin klensin at jck.com
Fri Mar 20 23:30:39 CET 2009


--On Friday, March 20, 2009 14:50 -0700 Kenneth Whistler
<kenw at sybase.com> wrote:

> Mark said:
> 
>> 4. The only reason I proposed Tatweel is that there is a
>> fundamental difference: we encoded Tatweel specifically for
>> its display effect, no other reason.
> 
> Let me elaborate somewhat on that.
> 
> Tatweel is *not* a letter. It is a stroke extension used in
> a cursive script as part of the mechanism for line
> justification. Essentially, it is a calligraphic convention.
> More properly, the justification technique is "kashida", and
> "tatweel" is a glyph used in this justification technique.
>...

Ken (and Mark),

Thanks for the explanations.  My apologies for continuing to ask
these sorts of questions, but I think the asking is necessary,
even when I'm moderately sure what the answers are going to be
(and satisfied with the answers).

As far as the Lm -> CONTEXTO question is concerned, I believe
that you are right about not wanting to go there, if only
because it would open up cans of worms that are better kept
closed.  But my assumption was that we would make no effort to
define contextual rules for any character for which there was
not a clear and obvious requirement.  That would make the task
much smaller than the one you were apparently anticipating, even
if it would still not make it tractable.

From my personal point of view, your explanation and
recommendation, Mark's recommendation, and the ASIWG
recommendation from last fall (in no particular order) are more
than sufficient grounds for moving this character to DISALLOWED
and taking its N'Ko counterpart with it.  Despite the lack of
convenient classifying properties, they are clearly not
"letters" in the informal sense in which we have tried to
interpret and extend that term.   I trust that, if more of these
come along, you will help us identify them so that appropriate
and consistent classifications can be made (even if only as
additions to the exception list).
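
To make the "lack of convenient classifying properties" point
concrete, here is a rough sketch in Python. It is an illustration
only, not the Tables algorithm: the names DISALLOWED_EXCEPTIONS and
sketch_status are hypothetical, and treating every other Lm
character as PVALID is a simplifying assumption. It shows that the
General_Category value alone cannot separate Tatweel (U+0640) and
its N'Ko counterpart (U+07FA) from ordinary modifier letters, so
an explicit exception list has to do the work.

```python
import unicodedata

# Hypothetical exception list: code points moved to DISALLOWED by an
# explicit decision, because no Unicode property singles them out.
DISALLOWED_EXCEPTIONS = {
    0x0640,  # ARABIC TATWEEL
    0x07FA,  # NKO LAJANYALAN
}


def sketch_status(cp: int) -> str:
    """Toy classification: check exceptions first, then General_Category."""
    if cp in DISALLOWED_EXCEPTIONS:
        return "DISALLOWED"
    if unicodedata.category(chr(cp)) == "Lm":
        # Simplifying assumption: any other modifier letter is accepted.
        return "PVALID"
    return "UNCLASSIFIED"  # outside the scope of this sketch


# Tatweel and an ordinary modifier letter share the same category,
# so only the exception list tells them apart.
for cp in (0x0640, 0x07FA, 0x02B0):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), sketch_status(cp))
```

A supplemental property that sub-classified Lm would let the first
check be replaced by a property lookup rather than a hand-kept set.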

I still wonder whether those characters at 02BA..02C1, or at
least the first one or two of them, should also be DISALLOWED,
but that is obviously a separate question.

And, again, my thanks for your patience in providing these
explanations.

     john

p.s. with the understanding that it should be a long-term
project at best and that I'd hate to see anything at all (and
certainly not the IDNA revision) blocked on it, I'd like to
softly suggest that, if someone ever has time, a supplemental
property that would sub-classify Lm would be helpful for the IDN
situation and maybe other situations like it.  It would seem to
me to be a little more satisfactory-sounding than "non-content,
cursive script line justification extending marks" (or similar
statements about other characters in Lm) if only because the
quoted text sounds like a judgment call, even though I now
understand that it is not.

Interestingly, and with the understanding that it is pure
speculation, the eventual addition of such a property to
Unicode, or some other property that would permit shortening the
exception list(s), feels to me like a much more plausible reason
for wanting to change the rules themselves in some future
version of IDNA than some of the examples that I, and others,
have offered.


