[Idna-arabicscript] mapping of Full Stops

Mon Oct 12 17:13:46 CEST 2009

--On Monday, October 12, 2009 07:22 -0700 Paul Hoffman
<phoffman at imc.org> wrote:

> At 9:53 AM -0400 10/12/09, John C Klensin wrote:
>> But decoupling it from the
>> current Mapping document seems to me to be appropriate, if only
>> because of the expandability and open-endedness of the list of
>> characters.
> 
> Sarmad's message did not seem to be a request to add all full
> stops to draft-ietf-idnabis-mappings, just U+06D4. (Sarmad,
> please correct me if I'm wrong.) The list that is given in the
> draft is clearly labelled as examples of full stops that a UI
> implementer might consider, not as the full list. The reason
> that the list exists at all is because of input from experts
> in a particular script; it seems reasonable to take input from
> experts on other scripts as well. If we get bombarded with
> experts from more than half a dozen languages before the end
> of IETF Last Call, I could see your view, but this is a single
> request from someone in a language community that has been
> part of the IDNAbis process for quite some time.

First of all, if there is going to be a list in the document, I
think Sarmad's request (which has been supported by other
analyses, by the way) is a completely reasonable one and that it
should be added to the list.  I am, of course, not an expert on
either Urdu or Arabic script usage generally, but the consensus
among those who are seems to be that adding this is at least as
strongly justified as the East Asian characters we have and, at
worst, harmless (whether you believe Sarmad, or me, about that
or whether you seek other experts is up to you... and Vint).

One can generalize from the "harmless" part: it looks like all
of the likely possibilities are going to be Po characters and
hence DISALLOWED.  As long as that relationship holds, and as
long as they don't leak, they are all harmless.  If they leak,
we get into a whole new family of visual confusability issues
and questions, but it seems to me that "won't leak" is a fairly
basic assumption of the mapping document.
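
A minimal sketch of the "all Po" observation, using Python's standard
unicodedata module.  U+06D4 is the character from this thread; the three
East Asian code points are an assumption (the separators IDNA2003
recognized), since the list in the draft is not reproduced here.  The
check only reports the Unicode General_Category; it does not compute the
IDNA2008 derived property itself.

```python
import unicodedata

# Candidate label-separator characters. U+06D4 comes from the thread; the
# three East Asian code points are an assumption (the IDNA2003 separators).
CANDIDATES = ["\u06D4", "\u3002", "\uFF0E", "\uFF61"]


def is_other_punctuation(ch: str) -> bool:
    """Return True if ch has Unicode General_Category Po."""
    return unicodedata.category(ch) == "Po"


if __name__ == "__main__":
    for ch in CANDIDATES:
        # Print code point, character name, and general category.
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"{unicodedata.category(ch)}")
```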

We discussed label-separator mappings (including, if I recall,
U+06D4) much earlier in the life of the WG.  What I think
Sarmad's request points to was part of that earlier discussion:
the East Asian character list is not a complete list of all
reasonable "full stop" characters that might be justified and
recommended for mapping as label separators.  That pointer is
independent of the specifics of U+06D4, whether he intended it
or not.

> Before I consider adding U+06D4, I would want to hear from
> additional experts, but other than that, I see no danger in
> adding another character to an optional list in an optional
> document. 

I don't either (see above about "harmless"), except that I think
it would be very unfortunate to have to reopen and re-review
this document in six months or a year when someone argues that
one or two of U+0589, U+1362, U+166E (a special case because I'm
sure Eric can advise us as to whether that request would be
likely to arise and be plausible), U+1803, and so on should be
added to the list for the same types of reasons as the ones
Sarmad describes.

So I guess I'm making a recommendation that I should have made
long ago, when I should have spotted the issue: create a
registry for these things.  The reason for that is precisely
to prevent our needing to reopen the mapping document to add one
or more of these characters, not to treat them with more or less
authority.  I think such additions are extremely probable,
either as new scripts are added to Unicode or as communities
that are now unrepresented in this WG and probably
underrepresented on the Internet show up and explain their
needs.  Of course, YMMV on that likelihood assessment.
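
To illustrate why a registry would avoid reopening the document, here is
a rough sketch in which the mapping code stays fixed and only the data
changes.  The names SEPARATOR_REGISTRY, map_label_separators, and
split_labels are hypothetical, and the registry contents are assumed for
illustration (the IDNA2003 East Asian separators plus U+06D4); nothing
here is taken from the mapping draft.

```python
FULL_STOP = "\u002E"

# Hypothetical registry contents, assumed for illustration only: the
# IDNA2003 East Asian separators plus U+06D4 from this thread.
SEPARATOR_REGISTRY = frozenset({"\u3002", "\uFF0E", "\uFF61", "\u06D4"})


def map_label_separators(name: str, registry=SEPARATOR_REGISTRY) -> str:
    """Replace every registered full-stop character with U+002E."""
    return "".join(FULL_STOP if ch in registry else ch for ch in name)


def split_labels(name: str, registry=SEPARATOR_REGISTRY) -> list:
    """Map registered separators, then split the name into labels."""
    return map_label_separators(name, registry).split(FULL_STOP)
```

Adding a character later, for example U+0589, would then be a data
change such as split_labels(name, SEPARATOR_REGISTRY | {"\u0589"}),
with no change to the mapping logic.  Because the mapping runs before
the labels are formed, the registered characters do not leak into them.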

My suggestion about careful explanation and cautions is separate
but, if we preserve the current model for separation of
materials and the registry is created by the mapping document,
the explanation would need to be in Rationale anyway.

    john