Tamil Numerals in IDNA - Re: WG Last Call for Four Primary IDNABIS I-Ds

Fri Aug 21 19:04:39 CEST 2009

Dear John,
we definitely have two different perspectives: you as a designer from
inside the network, and I as a user from outside of the network. What
is important is that we reconcile our positions, since at an interface
no one is meant to win.

2009/8/21 John C Klensin <klensin at jck.com>:
> Elizabeth,
>
> It appears that you are viewing all requests or suggestions from
> language communities as equivalent to each other and,

Absolutely. This is the very basis of multilingualism.

> more
> important, as a situation in which "...different French or Tamil
> registries may adopt different mapping practices...".  In case
> there is still confusion, let me stress two things:
>
> (1) The specifications _forbid_ a registry doing _any_ mapping
> (converting one character or code sequence into another) other
> than
>
>        (i) testing for and, if necessary, applying Unicode
>        Normalization Form C (NFC)
>
>        (ii) Converting one Unicode encoding form (e.g., UTF-8)
>        into another (e.g., UTF-32) for consistency with
>        internal character storage.

You forget one, which is:

          (iii) correcting a misuse of Unicode.
          Unicode is Unicode. It is not responsible for the way it
          is being used.
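
For reference, the two operations listed as (i) and (ii) can be
sketched in a few lines of Python. This is only an illustration: the
function names are mine and do not come from the drafts, and I assume
Python 3.8 or later for unicodedata.is_normalized.

```python
import unicodedata


def to_nfc(label: str) -> str:
    """Operation (i): test for NFC and apply it only if necessary."""
    if unicodedata.is_normalized("NFC", label):
        return label
    return unicodedata.normalize("NFC", label)


def utf8_to_utf32(data: bytes) -> bytes:
    """Operation (ii): re-encode UTF-8 bytes as big-endian UTF-32 (no BOM)."""
    return data.decode("utf-8").encode("utf-32-be")


# 'e' followed by COMBINING ACUTE ACCENT is not in NFC form.
decomposed = "e\u0301cole"
composed = to_nfc(decomposed)
print(len(decomposed), len(composed))
```

Neither operation changes which characters the label is made of; they
only change how the same characters are represented.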

> Registrations occur in terms of final characters only.

I do not know what "final characters" means.
U-labels are expressed in Unicode code points and A-labels in ASCII.
Registrations may occur in either or both forms.
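
To illustrate the two forms, here is a minimal Python sketch that
converts between them with the standard "punycode" codec. It assumes
the label is already valid, already in NFC, and contains at least one
non-ASCII character; the function names are hypothetical and no
IDNA2008 validity checking is done.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label: str) -> str:
    """Encode a (pre-validated, non-ASCII) U-label as an A-label."""
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to its U-label."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an A-label: missing 'xn--' prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD: a label in Tamil script.
tamil_label = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
a_form = to_a_label(tamil_label)
print(a_form, to_u_label(a_form) == tamil_label)
```

The conversion is reversible, so a registration made in one form
determines the other.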

> Of
> course, the IETF, as a voluntary standards body, cannot enforce
> that rule.  But it cannot enforce any other rule either.  Those
> who do not follow the standards may encounter interoperability
> problems and may be subject to other authorities.

Correct. The first authority is the user.
Our job is to try to prevent his/her interoperability problems.

> (2) The request from the Sri Lankan Tamil community asks for
> exclusion of a range of characters, i.e., classifying them as
> DISALLOWED.  If they are DISALLOWED, then no registry is
> permitted to register strings that contain them and no lookup
> application should look up any string that contains them either.
> That case produces absolutely predictable behavior.  It also
> does not violate any fundamental rules of Unicode or of the
> IDNA2008 protocol design -- an exception would just be made to
> treat some collection of characters as DISALLOWED that would
> otherwise be PVALID.
>
> What you are asking for where French is involved is a very
> different situation.

The discussion is about the IETF's desire to disallow: refusing versus
forcing.
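
The predictable behavior described under (2) can be sketched as
follows. This is an assumption-laden illustration: I take the excluded
range to be the Tamil digits U+0BE6 to U+0BEF, which is my reading of
the thread subject and not something stated in the text above, and the
names are hypothetical.

```python
# ASSUMPTION: the exception covers TAMIL DIGIT ZERO..NINE (U+0BE6..U+0BEF).
EXTRA_DISALLOWED = range(0x0BE6, 0x0BF0)


def passes_exception(label: str, disallowed=EXTRA_DISALLOWED) -> bool:
    """Return False if the label contains any exceptionally DISALLOWED
    character. Only this one exception is checked, not full IDNA2008
    validity."""
    return not any(ord(ch) in disallowed for ch in label)


letters_only = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"   # Tamil letters and signs
with_digit = letters_only + "\u0be7"              # plus TAMIL DIGIT ONE
print(passes_exception(letters_only), passes_exception(with_digit))
```

A registry and a lookup application running the same test reach the
same answer for every label, whatever the language or top-level domain.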

> First, you want certain characters treated
> differently for some languages that use a given script than for
> others that use the same script.  That is nearly impossible to
> think about, just because there is, in general, no way to know
> what language a particular label is supposed to be associated
> with, nor is there a way to know what top-level domain has the
> label in one of its subtrees (even if one could reliably
> associate top-level domains with languages).

French does not use Unicode. Unicode supports French.

>  Second, if I
> understand your latest note correctly, you would like to have
> those characters treated via some contextual rule ("CONTEXTO").
> But the contextual rules yield either "valid" or "invalid" based
> on adjacent or nearby characters -- they do not provide
> different mappings, nor different rules for different languages
> (the latter at least partially for the reasons above).

We are speaking only of scripts and their correct semantic usage
(orthotypography). When the other characters are Tamil, don't we know
we are in a Tamil context?
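
As a rough sketch of what such a test could look like, the Python
below answers only "valid" or "invalid" for one position in a label,
based on the other characters. It is a heuristic under stated
assumptions: the Python standard library does not expose the Unicode
Script property, so the character name is used as a stand-in, and the
function names are hypothetical.

```python
import unicodedata


def is_tamil(ch: str) -> bool:
    """Heuristic stand-in for the Script property: check the name prefix."""
    return unicodedata.name(ch, "").startswith("TAMIL ")


def valid_in_tamil_context(label: str, index: int) -> bool:
    """True if every character other than label[index] is Tamil."""
    others = label[:index] + label[index + 1:]
    return bool(others) and all(is_tamil(ch) for ch in others)


tamil_with_digit = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd\u0be7"
latin_with_digit = "abc\u0be7"
print(valid_in_tamil_context(tamil_with_digit, 5),
      valid_in_tamil_context(latin_with_digit, 3))
```

Such a test establishes a script context, not a language; it says
nothing about which language the label is meant to be read in.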

> And,
> finally, your suggestion requires treating capital letters (or
> at least some capital letters) as distinct from their lower-case
> forms, which would create massive inconsistencies with IDNA2003
> (not just the two characters of inconsistency with which we have
> had such extensive debates) as well as inconsistencies with
> DNS and host table practices that go back to the 1970s.  No
> matter how strong your justification, and even if it were not
> also tied to differential treatment for a particular language, I
> cannot imagine the WG (or the IETF more broadly) agreeing to
> that change.
>
> Another part of the difference is that the Tamil script is used
> to write only one language or, depending on how one counts, a
> small collection of very closely related languages.

Our point is simple. The future belongs to idiolects (support,
authentication, etc.) and personalized presentations. We do not want
anything that might prevent or delay innovation in that area.

> That makes
> thinking about an exception request much easier than it is with
> the Latin script, which is used to write a very large number of
> languages, some of them with no recent (e.g., conservatively in
> the last 3000 years or so) linguistic relationship to each other
> and that use the script in different ways.  That is a
> long-standing historical problem; there is nothing that the WG
> can do about it today other than to recognize it and move on.

You will note that this is roughly what we advise.
Best.

Elisabeth Blanconil.