New version, draft-faltstrom-idnabis-tables-02.txt,
John C Klensin
klensin at jck.com
Thu Jun 14 12:04:15 CEST 2007
--On Thursday, June 14, 2007 10:24 +0100 Gervase Markham
<gerv at mozilla.org> wrote:
> Harald Alvestrand wrote:
>> Can you recommend specific scripts that you think should have
>> the "Stable" status?
> No; it's not my area of expertise. I comment merely as an
> implementor, for whom the current list looks concerning.
>> The fact that the CJK scripts are in MAYBE YES is probably
>> the biggest contributor to the sheer number of characters
>> there. But I have no idea whether there are known issues
>> with them that should be solved first.
> Forgive my ignorance, but isn't this what RFC 3743 addresses?
The principle here is extreme caution, at least in the very near
term. The one thing we must not do if we are going to have any
long-term stability at all is to put something into the "ok,
permitted" category and then learn something significant and
drop it back into "maybe" or "never". There isn't any room
for a "never mind" or "whoops" category.
My own guess about CJK, as a close observer of the processes,
work, and thinking that produced RFCs 3743 and 4713, is that if
the model of 3743 is followed, using the tables that China
explained and published with 4713 together with the Japanese and
Korean counterparts to those tables, CJK is perfectly safe.
But that raises three problems:
(i) like you, "we" are not willing to stand up and say that we
understand those languages and the script well enough to say
"this is sufficiently defined and ok". We don't have the
language expertise, and know that we don't, and hence need that
assertion to come from the relevant community after it has
checked the 3743 model and tables against the new IDNA[bis] model.
(ii) at present, Patrik's table-generating principles are trying
to work with CJK as a single script because that is the only
handle that the Unicode properties and organizational structure
provide. RFC 3743 is a registration-side overlay that implies
subsets of that script -- different subsets for Chinese,
Japanese, and Korean. We haven't figured out how to talk about
that yet, e.g., to say "CJK is just fine as long as one follows
the 3743 model, knows the language context at registration time,
and has appropriate (linguistically and in terms of
minimal-confusion) subset tables for use with 3743, but more
general use of CJK is still 'maybe' at best".
I think such a statement is probably true, but am not expert
enough to assert it (see (i), above), and, as I said, we haven't
figured out how to assert that sort of subset/conditional rule
yet. Perhaps we just need to say that, while a complete script
might be "maybe", carefully-thought-out subsets of that script,
handled so that confusables are controlled, might be just fine.
Your opinion (and that of others) on that would be welcome.
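To make the subset/conditional idea concrete, here is a minimal
sketch of what a registration-side check might look like. It is not
the RFC 3743 algorithm (which also deals with variants) and the
tables are toy data: TOY_SUBSETS and subset_registration_ok are
hypothetical names, and the characters listed are placeholders, not
the tables published with RFC 4713 or anywhere else.

```python
# Hypothetical per-language subset tables; toy data for illustration only.
TOY_SUBSETS = {
    "zh": frozenset("中国"),
    "ja": frozenset("中国の"),
    "ko": frozenset("한국"),
}


def subset_registration_ok(label, language=None, subsets=TOY_SUBSETS):
    """Accept a label only if the language context is known and every
    character is in that language's subset table."""
    # No label, or no known language context: general use of the script
    # stays in "maybe" territory, so this sketch refuses.
    if not label or language not in subsets:
        return False
    allowed = subsets[language]
    return all(ch in allowed for ch in label)
```

The point of the sketch is only that the answer depends on the
language context supplied at registration time: the same string can
be fine under one table and refused under another, or refused
outright when no context is given.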
(iii) It is also important to understand these categories in
terms of how they are used. From your standpoint as an
implementer of a browser (or other applications that look up
these names) there is probably no practical difference between
"permitted", "maybe yes", and "maybe no". You are expected to
look up a string containing any of those characters to see if it
resolves. You might want to use the categories as indicators of
strings that you want to alert the user about in some way, but
they should not affect what you look up. Registries are
expected to stay much closer to the "permitted" list. See my
note on this list of Tuesday, June 12, 2007 18:01 -0400 for more
on this; I won't repeat it here but will happily forward you a
copy if it slipped past you.
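As a rough illustration of the lookup-versus-registration split
described in (iii), the following sketch may help. Everything in it
is an assumption made for illustration: the Category names mirror
the categories discussed above, TOY_TABLE is invented data rather
than anything from the draft's tables, and the registry check takes
the strictest possible reading of "stay much closer to the
'permitted' list".

```python
from enum import Enum


class Category(Enum):
    PERMITTED = "permitted"
    MAYBE_YES = "maybe yes"
    MAYBE_NO = "maybe no"
    NEVER = "never"


# Toy assignments for illustration only; NOT taken from the draft.
TOY_TABLE = {
    "a": Category.PERMITTED,
    "b": Category.PERMITTED,
    "\u4e2d": Category.MAYBE_YES,
    "\u0f00": Category.MAYBE_NO,
    " ": Category.NEVER,
}


def categories(label, table=TOY_TABLE):
    # Characters missing from the table are treated as NEVER (an assumption).
    return {table.get(ch, Category.NEVER) for ch in label}


def lookup_allowed(label, table=TOY_TABLE):
    """Applications look up anything that contains no NEVER character."""
    return bool(label) and Category.NEVER not in categories(label, table)


def lookup_should_warn(label, table=TOY_TABLE):
    """Optional user alert: the label resolves but is not all PERMITTED."""
    if not lookup_allowed(label, table):
        return False
    return bool(categories(label, table) - {Category.PERMITTED})


def registration_allowed(label, table=TOY_TABLE):
    """Strictest reading for registries: PERMITTED characters only."""
    return bool(label) and categories(label, table) == {Category.PERMITTED}
```

With the toy table, "ab\u4e2d" would be looked up (possibly with a
warning to the user) but would not be accepted by a registry applying
the strict rule, while "ab" passes both checks.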