New version: draft-ietf-idna-tables-01.txt

John C Klensin klensin at
Wed May 7 00:36:17 CEST 2008

--On Monday, 05 May, 2008 18:16 -0400 Vint Cerf
<vint at> wrote:

> I do not believe we had consensus on the historic scripts -
> just a discussion.
> There seem to be more than ample ways to advertise the
> existence of texts using these scripts without the need to
> instantiate the scripts in DNS.


Let me suggest a different theory:

(1) The letters and digits of the historic scripts are not, in
any way, less letters and digits than those of scripts that are
more actively used.  There doesn't seem to be any disagreement
about that.

(2) As a group, the characters of the historic scripts are no
more likely to cause serious confusion or descriptive problems
than the letters and digits of more actively used scripts.  I
don't believe there is any disagreement about that either.

(3) Scripts and languages are classified as "historical" or
"archaic" using criteria for which there is little consensus in
the larger community (e.g., I suspect there are differences of
opinion between parts of the linguistic community and parts of
the cultural preservation one).  If one classifies on the basis
of number of living primary-language speakers, one gets one
list.  If one does so on the basis of a count of
primary-language native speakers within some recent period of
time, one gets different lists... and arguments about what
period of time should be used.  If one adds "who are also
literate in the written form of the language", then one gets yet
other lists.  If one evaluates IDN-appropriateness on the basis
of how many people use the script on a daily basis today (with
"use" being reading and/or writing), then several of those
archaic scripts have significantly more users than some
contemporary ones.

Worse for our purposes, some scripts that were clearly of only
historical interest a decade or two ago are being resurrected
and taught in schools.  They are probably still a curiosity
today, but some would predict that they would become significant
enough in another decade or so to require reclassifying them
(remembering that reclassification from DISALLOWED to
Protocol-Valid is going to be more or less a big deal that
should be avoided if possible).

I also don't see a case for excluding "archaic scripts in
Plane 1".  While I don't personally expect any of the scripts
that are there now to be used in many IDNs, I'm also looking
toward the future.  In that future, I don't see room in the BMP
for even one script with more than a few handfuls of characters
in it (if I interpret the Unicode 5.1 tables correctly, there is
only one block of about 260 characters left, probably only 255
after allowances for block integrity).  Even one large-ish script
and there will be no choice but to use Plane 1 space.
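
For readers less used to the BMP / Plane 1 terminology, here is a
minimal sketch of the distinction.  The helper name plane_of is made
up for illustration; the only fact it relies on is that a Unicode
plane is a run of 65536 code points, so the plane number is the code
point shifted right by 16 bits.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0 = BMP, 1 = Plane 1, ...)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane holds 0x10000 code points, so the plane is the high bits.
    return code_point >> 16


if __name__ == "__main__":
    # U+0041 LATIN CAPITAL LETTER A lives in the BMP;
    # U+10330 GOTHIC LETTER AHSA lives in Plane 1.
    for cp in (0x0041, 0x10330):
        print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```

A rule that excluded "Plane 1" would therefore be a test on the
position of a code point, not on anything about the character itself.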

To me, what this adds up to is that...

	(i) A restriction on historic or archaic scripts will
	require us to make another rule that we don't otherwise
	need, a rule that is based on blocks or enumerated
	script names, not the properties we are otherwise using.
	Keeping things as simple as possible argues that we
	should have as many rules as we need, but no more.  And
	I don't think we need this one.

	(ii) A restriction on historic or archaic scripts is
	likely to embroil us in arguments with scholarly,
	research, and cultural preservation and reconstruction
	communities that we don't really need to have unless
	there are substantive benefits to be gained from
	excluding these characters at the protocol level.  And
	there are no such benefits.

	(iii) Imposing this restriction and disallowing these
	scripts and the characters they contain raises the odds
	of ever having to move a significant number of
	characters from DISALLOWED to PVALID.   It is very much
	in our interest to keep the number of those cases, and
	the odds of finding them, as low as possible, whether
	one adopts a more restrictive or more liberal view of
	what it takes to make the move.
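
To make point (i) concrete, here is a rough sketch of the difference
between a rule based on character properties and one based on
enumerated blocks.  Everything in it is an assumption for
illustration: the category set is a simplified stand-in and not the
rule set in the tables draft, the function names are invented, and
the exclusion table lists only three Plane 1 blocks as examples.

```python
import unicodedata

# ASSUMPTION: simplified stand-in for a property-based rule
# (lowercase/other/modifier letters, marks, decimal digits).
LETTER_DIGIT_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}

# ASSUMPTION: a hypothetical enumerated exclusion list.  The block
# ranges are the Unicode ones; the choice of blocks is illustrative.
HISTORIC_BLOCKS = {
    "Linear B Syllabary": (0x10000, 0x1007F),
    "Gothic": (0x10330, 0x1034F),
    "Ugaritic": (0x10380, 0x1039F),
}


def allowed_by_properties(cp: int) -> bool:
    """Property-based test: looks only at the General_Category."""
    return unicodedata.category(chr(cp)) in LETTER_DIGIT_CATEGORIES


def in_historic_block(cp: int) -> bool:
    """The extra rule: a lookup in a hand-maintained list of ranges."""
    return any(lo <= cp <= hi for lo, hi in HISTORIC_BLOCKS.values())


def allowed_with_exclusion(cp: int) -> bool:
    """Property test plus the enumerated exclusion."""
    return allowed_by_properties(cp) and not in_historic_block(cp)


if __name__ == "__main__":
    gothic_ahsa = 0x10330  # GOTHIC LETTER AHSA, General_Category Lo
    print(allowed_by_properties(gothic_ahsa))
    print(allowed_with_exclusion(gothic_ahsa))
```

The property test needs no maintenance as scripts are added.  The
exclusion table does, and any entry later removed from it is exactly
the DISALLOWED-to-PVALID move described in (iii).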

Now, in my mental list of "advice I would give zone
administrators who were interested in my advice", the very first
one on the list is "don't register labels that contain
characters from any script you don't understand".  An obvious
corollary to that would be registry restrictions banning the use
of any of these scripts unless the zone actively serviced people
doing work in or with specific ones of them.  I would expect that
the number of such zones would be very small.  But I don't think
the case has been made for banning these letters and numbers.
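
As a sketch of what such a registry restriction might look like, in
contrast to a protocol-level ban: the zone keeps its own list of
characters it understands and refuses everything else.  The names
ZONE_PERMITTED_RANGES and registry_accepts are hypothetical, and the
permitted set shown (lowercase ASCII letters, digits, hyphen) is just
one zone's possible choice, not a recommendation.

```python
# ASSUMPTION: a per-zone policy table, maintained by the zone
# administrator, listing only code points the zone understands.
ZONE_PERMITTED_RANGES = [
    (0x002D, 0x002D),  # hyphen-minus
    (0x0030, 0x0039),  # ASCII digits
    (0x0061, 0x007A),  # ASCII lowercase letters
]


def registry_accepts(label: str) -> bool:
    """Return True only if every character is on the zone's own list."""
    if not label:
        return False
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in ZONE_PERMITTED_RANGES)
        for ch in label
    )


if __name__ == "__main__":
    print(registry_accepts("example-1"))
    # Two Gothic letters: valid letters, but not on this zone's list.
    print(registry_accepts("\U00010330\U00010331"))
```

A zone that does serve people working with one of these scripts would
add the relevant range to its own table; nothing at the protocol
level has to change.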

