Archaic scripts -- the Battle of Examples
kenw at sybase.com
Fri May 9 22:37:18 CEST 2008
> > IMO, inclusion of cuneiform (and many other long-dead,
> > ancient scripts for archaic writing systems) in IDNs is
> > just silly.
> We are having a battle of examples. You (and Michel) are
> picking examples that no one expects to see in IDNs and that, on
> a script by script basis, no one cares about seeing in IDNs.
> Cuneiform scripts or Linear-B (or, for that matter, Linear-A)
> are clearly in that category.
Then I fail to see why we are having this argument. If
no one cares about seeing these in IDNs, then why have
we withdrawn the historic script exclusion clause from
the table derivation?
If people here want to make cases that *some* of the historic
scripts that were listed in that clause (and in Table 4
of UAX #31) are more problematical and that there is reason
to *include* them in the listing of PVALID code points,
then let's have that discussion, instead.
> Maybe Runic --also probably more
> suitable for incisions into wood or stone than pens, brushes,
> and computer typesetting-- falls into that category too, or
> maybe it doesn't (see Cary's note of some weeks ago).
O.k., there's an appropriate edge case to discuss. In or
out? The UAX #31 recommendation is not to include runes
in identifiers. But if there is a case to be made for
runes in domain names, then we could certainly specify
that they be PVALID in the table.
> systems that, at least in the minds of their contemporary
> advocates and would-be restorers, were perfectly fine until
> forcibly replaced by those of conquerors (whether military or
> religious) in relatively more historic times (8th - 10th century
> forward in Europe, Africa, and the Western Hemisphere, but not
> "thousands of years ago").
But there I think you are heading off into historical perceptions
that can't be reliably turned into anything actionable
in terms of protocol design here.
What matters, I think, is what contemporary communities
*are* using or might reasonably be inferred to want to use
if available, for domain names -- not what writing systems
ceased to be used hundreds (not thousands) of years ago,
and who conquered whom to cause such changes.
> One of the differences (possibly a useful one, possibly not), is
> that, for some of these scripts, we can identify modern
> languages that, with great confidence, are very closely related
> to the languages written in those classic/historical scripts.
I don't think that is a useful distinction. We know what
modern languages are descended from the ancient Egyptian
that was written with hieroglyphics, from the Mycenaean
Greek that was written with Linear B, from the Old Persian
that was written with a cuneiform alphabet. We know who
identifies with them culturally. None of that actually
makes those scripts appropriate for use in modern domain
names.
> Whatever the scripts may be, the languages are certainly not
The descendant languages are not extinct. The classic languages
> For Linear-B and at least several of the Cuneiform
> scripts, we've got good guesses at the language family, but,
> however effectively we have been able to decode the scripts (or
> not), no one has heard the original language spoken for
> thousands of years and knowledge of what it sounds (or sounded)
> like is, in most cases, a matter of educated guesses.
Akkadian and Sumerian, yes -- those are dead. Persian, no.
> To take a particularly challenging example, Classic Mayan is now
> being taught to non-specialists and taught in primary and
> secondary schools, not just advanced post-secondary courses.
> That makes it very different, at least IMO, from scholars
> writing Sumero-Akkadian. It would have been impossible to teach
> it that way thirty or forty years ago because the understanding
> of the writing system wasn't there.
True enough. It has the advantage now of being mostly
deciphered, and the languages involved are close enough
to some of the modern Mayan languages that there is a
modern anchor to tie it to.
> It is, I believe, being
> taught more as a curiosity and artistic exercise than as a
> primary script, but there are people who believe it should and
> will come back as the primary script for the relevant languages.
> Perhaps they are deluded. But I'm not quibbling; I just do not
> know where to draw the line that seems so clear to you.
> I don't expect to see Classic Mayan used in everyday
> correspondence, just because of its intricacy,
Well, exactly. At this point in history, it stands no
real chance of serious revival as a modern writing system.
Its function may be as cultural emblem, but all modern
Mayan languages are written now in the Latin script -- which
is simply much, much more efficient in simple representation
of textual content.
> but, as you know,
> several serious people made the claim within the last century or
> two that Han was obsolete and an impediment and needed to be
> replaced for the same reason.
Different reasons, actually. And with the enormous difference
that Han logosyllabics have had vast, uninterrupted use,
carried forward through all the modern typographical
revolutions of printing and digital typesetting.
> Some of these issues are in the
> eye of the beholder. I don't know how those African writing
> systems that are the subjects of study and restoration attempts
> to push out colonial influences will fare, but their advocates
> have not at all confused preservation of the writing system with
> preservation of the language in the way that you suggest. And,
> while almost all of those languages can be transliterated into
> and, in your words, "written just fine with existing modern
> scripts", that isn't the point, any more than the ability to
> write Chinese in pinyin implies that we don't need Han in IDNs.
*What* African writing systems are you talking about?
Ethiopic is modern use and recommended for identifiers.
Tifinagh is modern use and recommended for identifiers.
N'Ko is modern use and recommended for identifiers.
Vai is modern use and recommended for identifiers.
Bamum isn't encoded yet, because stakeholders in the Cameroons
still wish to provide feedback before encoding, but when
it is encoded, it will surely also be recommended for identifiers.
And then, of course, there are Latin and Arabic, the most
widespread African scripts.
That leaves, as far as I can tell, Coptic, Osmanya, and
Egyptian Hieroglyphics. After that, you're starting to talk
about truly obscure systems like Woleai, which don't even
have decent encoding proposals started yet.
I can see an argument being made for Coptic, even though its
only contemporary use is liturgical, or for Osmanya, although
even in Somalia it has no significant contemporary
community of usage and is unlikely to succeed in widespread
usage in the face of Arabic and Latin.
So where, *exactly*, is the problem here?
> We need Han/CJK in IDNs because there are significant
> populations that use it, not because the language could not be
> written in any other way.
True. And frankly, the Table 4 list in UAX #31 was built
pretty much on that criterion.
> If we assume that there are some historical/archaic scripts that
> are irrelevant to IDNs and certain to remain so (and we do agree
> on that),
> my problem is that I'm trying to find a reasonable
> differentiating rule that separates the ones that may be
> relevant from the ones that clearly are not.
You've just provided such a criterion: significant populations
that use it. Augment that with the caveat that "use it" doesn't
refer merely to meta-uses by scholars teaching *about*
historic scripts, and you are pretty close to a reasonable
differentiating rule.
> If Classic Mayan
> ever ends up in Unicode (I predict that it will do so in the
> next decade or two if current trends continue despite a whole
> series of interesting problems), then any discrimination based
> on Plane 0 / Plane 1 distinctions will fail because it won't fit
> in the remaining space in Plane 0.
Actually, they don't. I'm claiming that Classic Mayan is
inappropriate in any case. If it does make it into Unicode
as an encoded script -- and 10 years from now is a reasonable
estimate, based on the state of knowledge about the script
and based on the fact that no proposal has even been mooted
yet, of any type -- it is my contention that it will be
*less* appropriate as a script to be used in identifiers
than cuneiform or Egyptian hieroglyphics.
> Again, I don't know about
> the African examples or others (and perhaps all of them are
> purely pictographic -- although I'd be surprised),
No, of course they aren't. See above. Most are syllabic or
> but I suspect
> that some of them will end up being used (and used in more than
> illustrations in historical articles) on the Internet as well.
Yes: Tifinagh certainly, because of Moroccan support for it.
But nobody is trying to restrict N'Ko, Vai, or Bamum either.
I don't understand what the problem is here, or why you
are even veering off into this discussion of African
writing systems.
> I may be misunderstanding what you and Michel are trying to say,
> but it feels as if, rather than trying to engage on separating
> historic-and-irrelevant scripts from
> historic-but-possibly-relevant ones, or accepting the group and
> relying on registry restrictions, we are hearing an argument
> that sounds suspiciously like:
> (i) We have identified this set of scripts as historic.
... or limited-use, or obsolete, or otherwise inappropriate
for use in identifiers. This is not some single, monolithic
> (ii) Some historic scripts clearly don't belong in IDNs.
> (iii) Therefore all historic scripts should be banned
> from IDNs.
> In addition to the obvious logic flaw,
The faulty syllogism is of your construction, not ours.
We assert that Table 4 in UAX #31 is a useful list for
identifying scripts not appropriate for identifiers. As a
starting point, that list is also a good basis for distinguishing
PVALID from DISALLOWED characters in IDNA 2008.
If people wish to argue that one or more candidates on
that list (such as Runic, or Coptic, or something else)
*is* useful in IDNs and should not be banned from them,
then I'd like to hear the case -- and if made, I would then
see no trouble in having such scripts also be made PVALID.
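The proposal above (start from a historic-script exclusion list, then promote individual scripts such as Runic or Coptic to PVALID when someone makes the case) can be sketched roughly as follows. This is a hypothetical illustration only: the script set is a small excerpt of scripts mentioned in this thread, not the actual Table 4 list; scripts are approximated by Unicode block ranges rather than the real Script property; and the names EXCLUDED_SCRIPTS, block_script, and script_rule are invented for the example. The IDNA 2008 table derivation involves other rules as well, so this shows only the script-exclusion step in isolation.

```python
from typing import FrozenSet, Optional

# Hypothetical excerpt of historic scripts discussed in this thread,
# approximated by inclusive Unicode block ranges. A real implementation
# would use the Unicode Script property (Scripts.txt), which Python's
# unicodedata module does not expose.
EXCLUDED_SCRIPTS = {
    "Runic": (0x16A0, 0x16FF),
    "Coptic": (0x2C80, 0x2CFF),
    "Linear_B": (0x10000, 0x100FF),   # Syllabary + Ideograms blocks
    "Osmanya": (0x10480, 0x104AF),
    "Cuneiform": (0x12000, 0x123FF),
}


def block_script(cp: int) -> Optional[str]:
    """Return the excluded script whose block contains cp, or None."""
    for name, (lo, hi) in EXCLUDED_SCRIPTS.items():
        if lo <= cp <= hi:
            return name
    return None


def script_rule(cp: int, promoted: FrozenSet[str] = frozenset()) -> str:
    """Classify one code point by the script-exclusion step alone.

    Returns "DISALLOWED" if cp falls in an excluded script that has not
    been explicitly promoted; otherwise "PVALID" (as far as this single
    rule is concerned).
    """
    script = block_script(cp)
    if script is not None and script not in promoted:
        return "DISALLOWED"
    return "PVALID"


if __name__ == "__main__":
    print(script_rule(0x16A0))                                # DISALLOWED
    print(script_rule(0x16A0, promoted=frozenset({"Runic"})))  # PVALID
    print(script_rule(0x1200))                                # PVALID
```

U+16A0 is a Runic letter, so it is excluded unless "Runic" is passed in the promoted set; U+1200 is an Ethiopic syllable and is untouched by the exclusion list. Keeping the promotions as an explicit set mirrors the "dispute the edge cases" approach: each exception is a separate, reviewable decision rather than a change to the base list.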
> I continue to believe
> that there are edge cases in Unicode's classifications of
> scripts as "historic" or "not historic" that would be disputed
> by other reasonable and competent people.
Then dispute the edge cases. I'm fine with that.
What I'm not fine with is making *everything* PVALID,
including scads of characters from scripts that clearly,
by everybody's assessment, including yours, are neither
needed nor appropriate for IDNs.
> I tried with my comments about "uncertain" to see if we could
> find a defining principle that would permit us to identify and
> handle the relevant scripts and leave the others aside.
> Probably it doesn't work. But I have a lot of problems with
> "not cuneiform, therefore not Runic" or "not Linear-B, therefore
> not Deseret".
Straw man argument. I have made no such inferences.
More information about the Idna-update