U+303B VERTICAL IDEOGRAPHIC ITERATION MARK

Kenneth Whistler kenw at sybase.com
Thu Jul 16 00:17:15 CEST 2009


Vint asked:

> If we can possibly avoid char by char rules that would be very helpful  
> in dealing with updates to Unicode.

Of course.

> 
> I gather these characters don't quite fall into a category that would  
> permit algorithmic treatment?

No, not if what you mean is: are there already-defined
Unicode character properties that could be added to the
derivations in idnabis-tables.txt, so that these particular
vertical display forms could be distinguished from other
characters without having to refer to a list of characters.

That doesn't mean that there couldn't be -- there are over
100 formally defined Unicode character properties already,
and more are added with nearly every version of the standard.

But with over 100,000 characters in the standard, and with
many of them being strange riff-raff accumulated for
interoperability with previous character encoding standards,
some of which had rather bizarre models for representing
characters, it is basically hopeless to try to define
ahead of time all of the potentially conceivable character
properties that someone might want to take into account
to distinguish particular characters for all possible,
conceivable applications.

At a certain point you just deal with this ad hoc, create
your exception lists and move on, IMO.
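
To make the exception-list idea concrete, here is a minimal sketch
in Python. It is not the idnabis-tables.txt derivation: the
category rule is a heavily simplified stand-in, the names
EXCEPTIONS, LETTER_DIGIT_CATEGORIES and derive are hypothetical,
and DISALLOWED is used for the listed code points purely for
illustration. The only point is that the list is consulted before
any property-based rule.

```python
import unicodedata

# Hypothetical exception list: code point -> forced property value.
# DISALLOWED is an illustrative choice, not a settled outcome.
EXCEPTIONS = {cp: "DISALLOWED" for cp in [0x303B, *range(0x3031, 0x3036)]}

# Assumed, simplified stand-in for the property-based derivation:
# a handful of letter, mark and digit general categories.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def derive(cp: int) -> str:
    """Return an IDNA-style property value for one code point."""
    # The exception list wins over anything the properties would say.
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    # Otherwise fall back to the (simplified) property-based rule.
    if unicodedata.category(chr(cp)) in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"
```

With this shape, changing the treatment of one of these characters
means editing one dictionary entry and leaving the property-based
rule alone.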

As for the implicit worry here that the minute IDNA 2008 goes
out the door, the Unicode Standard will decide to add
one (or 6 or 16) more vertical kana or ideographic iteration
marks that would require reopening the exception list and
reworking the derivation of the table once again -- the
chances of that are somewhere between "Nil" and "Down in
the Noise". All of the characters in question are compatibility
characters for interoperating with legacy Japanese character
sets -- the vertical kana iteration marks were in Unicode 1.0
for JIS X 0208, and even the most recently added
character, the topic of this thread, U+303B VERTICAL IDEOGRAPHIC
ITERATION MARK, was added in Unicode 3.2, for JIS X 0213.
The chances of the Japanese standards body deciding to
revise those to add more such things (which don't actually
exist anyway) are next to zero at this point. And the chance
that the UTC would add more such things to the Unicode
Standard without a prior move by the Japanese standards
body is precisely zero.

Finally, in response to Mark's comment on this thread, I'll
revise my assessment slightly re U+303B.

For U+303B:

  CONTEXTO   This is the worst outcome. The only thing to be
             said for it is that it is the current state
             of the table. It should be changed.
             
  PVALID     This is an acceptable outcome. U+303B is garbage in
             IDNs, but as Mark said, no worse garbage than
             lots of other PVALID characters.
             
  DISALLOWED   This is my preferred outcome.
  
And whichever the consensus is for U+303B, you should do
the same for U+3031..U+3035, since they are "the same
but more so" as U+303B.
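
For anyone who wants to see which characters are covered, a short
lookup lists them by name. This is only a convenience sketch using
Python's standard unicodedata module; the names CODE_POINTS and
describe are hypothetical.

```python
import unicodedata

# U+303B plus the range U+3031..U+3035 discussed above.
CODE_POINTS = [0x303B, *range(0x3031, 0x3036)]


def describe(cp: int) -> str:
    """Format one code point as 'U+XXXX NAME'."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```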

--Ken




