06FD and 06FE should be PVALID for Sindhi

Kenneth Whistler kenw at sybase.com
Tue Apr 1 21:28:19 CEST 2008


Following up on what Mark just indicated...

There is a somewhat comparable, but older case in
Unicode encoding history, for:

U+0E2F THAI CHARACTER PAIYANNOI

That was originally given the property General_Category=Po
(Punctuation, Other), based on its description as an
indication of ellipsis or abbreviation in Thai. That was how
things stood up through Unicode 2.1.5.

Then, based on feedback from Thai implementers and experts,
it became clear that paiyannoi is incorporated *in* words:
it indicates that letters have been omitted from long,
commonly used words, and it is considered a part of those
words. So starting in Unicode 2.1.8, the UTC recategorized
it to General_Category=Lo (Letter, Other), which it has
been ever since.

That made a difference, of course, in the UTC recommendation
regarding the use of paiyannoi in identifiers.
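
For anyone who wants to see the current property values, here is a
minimal sketch using Python's standard unicodedata module. The helper
name describe() is just something made up for illustration. Note that
str.isidentifier() reflects Python's own identifier rule, not the IDNA
PVALID derivation, and the results depend on the Unicode version
bundled with the Python runtime, so treat it only as a rough proxy.

```python
import unicodedata

def describe(cp: int) -> tuple[str, str]:
    """Return (character name, General_Category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)

# U+0E2F is the paiyannoi case; U+06FD and U+06FE are the Sindhi
# signs from the subject line.
for cp in (0x0E2F, 0x06FD, 0x06FE):
    name, gc = describe(cp)
    # isidentifier() is Python's identifier test, used here only as a
    # stand-in for "allowed in identifiers".
    print(f"U+{cp:04X} {name}: gc={gc}, identifier={chr(cp).isidentifier()}")
```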

For various "signs" like this, the UTC practice
has been not to assume from the start that they *should* be
in identifiers. But on a case-by-case basis, a few
of these edge-case characters have been added to identifiers
as information about their usage has become available. What the
UTC never does is take them *out* of identifiers again -- which
is why it needs to be cautious to start out with.

--Ken



