New version: draft-ietf-idna-tables-01.txt
kenw at sybase.com
Mon May 5 23:57:15 CEST 2008
> Minor nit, you lost 10FFFE and 10FFFF at the very end.
Yes, that's a bug in the table generation.
> Are those
> CJK COMPATIBILITY code points as you want them, some PVALID,
> some DISALLOWED ?
That is correct as is. Note that 12 characters in the
CJK Compatibility Ideographs block -- for historic reasons --
are actually unified ideographs. Those 12 are PVALID, not
DISALLOWED. All *actual* CJK compatibility ideographs are
unstable under normalization, and hence are DISALLOWED.
> Generally, I don't see why reviving Linear B for the purpose of
> domain labels thousands of years since it was used is a good
> idea. You also can't do much with the one PVALID PHAISTOS DISC
> COMBINING OBLIQUE STROKE if the rest of the Phaistos Disc block
> is DISALLOWED. Similar Lycian, Carian, Old Italic, Gothic, Old
> Persian, Cypriot Syllables, Phoenician, Lydian, Cuneiform, ...
I concur with this opinion. I think it is silly to
have historic scripts on Plane 1, like Sumero-Akkadian Cuneiform,
PVALID for IDNs, and disagree with the apparent consensus that
set in two weeks ago to remove the block exclusion for
> Is there no chance to kill the complete 0Exxxx plane ? Or at
> least u+0E0000..u+0E0FFF as in I-D.duerst-iri-bis-02, you have
> only the bare minimum, assigned tag and variation characters,
> as DISALLOWED.
> Starting with unassigned => UNASSIGNED might be good to find all
> PVALID and CONTEXT? later, but it is a poor way to get as much
> DISALLOWED as possible. (Of course you know this already, it's
> just that I don't like the outcome, who is going to implement
> this huge table in small devices ?)
The table isn't huge if implemented correctly.
From the point of view of an implementation dealing with
this table for any given version of Unicode, it all boils
down to a binary decision:
Is a character allowed (PVALID, CONTEXTO, CONTEXTJ) or
not (UNASSIGNED, DISALLOWED)?
That's a binary property in practice, with many fewer
value transitions than the 5-valued property, and can be
packed down to a *much* smaller table, while still being
very fast to access.
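
To illustrate the packing argument, here is a minimal sketch in Python.
The run table below is a made-up toy, NOT the values from the draft,
and the names (TOY_RUNS, collapse, is_allowed) are hypothetical. It
assumes the 5-valued table is available as sorted runs of
(first code point, value), collapses them to the points where the
allowed/not-allowed bit flips, and answers lookups by binary search.

```python
from bisect import bisect_right

# Values that count as "allowed" in the binary view described above.
ALLOWED_VALUES = {"PVALID", "CONTEXTJ", "CONTEXTO"}

# Hypothetical toy table, for illustration only: each entry starts a run
# that extends up to the next entry's first code point.
TOY_RUNS = [
    (0x0000, "DISALLOWED"),
    (0x0030, "PVALID"),
    (0x003A, "DISALLOWED"),
    (0x0061, "PVALID"),
    (0x007B, "DISALLOWED"),
    (0x0080, "UNASSIGNED"),
    (0x200C, "CONTEXTJ"),
    (0x200E, "DISALLOWED"),
]


def collapse(runs):
    """Return the sorted code points at which the allowed bit flips.

    Code points below the first boundary are treated as not allowed.
    Adjacent runs with the same binary value (for example DISALLOWED
    followed by UNASSIGNED) merge into one and produce no boundary.
    """
    boundaries = []
    allowed = False
    for start, value in runs:
        now_allowed = value in ALLOWED_VALUES
        if now_allowed != allowed:
            boundaries.append(start)
            allowed = now_allowed
    return boundaries


def is_allowed(boundaries, cp):
    """True if cp falls in an allowed run.

    An odd count of boundaries at or below cp means the bit has flipped
    an odd number of times, so cp is inside an allowed run.
    """
    return bisect_right(boundaries, cp) % 2 == 1


BOUNDARIES = collapse(TOY_RUNS)
```

In this toy case eight 5-valued runs collapse to six boundary values,
and each lookup is a single binary search over a flat sorted list.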
As for PVALID versus CONTEXT (O or J), that is a single
switch statement on a few values. It really
*should* just be U+200C and U+200D, as I don't think CONTEXTO
makes any sense, but whatever....
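
As a rough sketch of that second step, assuming a code point has
already passed the allowed/not-allowed test, and assuming (per my
preference above, not per the draft) that only the two joiners need
contextual handling. The function name allowed_kind is hypothetical,
and the draft's CONTEXTO set is deliberately not reproduced here.

```python
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER
ZWJ = 0x200D   # ZERO WIDTH JOINER


def allowed_kind(cp: int) -> str:
    """Classify a code point that is already known to be allowed.

    Assumption: only U+200C and U+200D require a contextual rule;
    everything else that passed the binary test is plain PVALID.
    """
    if cp == ZWNJ or cp == ZWJ:
        return "CONTEXTJ"
    return "PVALID"
```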
Frank is right that the table is bigger than it should be,
however. Adding back the historic scripts on Plane 1 just
increases the range needed for testing, to no appropriate
end. Under my earlier drafts of candidate properties
like IDN_Allowed (and Patrik's earlier draft tables),
*nothing* from Plane 1 was PVALID -- and I think that is
a useful design point, actually.