New version: draft-ietf-idna-tables-01.txt

Kenneth Whistler kenw at sybase.com
Mon May 5 23:57:15 CEST 2008


Frank noted:

> Minor nit, you lost 10FFFE and 10FFFF at the very end. 

Yes, that's a bug in the table generation.

> Are those
> CJK COMPATIBILIBITY code points as you want them, some PVALID,
> some DISALLOWED ?

That is correct as is. Note that 12 characters in the
CJK Compatibility Ideographs block -- for historic reasons --
are actually unified ideographs. Those 12 are PVALID, not
DISALLOWED. All *actual* CJK compatibility ideographs are
unstable under normalization, and hence are DISALLOWED.
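
To make the distinction concrete, here is a minimal Python
sketch. It assumes that "unstable under normalization" can be
approximated by checking whether NFKC changes the character;
the draft's actual derivation rule is not reproduced here.
The constant and function names are made up for this example,
and the result depends on the Unicode version built into the
Python runtime.

```python
import unicodedata

# The 12 code points in the CJK Compatibility Ideographs block
# (U+F900..U+FAFF) that are actually unified ideographs.
UNIFIED_IN_COMPAT_BLOCK = frozenset({
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
})


def is_stable_under_nfkc(cp: int) -> bool:
    """Return True if NFKC leaves the single code point unchanged.

    Assumption: this is a simplified stand-in for the stability
    test used when deriving the table.
    """
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) == ch


if __name__ == "__main__":
    # A true compatibility ideograph normalizes to its unified
    # counterpart, so it is not stable.
    print(hex(0xF900), is_stable_under_nfkc(0xF900))
    # The 12 unified ideographs have no decomposition.
    for cp in sorted(UNIFIED_IN_COMPAT_BLOCK):
        print(hex(cp), is_stable_under_nfkc(cp))
```

The 12 listed code points have no decomposition, so
normalization leaves them alone; the rest of the assigned
block maps to unified ideographs elsewhere.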

> Generally, I don't see why reviving Linear B for the purpose of
> domain labels thousands of years since it was used is a good
> idea.  You also can't do much with the one PVALID PHAISTOS DISC
> COMBINING OBLIQUE STROKE if the rest of the Phaistos Disc block
> is DISALLOWED.  Similar Lycian, Carian, Old Italic, Gothic, Old
> Persian, Cypriot Syllables, Phoenician, Lydian, Cuneiform, ...

I concur with this opinion. I think it is silly to
have historic scripts on Plane 1, like Sumero-Akkadian Cuneiform,
PVALID for IDNs, and disagree with the apparent consensus that
set in two weeks ago to remove the block exclusion for
historic scripts.

> Is there no chance to kill the complete 0Exxxx plane ?  Or at
> least u+0E0000..u+0E0FFF as in I-D.duerst-iri-bis-02, you have
> only the bare minimum, assigned tag and variation characters,
> as DISALLOWED.
> 
> Starting with unassigned => UNASSIGNED might be good to find all
> PVALID and CONTEXT? later, but it is a poor way to get as much
> DISALLOWED as possible.  (Of course you know this already, it's 
> just that I don't like the outcome, who is going to implement
> this huge table in small devices ?)

The table isn't huge if implemented correctly.

From the point of view of an implementation dealing with
this table for any given version of Unicode, it all boils
down to a binary decision:

Is a character allowed (PVALID, CONTEXTO, CONTEXTJ) or
not (UNASSIGNED, DISALLOWED)?

That's a binary property in practice, with many fewer
value transitions than the 5-valued property, and can be
packed down to a *much* smaller table, while still being
very fast to access.
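
One way to picture such a packing, as a sketch only: keep the
allowed code points as a sorted list of inclusive ranges and
binary-search it. The ranges below are a tiny illustrative
excerpt, not the real table, and the names ALLOWED_RANGES and
is_allowed are hypothetical. A production implementation might
prefer a multi-stage trie or a bitmap instead.

```python
from bisect import bisect_right

# Illustrative excerpt only: sorted, non-overlapping, inclusive
# ranges of code points treated as "allowed". A real table would
# be generated from the draft for one specific Unicode version.
ALLOWED_RANGES = [
    (0x002D, 0x002D),  # HYPHEN-MINUS
    (0x0030, 0x0039),  # DIGIT ZERO..DIGIT NINE
    (0x0061, 0x007A),  # LATIN SMALL LETTER A..Z
    (0x200C, 0x200D),  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
]

# Range start points, kept separately so bisect can search them.
_STARTS = [lo for lo, _hi in ALLOWED_RANGES]


def is_allowed(cp: int) -> bool:
    """Binary decision: is the code point inside any allowed range?"""
    # Index of the last range whose start is <= cp, or -1 if none.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= ALLOWED_RANGES[i][1]


if __name__ == "__main__":
    for cp in (0x0061, 0x0041, 0x200C, 0x10FFFF):
        print(f"U+{cp:04X}", is_allowed(cp))
```

The lookup cost grows with the logarithm of the number of
ranges, and the number of ranges is what shrinks when the
five values collapse to two.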

As for PVALID versus CONTEXT (O or J), that is a single
switch statement on a few values. It really
*should* just be U+200C and U+200D, as I don't think CONTEXTO
makes any sense, but whatever....
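
As a sketch of that second step, assuming the character has
already passed the allowed/not-allowed test and assuming the
design where only U+200C and U+200D are contextual (CONTEXTO
is deliberately left out here). The names are hypothetical.

```python
# Assumption for this sketch: the only contextual characters
# are the two joiners, both CONTEXTJ.
CONTEXTJ = frozenset({0x200C, 0x200D})  # ZWNJ, ZWJ


def classify_allowed(cp: int) -> str:
    """Classify a code point already known to be allowed."""
    if cp in CONTEXTJ:
        return "CONTEXTJ"
    return "PVALID"


if __name__ == "__main__":
    for cp in (0x200C, 0x200D, 0x0061):
        print(f"U+{cp:04X}", classify_allowed(cp))
```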

Frank is right that the table is bigger than it should be,
however. Adding back the historic scripts on Plane 1 just
increases the range needed for testing, to no appropriate
end. Under my earlier drafts of candidate properties
like IDN_Allowed (and Patrik's earlier draft tables),
*nothing* from Plane 1 was PVALID -- and I think that is
a useful design point, actually.

--Ken


