New version: draft-ietf-idna-tables-01.txt
vint at google.com
Tue May 6 00:16:05 CEST 2008
I do not believe we had consensus on the historic scripts - just a
There seem to be more than ample ways to advertise the existence of
texts using these scripts without the need to instantiate the scripts
On May 5, 2008, at 5:57 PM, Kenneth Whistler wrote:
> Frank noted:
>> Minor nit, you lost 10FFFE and 10FFFF at the very end.
> Yes, that's a bug in the table generation.
>> Are those
>> CJK COMPATIBILITY code points as you want them, some PVALID,
>> some DISALLOWED ?
> That is correct as is. Note that 12 characters in the
> CJK Compatibility Ideographs block -- for historic reasons --
> are actually unified ideographs. Those 12 are PVALID, not
> DISALLOWED. All *actual* CJK compatibility ideographs are
> unstable under normalization, and hence are DISALLOWED.
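
[Illustration of the point above: a minimal sketch using Python's standard unicodedata module. The helper name is_nfkc_stable is made up for this example, and the result depends on the Unicode version bundled with the interpreter. U+FA0E is taken as one of the 12 unified ideographs in the block, and U+F900 as an ordinary compatibility ideograph.]

```python
import unicodedata

def is_nfkc_stable(cp: int) -> bool:
    """Return True if the code point is unchanged by NFKC normalization."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) == ch

# U+FA0E sits in the CJK Compatibility Ideographs block but has no
# decomposition, so it survives normalization unchanged.
print(hex(0xFA0E), is_nfkc_stable(0xFA0E))

# U+F900 is a true compatibility ideograph; normalization maps it to
# its unified counterpart, so it is not stable.
print(hex(0xF900), is_nfkc_stable(0xF900))
```
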
>> Generally, I don't see why reviving Linear B for the purpose of
>> domain labels thousands of years since it was used is a good
>> idea. You also can't do much with the one PVALID PHAISTOS DISC
>> COMBINING OBLIQUE STROKE if the rest of the Phaistos Disc block
>> is DISALLOWED. Similar Lycian, Carian, Old Italic, Gothic, Old
>> Persian, Cypriot Syllables, Phoenician, Lydian, Cuneiform, ...
> I concur with this opinion. I think it is silly to
> have historic scripts on Plane 1, like Sumero-Akkadian Cuneiform,
> PVALID for IDNs, and disagree with the apparent consensus that
> set in two weeks ago to remove the block exclusion for
> historic scripts.
>> Is there no chance to kill the complete 0Exxxx plane ? Or at
>> least u+0E0000..u+0E0FFF as in I-D.duerst-iri-bis-02, you have
>> only the bare minimum, assigned tag and variation characters,
>> as DISALLOWED.
>> Starting with unassigned => UNASSIGNED might be good to find all
>> PVALID and CONTEXT? later, but it is a poor way to get as much
>> DISALLOWED as possible. (Of course you know this already, it's
>> just that I don't like the outcome, who is going to implement
>> this huge table in small devices ?)
> The table isn't huge if implemented correctly.
> From the point of view of an implementation dealing with
> this table for any given version of Unicode, it all boils
> down to a binary decision:
> Is a character allowed (PVALID, CONTEXTO, CONTEXTJ) or
> not (UNASSIGNED, DISALLOWED)?
> That's a binary property in practice, with many fewer
> value transitions than the 5-valued property, and can be
> packed down to a *much* smaller table, while still being
> very fast to access.
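
[Illustration of the packed binary table: a rough sketch, assuming the allowed set is stored as a sorted list of inclusive code point ranges and searched with binary search. The ranges below are a tiny made-up sample, not the actual table from the draft, and the names ALLOWED_RANGES and is_allowed are hypothetical.]

```python
from bisect import bisect_right

# Sample ranges for illustration only; NOT the real IDNA table.
# Each entry is an inclusive (first, last) pair, sorted by first.
ALLOWED_RANGES = [
    (0x002D, 0x002D),  # HYPHEN-MINUS
    (0x0030, 0x0039),  # DIGIT ZERO..DIGIT NINE
    (0x0061, 0x007A),  # LATIN SMALL LETTER A..Z
    (0x200C, 0x200D),  # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
]
_STARTS = [first for first, _ in ALLOWED_RANGES]

def is_allowed(cp: int) -> bool:
    """Binary decision: is the code point inside any allowed range?"""
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= ALLOWED_RANGES[i][1]
```

With this layout the storage cost grows with the number of value transitions rather than with the size of the code space, and a lookup takes a logarithmic number of comparisons.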
> As for PVALID versus CONTEXT (O or J), that is a single
> switch statement on a few values. It really
> *should* just be U+200C and U+200D, as I don't think CONTEXTO
> makes any sense, but whatever....
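
[Illustration of the second step: a sketch that assumes, as suggested above, that only U+200C and U+200D need contextual treatment. The names CONTEXTJ_CODE_POINTS and classify_allowed are hypothetical, and any CONTEXTO handling is deliberately left out.]

```python
# Assumption for this sketch: the only contextual code points are the
# two joiner controls named in the message.
CONTEXTJ_CODE_POINTS = frozenset({0x200C, 0x200D})

def classify_allowed(cp: int) -> str:
    """Split an already-allowed code point into PVALID or CONTEXTJ.

    The caller is expected to have checked first that the code point
    is allowed at all; this function does not repeat that test.
    """
    if cp in CONTEXTJ_CODE_POINTS:
        return "CONTEXTJ"
    return "PVALID"
```
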
> Frank is right that the table is bigger than it should be,
> however. Adding back the historic scripts on Plane 1 just
> increases the range needed for testing, to no appropriate
> end. Under my earlier drafts of candidate properties
> like IDN_Allowed (and Patrik's earlier draft tables),
> *nothing* from Plane 1 was PVALID -- and I think that is
> a useful design point, actually.