New version: draft-ietf-idna-tables-01.txt

Tue May 6 00:16:05 CEST 2008

I do not believe we had consensus on the historic scripts - just a  
discussion.

There seem to be more than ample ways to advertise the existence of  
texts using these scripts without the need to instantiate the scripts  
in DNS.

Vint

On May 5, 2008, at 5:57 PM, Kenneth Whistler wrote:

> Frank noted:
>
>> Minor nit, you lost 10FFFE and 10FFFF at the very end.
>
> Yes, that's a bug in the table generation.
>
>> Are those
>> CJK COMPATIBILITY code points as you want them, some PVALID,
>> some DISALLOWED ?
>
> That is correct as is. Note that 12 characters in the
> CJK Compatibility Ideographs block -- for historic reasons --
> are actually unified ideographs. Those 12 are PVALID, not
> DISALLOWED. All *actual* CJK compatibility ideographs are
> unstable under normalization, and hence are DISALLOWED.
>
>> Generally, I don't see why reviving Linear B for the purpose of
>> domain labels thousands of years since it was used is a good
>> idea.  You also can't do much with the one PVALID PHAISTOS DISC
>> COMBINING OBLIQUE STROKE if the rest of the Phaistos Disc block
>> is DISALLOWED.  The same goes for Lycian, Carian, Old Italic, Gothic,
>> Old Persian, Cypriot Syllables, Phoenician, Lydian, Cuneiform, ...
>
> I concur with this opinion. I think it is silly to
> have historic scripts on Plane 1, like Sumero-Akkadian Cuneiform,
> PVALID for IDNs, and disagree with the apparent consensus that
> set in two weeks ago to remove the block exclusion for
> historic scripts.
>
>> Is there no chance to kill the complete 0Exxxx plane ?  Or at
>> least u+0E0000..u+0E0FFF as in I-D.duerst-iri-bis-02, you have
>> only the bare minimum, assigned tag and variation characters,
>> as DISALLOWED.
>>
>> Starting with unassigned => UNASSIGNED might be good to find all
>> PVALID and CONTEXT? later, but it is a poor way to get as much
>> DISALLOWED as possible.  (Of course you know this already, it's
>> just that I don't like the outcome, who is going to implement
>> this huge table in small devices ?)
>
> The table isn't huge if implemented correctly.
>
> From the point of view of an implementation dealing with
> this table for any given version of Unicode, it all boils
> down to a binary decision:
>
> Is a character allowed (PVALID, CONTEXTO, CONTEXTJ) or
> not (UNASSIGNED, DISALLOWED)?
>
> That's a binary property in practice, with many fewer
> value transitions than the 5-valued property, and can be
> packed down to a *much* smaller table, while still being
> very fast to access.
>
> As for PVALID versus CONTEXT (O or J), that is a single
> switch statement on a few values. It really
> *should* just be U+200C and U+200D, as I don't think CONTEXTO
> makes any sense, but whatever....
>
> Frank is right that the table is bigger than it should be,
> however. Adding back the historic scripts on Plane 1 just
> increases the range needed for testing, to no appropriate
> end. Under my earlier drafts of candidate properties
> like IDN_Allowed (and Patrik's earlier draft tables),
> *nothing* from Plane 1 was PVALID -- and I think that is
> a useful design point, actually.
>
> --Ken
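
[Illustrative sketch, not part of the original message.] Ken's point about
collapsing the five-valued property into a binary "allowed or not" lookup,
followed by a small switch for the contextual cases, could look roughly like
the following. Everything here is an assumption made for illustration: the
range data is a toy subset and is NOT taken from draft-ietf-idna-tables, and
the names ALLOWED_RANGES, CONTEXT_RULE, is_allowed, classify and the
"NOT_ALLOWED" label are hypothetical. Only U+200C and U+200D are listed as
contextual, following the preference stated in the message.

```python
from bisect import bisect_right

# Toy data only: sorted, non-overlapping, inclusive ranges of code points
# treated as "allowed" (PVALID, CONTEXTO or CONTEXTJ). A real table would
# be generated from the draft for one specific Unicode version.
ALLOWED_RANGES = [
    (0x002D, 0x002D),  # hyphen-minus
    (0x0030, 0x0039),  # digits 0-9
    (0x0061, 0x007A),  # lowercase a-z
    (0x200C, 0x200D),  # ZWNJ, ZWJ (allowed only subject to context rules)
]
_STARTS = [lo for lo, _ in ALLOWED_RANGES]

# The "single switch statement on a few values": allowed code points that
# additionally need a contextual check.
CONTEXT_RULE = {0x200C: "CONTEXTJ", 0x200D: "CONTEXTJ"}


def is_allowed(cp):
    """Binary decision: is cp inside any allowed range?"""
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= ALLOWED_RANGES[i][1]


def classify(cp):
    """Binary lookup first, then the small switch for contextual cases."""
    if not is_allowed(cp):
        # UNASSIGNED and DISALLOWED are deliberately not distinguished here.
        return "NOT_ALLOWED"
    return CONTEXT_RULE.get(cp, "PVALID")
```

With this layout the lookup cost grows with the logarithm of the number of
ranges, so the size of the table depends on how many allowed/not-allowed
transitions there are rather than on how many code points are covered.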