What rules have been used for the current list of codepoints?

Kenneth Whistler kenw at sybase.com
Wed Dec 13 17:45:01 CET 2006


Patrik,

> I understand there is confusion about which rules have been used TODAY
> for the list of codepoints.
> 
> These are the rules; the first one that matches tells whether the
> codepoint is ok to include or not.
> 
> 1. If block is "IPA Extensions", the codepoint is not ok
> 2. If the script is "Inherited", the codepoint is not ok
> 3. If the codepoint is [A-Z], the codepoint is ok
> 4. If the codepoint is [0-9], the codepoint is ok
> 5. If NFKC(cp) != cp, the codepoint is not ok
> 6. If lowercase(cp) != cp, the codepoint is not ok
> 7. If class is [Ll, Lo, Mn, Mc], the codepoint is ok
> 
> I have a suggestion that rule 7 should also include classes Lm and  
> Nd, but I have not included that.
> 
> Do I see a consensus on this list that I should also include Lm and  
> Nd? (Then rule 4 can be removed.)
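
The first-match evaluation described in the quoted rules can be sketched in Python roughly as follows. This is only an illustration, not the script that produced the table. The names `first_match`, `script_of`, `IPA_EXTENSIONS` and `OK_CLASSES` are made up for this sketch. Python's `unicodedata` module has no script property, so the script lookup is assumed to be supplied by the caller. `str.lower()` stands in for lowercase(cp), and treating a code point that matches no rule as not ok is also an assumption.

```python
import unicodedata

# Block "IPA Extensions" covers U+0250..U+02AF.
IPA_EXTENSIONS = range(0x0250, 0x02B0)
# General categories accepted by rule 7 as quoted.
OK_CLASSES = {"Ll", "Lo", "Mn", "Mc"}


def first_match(cp, script_of):
    """Return (rule_number, ok) for the first rule that matches.

    script_of is a hypothetical caller-supplied function mapping a
    code point to its Unicode script name.
    Returns (None, False) when no rule matches (an assumption).
    """
    ch = chr(cp)
    if cp in IPA_EXTENSIONS:
        return 1, False
    if script_of(cp) == "Inherited":
        return 2, False
    if "A" <= ch <= "Z":
        return 3, True
    if "0" <= ch <= "9":
        return 4, True
    if unicodedata.normalize("NFKC", ch) != ch:
        return 5, False
    if ch.lower() != ch:
        return 6, False
    if unicodedata.category(ch) in OK_CLASSES:
        return 7, True
    return None, False
```

Because the first matching rule decides, "A" is accepted by rule 3 before rule 6 can reject it for not being lowercase, while U+00C5 falls through to rule 6 and is rejected.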

I have no idea how consensus on this list is measured, but
*I* am absolutely sure that Lm and Nd need to be added. In
fact, using the formulation you are using here for rules,
the whole list of rules should be reconstructed as:

1. If class is [Ll, Lm, Lo, Mn, Mc, Nd], the code point is ok
2. If NFKC(cp) != cp, the code point is not ok
3. If lowercase(cp) != cp, the code point is not ok

And that is pretty much exactly what I stated in the November 30
contribution.

And the results from that are posted at:

http://www.unicode.org/~whistler/SPLlLoLmMnMcNdStableCaseNFKC.txt
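
For illustration, here is a minimal Python sketch of the three rules above. It reads them as conditions that must all hold: the code point is in one of the listed classes and is unchanged by both NFKC and lowercasing. That reading is an assumption. The names `is_ok` and `ALLOWED_CLASSES` are made up for the sketch, and `str.lower()` stands in for lowercase(cp). Results depend on the Unicode version bundled with the Python build, so they need not match the posted file exactly.

```python
import unicodedata

# General categories from rule 1.
ALLOWED_CLASSES = {"Ll", "Lm", "Lo", "Mn", "Mc", "Nd"}


def is_ok(cp):
    """True if cp is in an allowed class and stable under NFKC and lowercasing."""
    ch = chr(cp)
    return (
        unicodedata.category(ch) in ALLOWED_CLASSES
        and unicodedata.normalize("NFKC", ch) == ch
        and ch.lower() == ch
    )


if __name__ == "__main__":
    # Show the verdict for a few sample code points.
    for cp in (ord("a"), ord("A"), ord("7"), 0x0250, 0x02BC, 0xFF11):
        print(f"U+{cp:04X} {unicodedata.category(chr(cp))} ok={is_ok(cp)}")
```

Under this reading, ASCII digits are covered by class Nd without a separate rule, and uppercase letters are excluded because their class is Lu.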

> 
> I also have a suggestion that rule 2 above should be removed, since I
> went one step too far in my conclusions from earlier discussions.
> 
> Do I see a consensus on this list that I should remove rule 2?

Again I do not understand how you judge consensus, but stated
that way, yes, remove rule 2.

Furthermore, clearly you need to remove your rule 1.

--Ken


> 
> BTW, the URL to the latest document is
> http://stupid.domain.name/idnabis/table-latest.html.
> 
> Other changes you will see are:
> 
> (a) The list of rules (that you see above) will be included in the
> document
> (b) The scripts will be in English alphabetical order
> 
>      Patrik
>   
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
> 


