CONTEXTO Proposal
Mark Davis ⌛
mark at macchiato.com
Mon Jul 20 22:09:06 CEST 2009
I believe that none of the current CONTEXTO characters really needs to be
CONTEXTO, and all should simply be PVALID. I'd like to ask for a consensus
call on this. There is a copy at
http://www.macchiato.com/unicode/idna/exceptions/contexto-proposal in case
email clients make this less readable.
Here are my recommendations in detail:
A. HYPHEN
The rule on HYPHEN is unnecessary, since it is already a requirement of the
DNS system; the rule is completely redundant. So remove from Tables:
- http://tools.ietf.org/html/draft-ietf-idnabis-tables#appendix-A.1
   Appendix A.1. HYPHEN-MINUS
   Code point:
      U+002D
   Overview:
      Must not appear at the beginning or end of a label.
   Lookup:
      False
   Rule Set:
      True;
      If FirstChar .eq. cp Then False;
      If LastChar .eq. cp Then False;
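To make the redundancy argument concrete, here is a minimal Python sketch that transcribes the Rule Set above literally and compares it with a plain leading/trailing-hyphen check. Both function names are hypothetical, and no_edge_hyphen only stands in for the existing DNS-side restriction; it is not taken from any specification text.

```python
HYPHEN_MINUS = "\u002D"


def contexto_hyphen_rule(label: str, cp: str = HYPHEN_MINUS) -> bool:
    """Literal transcription of the Appendix A.1 Rule Set (hypothetical name)."""
    result = True                  # True;
    if label[:1] == cp:            # If FirstChar .eq. cp Then False;
        result = False
    if label[-1:] == cp:           # If LastChar .eq. cp Then False;
        result = False
    return result


def no_edge_hyphen(label: str) -> bool:
    """Stand-in for the existing rule that a label may not begin or end with '-'."""
    return not (label.startswith("-") or label.endswith("-"))


# On these samples the two checks agree, illustrating the redundancy.
for sample in ["a-b", "-ab", "ab-", "a--b", "-"]:
    assert contexto_hyphen_rule(sample) == no_edge_hyphen(sample)
```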
Note that in Protocol we have:
   4.2.3.1. Consecutive Hyphens
   The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
   the third and fourth character positions.
The corresponding Lookup restriction is missing from Protocol and should be
added.
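A minimal sketch of the check that the missing Lookup restriction would amount to, assuming Python strings indexed by code point, so the 1-based third and fourth positions in the Protocol text are indices 2 and 3. The function names are hypothetical.

```python
def hyphens_in_positions_3_and_4(label: str) -> bool:
    """True if the Unicode label has "--" in character positions 3 and 4 (1-based)."""
    return label[2:4] == "--"


def lookup_check(label: str) -> bool:
    """Hypothetical lookup-side mirror of Protocol 4.2.3.1: reject such labels."""
    return not hyphens_in_positions_3_and_4(label)


print(lookup_check("ab--cd"))  # rejected: "--" at positions 3 and 4
print(lookup_check("a--bc"))   # accepted: "--" at positions 2 and 3
```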
B. ARABIC-INDIC DIGITS
The rule on these overlaps with Bidi; it would be simpler and more
appropriate to move it there, since these digits only matter for Bidi
processing. So remove:
- http://tools.ietf.org/html/draft-ietf-idnabis-tables#appendix-A.13
- http://tools.ietf.org/html/draft-ietf-idnabis-tables#appendix-A.14
And change Bidi rule 5 from:
   5. If an EN is present, no AN may be present, and vice versa.
To:
   5. If an EN is present, no AN or EXTENDED ARABIC-INDIC digit
   (U+06F0..U+06F9) may be present. If an AN is present, no EXTENDED
   ARABIC-INDIC digit (U+06F0..U+06F9) may be present.
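A sketch of the proposed rule 5 for an RTL label, using Python's unicodedata.bidirectional for the Bidi classes and taking the EXTENDED ARABIC-INDIC digits to be U+06F0..U+06F9. One assumption is needed: those extended digits themselves have Bidi class EN, so this sketch reads "EN" in the first sentence as EN-class characters other than the extended digits; under the literal reading every label containing an extended digit would be rejected. The function names are hypothetical.

```python
import unicodedata

# EXTENDED ARABIC-INDIC DIGIT ZERO..NINE (Bidi class EN).
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)


def is_extended_digit(ch: str) -> bool:
    return ord(ch) in EXTENDED_ARABIC_INDIC


def proposed_rule_5(label: str) -> bool:
    """Sketch of the proposed rule 5 for an RTL label.

    Assumption: "EN" means EN-class characters other than the
    extended Arabic-Indic digits themselves.
    """
    has_en = any(
        unicodedata.bidirectional(ch) == "EN" and not is_extended_digit(ch)
        for ch in label
    )
    has_an = any(unicodedata.bidirectional(ch) == "AN" for ch in label)
    has_ext = any(is_extended_digit(ch) for ch in label)
    if has_en and (has_an or has_ext):
        return False  # EN present: no AN and no extended digit allowed
    if has_an and has_ext:
        return False  # AN present: no extended digit allowed
    return True
```

For example, an Arabic letter followed by U+0661 U+0662 passes, while the same letter followed by U+06F1 U+0662 fails, which is the mixing the Tables rules in A.13 and A.14 were meant to prevent.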
C. Other CONTEXTO
Remove the other CONTEXTO cases. These consist of only the following 6
characters, and there is no particular problem with making them all simply
PVALID in the Exceptions table.
Code, Name
   Motivation?
   My Comments
U+00B7 · MIDDLE DOT <http://unicode.org/cldr/utility/character.jsp?a=00B7>
   a·b.us vs a.b.us
   but different placement
U+0375 ͵ GREEK LOWER NUMERAL SIGN <http://unicode.org/cldr/utility/character.jsp?a=0375>
   a͵b.us vs a,b.us
   but the comma is illegal anyway
U+05F3 ׳ HEBREW PUNCTUATION GERESH <http://unicode.org/cldr/utility/character.jsp?a=05F3>
   a׳b.us vs a'b.us
   but the apostrophe is illegal anyway
U+05F4 ״ HEBREW PUNCTUATION GERSHAYIM <http://unicode.org/cldr/utility/character.jsp?a=05F4>
   a״b.us vs a"b.us
   but the quotation mark is illegal anyway
U+30FB ・ KATAKANA MIDDLE DOT <http://unicode.org/cldr/utility/character.jsp?a=30FB>
   a・b.us vs a.b.us
   but different placement and width
U+0483 ҃ COMBINING CYRILLIC TITLO <http://unicode.org/cldr/utility/character.jsp?a=0483>
   a҃b.us
   unclear motivation, but no worse than other combining marks. If there is
   any problem with that, just make it DISALLOWED in the Exceptions section;
   it is archaic anyway.
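A minimal sketch of what section C would leave in the Exceptions table, assuming a simple Python dict from code point to derived property value (a hypothetical layout, not the format used in the Tables draft). It also checks the six character names against Python's unicodedata database. U+0483 is shown as PVALID, with DISALLOWED as the fallback suggested in the comments above.

```python
import unicodedata
from typing import Optional

# Hypothetical layout: code point -> derived property value in Exceptions.
PROPOSED_EXCEPTIONS = {
    0x00B7: "PVALID",  # MIDDLE DOT
    0x0375: "PVALID",  # GREEK LOWER NUMERAL SIGN
    0x05F3: "PVALID",  # HEBREW PUNCTUATION GERESH
    0x05F4: "PVALID",  # HEBREW PUNCTUATION GERSHAYIM
    0x30FB: "PVALID",  # KATAKANA MIDDLE DOT
    0x0483: "PVALID",  # COMBINING CYRILLIC TITLO (fallback: "DISALLOWED")
}

EXPECTED_NAMES = {
    0x00B7: "MIDDLE DOT",
    0x0375: "GREEK LOWER NUMERAL SIGN",
    0x05F3: "HEBREW PUNCTUATION GERESH",
    0x05F4: "HEBREW PUNCTUATION GERSHAYIM",
    0x30FB: "KATAKANA MIDDLE DOT",
    0x0483: "COMBINING CYRILLIC TITLO",
}


def exception_value(cp: int) -> Optional[str]:
    """Return the proposed Exceptions value for cp, or None if cp is not listed."""
    return PROPOSED_EXCEPTIONS.get(cp)


# Sanity check: the code points carry the names used in the proposal.
for cp, name in EXPECTED_NAMES.items():
    assert unicodedata.name(chr(cp)) == name, hex(cp)
```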
Mark
More information about the Idna-update mailing list