CONTEXTO Proposal

Mark Davis ⌛ mark at macchiato.com
Mon Jul 20 22:09:06 CEST 2009


I believe that none of the current CONTEXTO characters are really required
to be CONTEXTO, and all should be simply PVALID. I'd like to ask for a
consensus call on this. There is a copy at
http://www.macchiato.com/unicode/idna/exceptions/contexto-proposal in case
mail clients make this message less readable.

Here are my recommendations in detail:

A. HYPHEN

The rule on HYPHEN is unnecessary: it is already a requirement of the DNS,
so restating it here is completely redundant. So remove from Tables:

   - http://tools.ietf.org/html/draft-ietf-idnabis-tables#appendix-A.1

Appendix A.1.  HYPHEN-MINUS
   Code point:
      U+002D
   Overview:
      Must not appear at the beginning or end of a label.
   Lookup:
      False
   Rule Set:
      True;
      If FirstChar .eq. cp Then False;
      If LastChar .eq. cp Then False;
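
To make the quoted rule set concrete, here is a minimal Python sketch of what
it appears to check. The function name hyphen_rule_ok is hypothetical and does
not come from the draft, and the handling of an empty label is an assumption.

```python
HYPHEN_MINUS = "\u002D"

def hyphen_rule_ok(label: str) -> bool:
    """Return False if HYPHEN-MINUS is the first or last character of label."""
    if not label:
        # Assumption: empty labels are rejected elsewhere, so pass them through.
        return True
    # "If FirstChar .eq. cp Then False; If LastChar .eq. cp Then False;"
    return label[0] != HYPHEN_MINUS and label[-1] != HYPHEN_MINUS
```

For example, hyphen_rule_ok("a-b") passes, while "-ab" and "ab-" fail.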


Note that in Protocol we have:

4.2.3.1. Consecutive Hyphens
The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
the third and fourth character positions.

The corresponding Lookup restriction is missing from Protocol and should be
added.
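
As an illustration only, the consecutive-hyphens restriction quoted from
Protocol 4.2.3.1 could be tested as below. The function name is hypothetical,
and the sketch assumes "character positions" are counted from 1, so the third
and fourth positions are string indices 2 and 3.

```python
def has_reserved_double_hyphen(label: str) -> bool:
    """True if '--' occupies the third and fourth character positions."""
    # Slicing is safe on short labels: it simply yields a shorter string.
    return label[2:4] == "--"
```

A label such as "ab--cd" would be caught by this check, while "a--bcd" would
not, because there the hyphens sit in the second and third positions.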

B. ARABIC-INDIC DIGITS

The rule on these overlaps with Bidi, and it would be simpler and more
appropriate to move it there, since these digits are only of concern in
Bidi processing. So remove:

   - http://tools.ietf.org/html/draft-ietf-idnabis-tables#appendix-A.13
   - http://tools.ietf.org/html/draft-ietf-idnabis-tables#appendix-A.14

And change BIDI from:

5. If an EN is present, no AN may be present, and vice versa.

To:

5. If an EN is present, no AN or EXTENDED ARABIC-INDIC digit
(U+0660..U+0669) may be present. If an AN is present, no EXTENDED
ARABIC-INDIC digit (U+0660..U+0669) may be present.
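
For reference, the current wording of rule 5 (the "from" text above) can be
sketched with Python's unicodedata module, which reports the Bidi class of
each character. In the Unicode data shipped with Python, U+0660..U+0669 are
class AN and U+06F0..U+06F9 are class EN. The function name rule5_ok is
hypothetical; the sketch covers only the existing rule 5, not the other Bidi
rules and not the proposed rewording.

```python
import unicodedata

def rule5_ok(label: str) -> bool:
    """Current rule 5: if an EN is present, no AN may be present, and vice versa."""
    # Collect the Bidi class of every character in the label.
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return not ("EN" in classes and "AN" in classes)
```

With this check, "a1" and "a\u0661" each pass on their own, while a label
mixing "1" with "\u0661" fails.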

C. Other CONTEXTO

Remove the other CONTEXTO cases. These consist of only the following 6
characters, and there is no particular problem with making them all simply
PVALID in the Exceptions table.

  Code:        U+00B7 <http://unicode.org/cldr/utility/character.jsp?a=00B7>  ·
  Name:        MIDDLE DOT
  Motivation?: a·b.us vs a.b.us
  My Comments: but different placement

  Code:        U+0375 <http://unicode.org/cldr/utility/character.jsp?a=0375>  ͵
  Name:        GREEK LOWER NUMERAL SIGN
  Motivation?: a͵b.us vs a,b.us
  My Comments: but , is illegal anyway

  Code:        U+05F3 <http://unicode.org/cldr/utility/character.jsp?a=05F3>  ׳
  Name:        HEBREW PUNCTUATION GERESH
  Motivation?: a׳b.us vs a'b.us
  My Comments: but ' is illegal anyway

  Code:        U+05F4 <http://unicode.org/cldr/utility/character.jsp?a=05F4>  ״
  Name:        HEBREW PUNCTUATION GERSHAYIM
  Motivation?: a״b.us vs a"b.us
  My Comments: but " is illegal anyway

  Code:        U+30FB <http://unicode.org/cldr/utility/character.jsp?a=30FB>  ・
  Name:        KATAKANA MIDDLE DOT
  Motivation?: a・b.us vs a.b.us
  My Comments: but different placement, width

  Code:        U+0483 <http://unicode.org/cldr/utility/character.jsp?a=0483>  ҃
  Name:        COMBINING CYRILLIC TITLO
  Motivation?: a҃b.us
  My Comments: unclear motivation; but no worse than other combining marks.
               If there is any problem with that, just make it DISALLOWED in
               the Exception section; it is archaic anyway.
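
As a rough sketch of what the change in C would look like as data, the six
code points can be written as a small mapping and cross-checked against
Python's unicodedata module. The dictionary name and the describe helper are
hypothetical; the actual Exceptions table in the Tables draft has its own
format and more entries than shown here.

```python
import unicodedata

# Hypothetical excerpt: the six characters above, all marked PVALID as proposed.
PROPOSED_EXCEPTIONS = {
    0x00B7: "PVALID",  # MIDDLE DOT
    0x0375: "PVALID",  # GREEK LOWER NUMERAL SIGN
    0x05F3: "PVALID",  # HEBREW PUNCTUATION GERESH
    0x05F4: "PVALID",  # HEBREW PUNCTUATION GERSHAYIM
    0x30FB: "PVALID",  # KATAKANA MIDDLE DOT
    0x0483: "PVALID",  # COMBINING CYRILLIC TITLO (or DISALLOWED, see comments)
}

def describe(cp: int) -> str:
    """Format one entry as 'U+XXXX NAME STATUS' using the Unicode name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))} {PROPOSED_EXCEPTIONS[cp]}"

if __name__ == "__main__":
    for cp in PROPOSED_EXCEPTIONS:
        print(describe(cp))
```

Of the six, only U+0483 is a combining mark (its canonical combining class is
non-zero), which is why it gets a separate comment above.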






Mark