Hyphen Restrictions

Yoshiro YONEYA yoshiro.yoneya at jprs.co.jp
Wed Jan 5 07:18:20 CET 2011


Hi, all,

I need clarification of RFC 5891, Section 4.2.3.1, which says:

4.2.3.1.  Hyphen Restrictions

   The Unicode string MUST NOT contain "--" (two consecutive hyphens) in
   the third and fourth character positions and MUST NOT start or end
   with a "-" (hyphen).

My question is what "the third and fourth character positions" means.
Does it mean the third and fourth octets from the beginning of the string?
For example:
  beginning of the string
    |
    v 1   2   3   4   5 <-- position of octet
    +---+---+---+---+---+
    | a | b | - | - | c |
    +---+---+---+---+---+
              ^   ^
              |   |
      two consecutive hyphens

Or does it mean the third and fourth characters from the beginning of the string?
For example:
  beginning of the string
    |
    v 1   2   3   4   5 <-- position of character
    +---+---+---+---+---+
    |<A>|<B>| - | - |<C>| here <A>, <B> and <C> stand for non-ASCII
    +---+---+---+---+---+ (multi-octet) characters
              ^   ^
              |   |
      two consecutive hyphens
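
The two readings can be made concrete with a minimal Python sketch. It
only illustrates the difference and does not settle which reading the
RFC intends. The function names are hypothetical, and the choice of
UTF-8 for the octet-based reading is an assumption; the quoted RFC text
does not name an encoding.

```python
def hyphens_at_3_4_chars(label: str) -> bool:
    """Character reading: positions are counted in Unicode code points."""
    return label[2:4] == "--"


def hyphens_at_3_4_octets(label: str, encoding: str = "utf-8") -> bool:
    """Octet reading: positions are counted in octets of an encoded form.

    UTF-8 is assumed here purely for illustration.
    """
    return label.encode(encoding)[2:4] == b"--"


# First diagram: an all-ASCII label, where characters and octets coincide.
ascii_label = "ab--c"

# Second diagram: three hiragana letters stand in for <A>, <B> and <C>.
non_ascii_label = "\u3042\u3044--\u3046"

for label in (ascii_label, non_ascii_label):
    print(ascii(label),
          "chars:", hyphens_at_3_4_chars(label),
          "octets:", hyphens_at_3_4_octets(label))
```

For the all-ASCII label both checks agree. For the label in the second
diagram only the character-based check finds "--" in positions three and
four, because in UTF-8 the first non-ASCII character already occupies
more than two octets.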

My understanding is that this restriction exists to preserve future ACE
prefixes, so I expect the answer to my question is the former one.  Is that
right?

Regards,

-- 
Yoshiro YONEYA <yoshiro.yoneya at jprs.co.jp>


