Hyphen Restrictions

Wed Jan 5 07:28:34 CET 2011

On Wed, Jan 05, 2011 at 03:18:20PM +0900, Yoshiro YONEYA wrote:
> Or does it mean third and fourth character from the beginning of the string?
> For example:
>   beginning of the string
>     |
>     v 1   2   3   4   5 <-- position of character
>     +---+---+---+---+---+
>     |<A>|<B>| - | - |<C>| here <A>, <B> and <C> stand for non-ASCII
>     +---+---+---+---+---+ (multi-octet) characters
>               ^   ^
>               |   |
>       two consecutive hyphens

I believe this is the intended reading: the positions are counted in
characters from the beginning of "the Unicode string".  At one point in
the development of IDNA2008, I think this was called a "putative
U-label", if I recall correctly.  The idea was that you had an inbound
Unicode string that was supposed to be a U-label, but you didn't yet
know whether it was one.
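
As a rough illustration of that reading, here is a minimal Python sketch.
The function name is hypothetical and not taken from any specification or
library; it only assumes that "position" means code point position in the
Unicode string, not octet position in some encoding.

```python
def violates_hyphen_restrictions(label: str) -> bool:
    """Return True if a putative U-label breaks the hyphen rules.

    Hypothetical helper for illustration. Positions are counted in
    Unicode code points (Python str indexing), not in octets.
    """
    # "--" in the third and fourth character positions (indices 2 and 3).
    if label[2:4] == "--":
        return True
    # A leading or trailing hyphen is also not allowed.
    return label.startswith("-") or label.endswith("-")


# <A><B>--<C> from the diagram, with three non-ASCII characters.
example = "\u3042\u3044--\u3046"
print(violates_hyphen_restrictions(example))
# In UTF-8 the same hyphens sit at octet offsets 6 and 7, which is why
# counting octets instead of characters would give a different answer.
print(example.encode("utf-8")[2:4] == b"--")
```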

That this is the correct interpretation is suggested by section 4.4,
which talks about converting the whole thing to an A-label by doing
the Punycode conversion.  That suggests that previous "labels" in 4.x
were only ever putative U-labels or else they were A-labels.
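
To make the ordering concrete, here is a small sketch of that conversion
step using Python's built-in "punycode" codec.  The function name is
hypothetical, and the sketch assumes the input has already passed the
earlier validation steps; it does no checking of its own.

```python
def to_a_label(u_label: str) -> str:
    """Convert an already-validated U-label to an A-label.

    Hypothetical helper for illustration: the ACE prefix "xn--"
    followed by the Punycode encoding of the Unicode string.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


a_label = to_a_label("b\u00fccher")
print(a_label)
# Every A-label has "--" in its third and fourth positions, so a rule
# forbidding that can only sensibly apply to the Unicode string.
print(a_label[2:4] == "--")
```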

The above is merely my interpretation; I hold no special authority.

A
-- 
Andrew Sullivan
ajs at shinkuro.com
Shinkuro, Inc.