<span>Vint</span> and I had a chance recently to meet and go over some of the IDNA definitions. The following attempts to capture that:<br><br><a href="http://www.macchiato.com/unicode/idna/label-categorization" target="_blank">http://www.macchiato.com/unicode/idna/label-categorization</a><br>
<br>I'm including a copy below.<br><br><h3 id="goog-ws-page-title-header" class="goog-ws-page-title" style="">
<span id="goog-ws-page-title" dir="ltr">Label Categorization</span>
</h3>
<div>The following is a set of
non-overlapping categories covering all labels made up of characters from
[\-A-Za-z0-9], with examples. It is an elaboration of the distinctions
made in <span style="font-style: normal;"><i><a href="http://tools.ietf.org/html/draft-ietf-idnabis-defs" rel="nofollow"><u> <b>defs</b></u></a></i>.<br></span><br>
<table border="1" cellpadding="3" cellspacing="0" width="100%">
<tbody>
<tr>
<td width="1%"><br></td><td width="1%"><b>Label Term<br>
</b></td>
<td width="15%"><b>Pattern<br>
</b></td>
<td width="35%"><b>Definition</b></td><td width="15%"><b>Examples<br>
</b></td>
</tr>
<tr>
<td width="1%">1<br></td><td width="1%"><b><i>
A-Label<br></i></b>
</td>
<td width="15%">
xn--*<br>
</td>
<td width="35%">
The * is valid punycode, passes IDN tests</td><td width="15%">
xn--bcker-gra ("bäcker")
</td>
</tr>
<tr>
<td width="1%">2<br></td><td width="1%"><b><i>
Fails-IDN5<br></i></b>
</td>
<td width="15%">
xn--*<br>
</td>
<td width="35%"><p>
The * is valid punycode &lt;= 59 bytes long, fails IDN Domain Name Lookup Protocol (Sec <a href="http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5" rel="nofollow">5</a>)<br></p></td><td width="15%">
xn--g6h ("♥")<br>
xn--Bcker-gra ("Bäcker")<br>
</td>
</tr>
<tr>
<td width="1%">3<br></td><td width="1%"><b><i>Fails-IDN4-only<br></i></b>
</td>
<td width="15%">xn--*<br>
</td>
<td width="35%"><p>The * is valid punycode &lt;= 59 bytes long, fails IDN Registration Protocol (Sec <a href="http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-4" rel="nofollow">4</a>)<b><i> but not </i></b><i>Domain Name Lookup (Sec <a href="http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5" rel="nofollow">5</a>)</i></p>
</td><td width="15%">xn--a-0hc ("aא")</td>
</tr>
<tr>
<td width="1%">4<br></td><td width="1%"><b><i>
Overlong Punycode<br></i></b>
</td>
<td width="15%">
xn--*<br>
</td>
<td width="35%">
The * is valid punycode but 60 bytes or more (invalid DNS).</td><td width="15%">
xn--o39a20gda89ku8a4mt2wnra67lzvaw9qrno41a245bf6am0w14sdib7zvppbz309c6da<br>
("가낗나뇲다댯라럈마먔ᄇ뱟사샷악얐ᄌ쟛차챴카컀")
</td>
</tr>
<tr>
<td width="1%">5<br></td><td width="1%"><b><i>
Invalid Punycode<br></i></b>
</td>
<td width="15%">
xn--*<br>
</td>
<td width="35%">
The * is invalid Punycode.</td><td width="15%">
xn--a<br>xn--<br>
</td>
</tr>
<tr>
<td width="1%">6<br></td><td width="1%"><b><i>
Invalid ACE Prefix<br></i></b>
</td>
<td width="15%">
!x*--*<br>
*!n--*<br>
!x!n--*<br>
</td>
<td width="35%">
The pattern has hyphens in positions 3 and 4, but doesn't start with "xn"</td><td width="15%">
ab--g6h<br>
</td>
</tr>
<tr>
<td width="1%">7<br></td><td width="1%"><b><i>
Valid LDH<br></i></b>
</td>
<td width="15%"><p><a href="http://tools.ietf.org/html/rfc952" rel="nofollow">RFC 952</a></p><p> except above</p>
</td>
<td width="35%">
length &lt; 64,...<br></td><td width="15%">
abc<br>
</td>
</tr>
<tr><td width="1%">8<br></td><td width="1%"><b><i>Other ASCII<br></i></b></td><td width="15%">all but above<br></td><td width="35%"><br></td><td width="15%">$a3&amp;<br></td></tr></tbody>
</table>
</div>
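<p>The purely syntactic part of the table can be sketched in a few lines of Python. This is only a rough illustration, not part of <i>defs</i>: the names <code>coarse_category</code>, <code>LDH_RE</code> and <code>LDH_CHARS_RE</code> are hypothetical, rows 1-5 are lumped together because telling them apart requires Punycode decoding and the IDNA tests, and "Valid LDH" is assumed to mean letters, digits and hyphens with no leading or trailing hyphen and at most 63 octets.</p>

```python
import re

# Assumption: "Valid LDH" means letters, digits and hyphens, no leading or
# trailing hyphen, and a length of 1 to 63 octets.
LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
LDH_CHARS_RE = re.compile(r"[A-Za-z0-9-]+")


def coarse_category(label: str) -> str:
    """Return a coarse, syntax-only category for an ASCII label."""
    # Anything using characters outside [\-A-Za-z0-9] falls into row 8.
    if not LDH_CHARS_RE.fullmatch(label):
        return "Other ASCII"
    # Rows 1-5: the label carries the ACE prefix (compared case-insensitively).
    if label[:4].lower() == "xn--":
        return "Putative A-Label"
    # Row 6: hyphens in positions 3 and 4, but no "xn" in front of them.
    if label[2:4] == "--":
        return "Invalid ACE Prefix"
    # Row 7: an ordinary hostname-style label.
    if LDH_RE.fullmatch(label):
        return "Valid LDH"
    # Row 8: everything else, e.g. a leading hyphen or an overlong label.
    return "Other ASCII"


if __name__ == "__main__":
    for example in ["xn--bcker-gra", "ab--g6h", "abc", "$a3&"]:
        print(example, "->", coarse_category(example))
```

<p>Splitting "Putative A-Label" further into rows 1-5 would need a Punycode decoder plus the Registration and Lookup tests, which this sketch deliberately leaves out.</p>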
<br>
<p>Names for various subgroupings are also useful. For example, Terms
1-5 are all "putative A-Labels" or "ACE Prefix" labels. Terms 4-6 could
be called "Broken IDN". Terms 2-6 could be called "Invalid IDN".</p><h3><a name="TOC-Relation-between-Unicode-and-Punico"></a>
Relation between Unicode and Punycode</h3>
All Unicode strings are mapped (reversibly) by Punycode to one of the following (adding the ACE prefix):<br><br>
<ul><li>
A-Label</li><li>
Fails-IDN5</li><li>Fails-IDN4-only</li><li>Overlong Punycode</li></ul><br>Thus for each of 1-4 there is a corresponding Unicode String (Label):
<ol><li>U-Label</li><li>Unicode-Fails-IDN5</li><li>Unicode-Fails-IDN4-only</li><li>Overlong-Unicode.<br>
</li></ol>
<br>Note that apparent Punycode strings might not map to Unicode, such as the "a" in "xn--a".<br>
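<p>The reversible mapping can be tried out with Python's built-in <code>punycode</code> codec. This is a sketch under assumptions: the helper names <code>to_ace</code>, <code>from_ace</code> and <code>is_overlong</code> are hypothetical, no IDNA validity tests are applied, and the codec's idea of a decodable payload may be more lenient than the "Invalid Punycode" row intends. It also shows where the 59-byte limit comes from: a DNS label holds at most 63 octets, and the ACE prefix takes 4 of them.</p>

```python
ACE_PREFIX = "xn--"
MAX_DNS_LABEL = 63  # octets allowed in a single DNS label


def to_ace(unicode_label: str) -> str:
    """Punycode-encode a Unicode label and add the ACE prefix (no IDNA checks)."""
    return ACE_PREFIX + unicode_label.encode("punycode").decode("ascii")


def from_ace(ace_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the remainder (no IDNA checks)."""
    if ace_label[:4].lower() != ACE_PREFIX:
        raise ValueError("label does not start with the ACE prefix")
    return ace_label[4:].encode("ascii").decode("punycode")


def is_overlong(ace_label: str) -> bool:
    """True if the prefixed label no longer fits in a DNS label."""
    return len(ace_label) > MAX_DNS_LABEL


if __name__ == "__main__":
    ace = to_ace("bäcker")
    print(ace, from_ace(ace), is_overlong(ace))
```

<p>Because the encoding is reversible, <code>from_ace(to_ace(s))</code> returns <code>s</code> for any Unicode string, whether or not the result would pass the IDNA tests.</p>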
<h2><a name="TOC-Inconsistency-in-current-defs"></a>Inconsistency in current <span style="font-style: normal;"><i><a href="http://tools.ietf.org/html/draft-ietf-idnabis-defs" rel="nofollow"><u><b>defs</b></u></a></i></span><br>
</h2>
<p>
The term "LDH label" is defined in:</p><p><br></p>
<div style="margin-left: 40px;"><b>2.3.1.2. LDH-label and Internationalized Label</b></div><pre style="margin-left: 40px;"><font size="2"> These specifications use the term "LDH-label" strictly to refer to an<br>
all-ASCII label that obeys the preferred syntax (often known as<br> "hostname" (from <a href="http://tools.ietf.org/html/rfc952" rel="nofollow">RFC 952</a> [<a href="http://tools.ietf.org/html/rfc0952" title="DoD Internet host table specification" rel="nofollow">RFC0952</a>]) or "LDH") conventions <span style="background-color: rgb(255, 229, 153);">and that is</span><br style="background-color: rgb(255, 229, 153);">
<span style="background-color: rgb(255, 229, 153);"> not an IDN</span>.</font></pre>
<p>
That implies that an LDH-label is any valid LDH label that is not an A-Label. The diagram
below, however, shows LDH-label as being neither an A-Label <i><b>nor</b> Broken IDN.</i><br>
</p>
<pre><br><font size="1"><span style="font-family: Courier New;"> _______________________ _______________________</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | ASCII Labels | | Non-ASCII |</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | | | |</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | ___________________| | __________________|</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | |LDH-conforming (1)| | | U-label (2) |</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | | | | |_________________|</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | | ________________| | | |</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | | | <b>LDH-label</b> | | | Binary Label |</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | | |_______________| | | (including |</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | | | <b>A-label </b> | | | high bit on) |</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | | |_______________| | |_________________|</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | | | | | | |</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | | | <b>Broken IDN</b> | | | Bit String |</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | | | e.g., xn--?,| | | Label |</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | | | abc--def | | |_________________|</span><br style="font-family: Courier New;"><span style="font-family: Courier New;"> | | |_______________| |______________________|</span><br style="font-family: Courier New;">
<span style="font-family: Courier New;"> | |__________________|</span><br style="font-family: Courier New;"><br style="font-family: Courier New;"></font></pre><h2><a name="TOC-Inconsistency-in-protocol"></a>
Inconsistency in <a href="http://tools.ietf.org/html/draft-ietf-idnabis-protocol" rel="nofollow"><u> <b>protocol</b></u></a></h2>In
the following statement, the text says "U-Label". This is incorrect:
applying sections 5.1-5.5 does not guarantee that the result is a
U-Label, since those sections do not require the application of the BIDI or Context
rules. Similarly, we can't use the term "A-Label" (Sec 5.6, 5.7) since
the putative A-Label may not be one.<br><p style="margin-left: 40px;"><span><h3><a name="section-5.6">5.6</a>. Punycode Conversion<br></h3></span></p><pre style="margin-left: 40px;"><font size="2"> The validated string, a <span style="background-color: rgb(255, 229, 153);">U-label</span>, is converted to an A-label using the<br>
Punycode algorithm with the ACE prefix added.<br></font></pre><div style="margin-left: 40px;"><br><br></div><br><br><br clear="all">Mark<br>