label categorization

Mark Davis mark at macchiato.com
Thu Jan 22 03:08:30 CET 2009


Vint and I had a chance recently to meet and go over some of the IDNA
definitions. The following attempts to capture that:

http://www.macchiato.com/unicode/idna/label-categorization

I'm including a copy below.

Label Categorization

The following is a set of non-overlapping categories covering all labels
made up of characters from [\-A-Za-z0-9], with examples. It is an
elaboration of the distinctions made in defs
<http://tools.ietf.org/html/draft-ietf-idnabis-defs>.


  1. A-Label
     Pattern:    xn--*
     Definition: The * is valid Punycode and passes the IDN tests.
     Examples:   xn--bcker-gra ("bäcker")

  2. Fails-IDN5
     Pattern:    xn--*
     Definition: The * is valid Punycode <= 59 long, but fails the IDN
                 Domain Name Lookup Protocol (Sec 5
                 <http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5>).
     Examples:   xn--g6h ("♥")
                 xn--Bcker-gra ("Bäcker")

  3. Fails-IDN4-only
     Pattern:    xn--*
     Definition: The * is valid Punycode <= 59 long, fails the IDN
                 Registration Protocol (Sec 4
                 <http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-4>)
                 but not Domain Name Lookup (Sec 5
                 <http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5>).
     Examples:   xn--a-0hc ("aא")

  4. Overlong Punycode
     Pattern:    xn--*
     Definition: The * is valid Punycode but 60 bytes or more (invalid
                 DNS).
     Examples:   xn--o39a20gda89ku8a4mt2wnra67lzvaw9qrno41a245bf6am0w14sdib7zvppbz309c6da
                 ("가낗나뇲다댯라럈마먔ᄇ뱟사샷악얐ᄌ쟛차챴카컀")

  5. Invalid Punycode
     Pattern:    xn--*
     Definition: The * is invalid Punycode.
     Examples:   xn--a
                 xn--

  6. Invalid ACE Prefix
     Pattern:    !x*--*
                 *!n--*
                 !x!n--*
     Definition: The pattern has hyphens in positions 3 and 4, but
                 doesn't start with "xn".
     Examples:   ab--g6h

  7. Valid LDH
     Pattern:    RFC 952 <http://tools.ietf.org/html/rfc952>, except above
     Definition: length < 64,...
     Examples:   abc

  8. Other ASCII
     Pattern:    all but above
     Examples:   $a3&

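The purely syntactic part of the patterns above (the xn-- prefix shared by
Terms 1-5, and Terms 6-8) can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name
coarse_category is hypothetical, the prefix is compared case-insensitively,
and "Valid LDH" is taken to allow a leading digit (the RFC 1123 relaxation of
RFC 952). It does not attempt the Punycode or IDN tests that separate
Terms 1-5 from each other.

```python
import re

# Letters, digits, hyphen; no leading or trailing hyphen; 1 to 63 octets.
# Assumption: a leading digit is allowed (RFC 1123 relaxation of RFC 952).
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def coarse_category(label: str) -> str:
    """Return a coarse, syntax-only category for an ASCII label."""
    if label[2:4] == "--":
        # Hyphens in positions 3 and 4: either the ACE prefix or a bad prefix.
        # Assumption: "xn" is matched case-insensitively.
        if label[:2].lower() == "xn":
            return "xn-- prefix (Terms 1-5)"
        return "Invalid ACE Prefix (Term 6)"
    if _LDH.fullmatch(label):
        return "Valid LDH (Term 7)"
    return "Other ASCII (Term 8)"
```

Because the hyphen check comes first, a label such as "xn--" is reported
under the xn-- prefix group even though it is not LDH-conforming, which
matches its placement under Term 5 above.
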
Names for various subgroupings are also useful. For example, Terms 1-5 are
all "putative A-Labels" or "ACE Prefix" labels. Terms 4-6 could be called
"Broken IDN". Terms 2-6 could be called "Invalid IDN".

Relation between Unicode and Punycode

All Unicode strings are mapped (reversibly) by Punycode to one of the
following (adding the ACE prefix):


   - A-Label
   - Fails-IDN5
   - Fails-IDN4-only
   - Overlong Punycode


Thus for each of 1-4 there is a corresponding Unicode String (Label):

   1. U-Label
   2. Unicode-Fails-IDN5
   3. Unicode-Fails-IDN4-only
   4. Overlong-Unicode.


Note that apparent Punycode strings might not map to Unicode, such as the
"a" in "xn--a".
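
The mapping between a Unicode label and its ACE form can be tried out with
the "punycode" codec in Python's standard library. The sketch below is only
an illustration: the helper names to_ace, from_ace and is_overlong are
hypothetical, and no IDNA mapping or validity tests are performed, so the
result is a putative A-Label at best.

```python
ACE_PREFIX = "xn--"
MAX_LABEL_OCTETS = 63  # DNS label limit: 4 octets of prefix + 59 of Punycode


def to_ace(unicode_label: str) -> str:
    """Punycode-encode a Unicode label and add the ACE prefix (no IDN tests)."""
    return ACE_PREFIX + unicode_label.encode("punycode").decode("ascii")


def from_ace(ace_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the rest (no IDN tests)."""
    # Assumption: the prefix is matched case-insensitively.
    if ace_label[:4].lower() != ACE_PREFIX:
        raise ValueError("label does not start with the ACE prefix")
    return ace_label[4:].encode("ascii").decode("punycode")


def is_overlong(ace_label: str) -> bool:
    """True if the prefixed label exceeds the DNS limit of 63 octets."""
    return len(ace_label) > MAX_LABEL_OCTETS
```

With these helpers, to_ace("bäcker") gives "xn--bcker-gra" and to_ace("♥")
gives "xn--g6h". Punycode preserves the case of ASCII characters, so
to_ace("Bäcker") gives "xn--Bcker-gra", a different label.
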

Inconsistency in current defs
<http://tools.ietf.org/html/draft-ietf-idnabis-defs>

The term "LDH label" is defined in:


*2.3.1.2. LDH-label and Internationalized Label*

 These specifications use the term "LDH-label" strictly to refer to an
 all-ASCII label that obeys the preferred syntax (often known as
 "hostname" (from RFC 952 <http://tools.ietf.org/html/rfc952> [RFC0952
 <http://tools.ietf.org/html/rfc0952>]) or "LDH") conventions and that
 is not an IDN.

That implies that an LDH-label is any valid LDH label that is not an
A-Label. The diagram below, however, shows LDH-label as being neither an
A-label *nor Broken IDN*.


  _______________________   _______________________
 | ASCII Labels          | | Non-ASCII             |
 |                       | |                       |
 |   ____________________| |   ____________________|
 |  |LDH-conforming (1)  | |  | U-label (2)        |
 |  |                    | |  |____________________|
 |  |   _________________| |  |                    |
 |  |  | *LDH-label*     | |  | Binary Label       |
 |  |  |_________________| |  | (including         |
 |  |  | *A-label*       | |  | high bit on)       |
 |  |  |_________________| |  |____________________|
 |  |  |                 | |  |                    |
 |  |  | *Broken IDN*    | |  | Bit String         |
 |  |  |   e.g., xn--?,  | |  | Label              |
 |  |  |   abc--def      | |  |____________________|
 |  |  |_________________| |_______________________|
 |  |____________________|

Inconsistency in protocol
<http://tools.ietf.org/html/draft-ietf-idnabis-protocol>

The following statement says "U-Label". This is incorrect: applying
sections 5.1-5.5 does not guarantee that the result is a U-Label, since
those sections do not require the application of the BIDI or Context rules.
Similarly, we can't use the term "A-Label" (Sec 5.6, 5.7), since the
putative A-Label may not be one.

5.6. Punycode Conversion

 The validated string, a U-label, is converted to an A-label using the
 Punycode algorithm with the ACE prefix added.






Mark