label categorization
Mark Davis
mark at macchiato.com
Thu Jan 22 03:08:30 CET 2009
Vint and I had a chance recently to meet and go over some of the IDNA
definitions. The following attempts to capture that:
http://www.macchiato.com/unicode/idna/label-categorization
I'm including a copy below.
Label Categorization

The following is a set of non-overlapping categories covering all labels
made up of characters from [\-A-Za-z0-9], with examples. It is an
elaboration of the distinctions made in defs
<http://tools.ietf.org/html/draft-ietf-idnabis-defs>.

Each entry gives the label term, its pattern, its definition, and examples.

1. A-Label
   Pattern:    xn--*
   Definition: The * is valid Punycode and passes the IDN tests.
   Example:    xn--bcker-gra ("bäcker")

2. Fails-IDN5
   Pattern:    xn--*
   Definition: The * is valid Punycode <= 59 long, but fails the IDN
               Domain Name Lookup Protocol (Sec 5
               <http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5>).
   Examples:   xn--g6h ("♥")
               xn--bcker-gra ("Bäcker")

3. Fails-IDN4-only
   Pattern:    xn--*
   Definition: The * is valid Punycode <= 59 long, and fails the IDN
               Registration Protocol (Sec 4
               <http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-4>)
               but not Domain Name Lookup (Sec 5
               <http://tools.ietf.org/html/draft-ietf-idnabis-protocol-08#section-5>).
   Example:    xn--a-0hc ("aא")

4. Overlong Punycode
   Pattern:    xn--*
   Definition: The * is valid Punycode but 60 bytes or more (invalid DNS).
   Example:    xn--o39a20gda89ku8a4mt2wnra67lzvaw9qrno41a245bf6am0w14sdib7zvppbz309c6da
               ("가낗나뇲다댯라럈마먔ᄇ뱟사샷악얐ᄌ쟛차챴카컀")

5. Invalid Punycode
   Pattern:    xn--*
   Definition: The * is invalid Punycode.
   Examples:   xn--a
               xn--

6. Invalid ACE Prefix
   Pattern:    !x*--*
               *!n--*
               !x!n--*
   Definition: The pattern has hyphens in positions 3 and 4, but doesn't
               start with "xn".
   Example:    ab--g6h

7. Valid LDH
   Definition: RFC 952 <http://tools.ietf.org/html/rfc952>, except above;
               length < 64,...
   Example:    abc

8. Other ASCII
   Definition: all but above
   Example:    $a3&

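As a rough illustration, the purely syntactic patterns above could be checked along the following lines. This is only a minimal Python sketch: it sorts a label into coarse groups (the xn--* pattern shared by terms 1-5, the pattern of term 6, Valid LDH, and Other ASCII). Telling terms 1-5 apart needs a Punycode decoder plus the IDNA protocol tests, which are not attempted here. The function name coarse_category, the first-match-wins ordering, the case-insensitive prefix match, and the exact LDH rule (letters, digits, hyphens, no leading or trailing hyphen, 1 to 63 characters) are assumptions of this sketch and are not taken from the drafts.

```python
import re

# Assumption: "Valid LDH" means letters, digits and hyphens only, with no
# leading or trailing hyphen, and a length of 1 to 63 characters.
_LDH = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

ACE_GROUP = "xn--* (terms 1-5)"
BAD_PREFIX = "Invalid ACE Prefix (term 6)"
VALID_LDH = "Valid LDH (term 7)"
OTHER_ASCII = "Other ASCII (term 8)"


def coarse_category(label: str) -> str:
    """Sort an ASCII label into a coarse group; the first matching pattern wins."""
    # Assumption: the ACE prefix is matched case-insensitively.
    if label.lower().startswith("xn--"):
        return ACE_GROUP
    # Hyphens in positions 3 and 4, but the label does not start with "xn".
    if label[2:4] == "--":
        return BAD_PREFIX
    if _LDH.fullmatch(label):
        return VALID_LDH
    return OTHER_ASCII


# Examples taken from the entries above.
for example in ["xn--bcker-gra", "xn--a", "ab--g6h", "abc", "$a3&"]:
    print(example, "->", coarse_category(example))
```

Checking the xn-- prefix first mirrors the order of the entries, so a label such as xn--a lands in the first group even though its Punycode part is invalid.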
Names for various subgroupings are also useful. For example, Terms 1-5 are
all "putative A-Labels" or "ACE Prefix" labels. Terms 4-6 could be called
"Broken IDN". Terms 2-6 could be called "Invalid IDN".

Relation between Unicode and Punycode

All Unicode strings are mapped (reversibly) by Punycode to one of the
following (adding the ACE prefix):
- A-Label
- Fails-IDN5
- Fails-IDN4-only
- Overlong Punycode
Thus for each of 1-4 there is a corresponding Unicode String (Label):
1. U-Label
2. Unicode-Fails-IDN5
3. Unicode-Fails-IDN4-only
4. Overlong-Unicode.
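
The reversible mapping can be tried out with the "punycode" codec that ships with Python. The sketch below is only an illustration: the helper names to_ace, from_ace and is_overlong are made up for this example, and the codec performs the raw Punycode transformation only, so none of the IDN tests from the protocol draft are applied.

```python
ACE_PREFIX = "xn--"


def to_ace(unicode_label: str) -> str:
    """Punycode-encode a Unicode label and add the ACE prefix (no IDNA checks)."""
    return ACE_PREFIX + unicode_label.encode("punycode").decode("ascii")


def from_ace(ace_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the rest (no IDNA checks)."""
    if not ace_label.lower().startswith(ACE_PREFIX):
        raise ValueError("label does not start with the ACE prefix")
    return ace_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_overlong(unicode_label: str) -> bool:
    """True if the Punycode part is 60 bytes or more (the Overlong case)."""
    return len(unicode_label.encode("punycode")) >= 60


# Unicode forms of the examples given for the A-Label, Fails-IDN5 and
# Fails-IDN4-only categories.
for label in ["bäcker", "♥", "aא"]:
    ace = to_ace(label)
    assert from_ace(ace) == label  # the mapping is reversible
    print(ace)
```

Which of the four categories a given Unicode string falls into still depends on the Sec 4 and Sec 5 tests, apart from the length check.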
Note that apparent Punycode strings might not map to Unicode, such as the
"a" in "xn--a".

Inconsistency in current defs
<http://tools.ietf.org/html/draft-ietf-idnabis-defs>

The term "LDH label" is defined in:

   2.3.1.2. LDH-label and Internationalized Label

   These specifications use the term "LDH-label" strictly to refer to an
   all-ASCII label that obeys the preferred syntax (often known as
   "hostname" (from RFC 952 <http://tools.ietf.org/html/rfc952> [RFC0952
   <http://tools.ietf.org/html/rfc0952>]) or "LDH") conventions and that
   is not an IDN.

That implies that an LDH-label is any valid LDH label that is not an
A-Label. The diagram below, however, shows LDH-label as being neither an
A-Label *nor Broken IDN*.

_______________________    _______________________
| ASCII Labels        |    | Non-ASCII           |
|                     |    |                     |
|  ___________________|    |  ___________________|
|  |LDH-conforming (1)|    |  | U-label (2)      |
|  |                  |    |  |__________________|
|  |  ________________|    |  |                  |
|  |  | *LDH-label*   |    |  | Binary Label     |
|  |  |_______________|    |  |  (including      |
|  |  | *A-label*     |    |  |  high bit on)    |
|  |  |_______________|    |  |__________________|
|  |  |               |    |  |                  |
|  |  | *Broken IDN*  |    |  | Bit String       |
|  |  |  e.g., xn--?, |    |  |  Label           |
|  |  |  abc--def     |    |  |__________________|
|  |  |_______________|    |_____________________|
|  |__________________|

Inconsistency in protocol
<http://tools.ietf.org/html/draft-ietf-idnabis-protocol>

The following statement says "U-Label". This is incorrect: applying
sections 5.1-5.5 does not guarantee that the result is a U-Label, since
those sections do not require the application of the BIDI or Context rules.
Similarly, we can't use the term "A-Label" (Sec 5.6, 5.7), since the
putative A-Label may not be one.

   5.6. Punycode Conversion

   The validated string, a U-label, is converted to an A-label using the
   Punycode algorithm with the ACE prefix added.

Mark