Definitional Problem with U-Label and A-Label

Vint Cerf vint at google.com
Wed Nov 19 05:15:20 CET 2008


Mark,

Thanks for this careful reading. I am sure John will take these
specifics into account in the post-IETF version.

v

NOTE NEW BUSINESS ADDRESS AND PHONE
Vint Cerf
Google
1818 Library Street, Suite 400
Reston, VA 20190
202-370-5637
vint at google.com




On Nov 18, 2008, at 3:55 PM, Mark Davis wrote:

> I had a chance to review the documents again. There is good  
> progress; the split of the definitions is very helpful. There is a  
> tendency to always look for the remaining issues in the document,  
> so I want to thank John for all the work done on this. I'll respond  
> with some comments, with different subjects for easier tracking.
>
>
> Definitional Problem with U-Label and A-Label
>
> I believe (although am not 100% sure) that the intent is for both
> U-Label and A-Label to only refer to *valid* possible labels under  
> the specifications of IDNA2008, but the text does not yet support  
> that consistently. Here is the breakdown. (I'm using D1.3 to mean  
> section 1.3 in Defs, and so on, with P for protocol, B for bidi, R  
> for rationale).
>
> LDH
> The following conditions:
> 1. Must match http://tools.ietf.org/html/rfc952
>    <name>  ::= <let>[*[<let-or-digit-or-hyphen>]<let-or-digit>]
> 2. Length limited to 1..63 (http://tools.ietf.org/html/rfc1034, 3.1)
> 3. Must not have hyphens in both positions 3 and 4. (new condition)
>
> Condition 3 is not stated in D2.3.1.2, but appears elsewhere.  
> Should be in Defs 2.3.1.2.
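
The three LDH conditions listed above can be sketched as a small check. This is only an illustration of the conditions as Mark states them, not text from any of the drafts; the function name is_ldh_label is made up for the example, and positions 3 and 4 are taken to be 1-based.

```python
import re

# RFC 952 <name> grammar: a letter, optionally followed by letters,
# digits, or hyphens, and ending in a letter or digit.
_RFC952_NAME = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Hypothetical helper: apply the three LDH conditions from the email."""
    # Condition 2: length limited to 1..63 (RFC 1034, 3.1).
    if not 1 <= len(label) <= 63:
        return False
    # Condition 1: must match the RFC 952 <name> production.
    if _RFC952_NAME.fullmatch(label) is None:
        return False
    # Condition 3: no hyphens in both positions 3 and 4 (1-based).
    return label[2:4] != "--"
```

Under this reading, "example" passes, while "xn--abc" fails condition 3 and "abc-" fails condition 1.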
>
> A-Label
> I believe the definition should be the following conditions:
> 1. ASCII string of length 5 to 64
> 2. starts with "xn--" (or case variants thereof) [implicitly no hyphen
>    at end]
> 3. the remainder is valid punycode
> 4. and the depunycoded result must be a valid U-Label
>
> I believe that the above is the intended definition, but it is not  
> fully supported by the text in Defs, except (perhaps) very  
> indirectly. Note that A-Label according to this is dependent on
> U-Label. To make sure that we are not circular, we need to define
> U-Label independently of A-Label.
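
The dependency of A-Label on U-Label described above can be made explicit in a sketch where the U-Label test is passed in as a parameter. The names is_a_label, is_u_label and max_len are assumptions for illustration only; the length bound is the one given in the email, and Python's built-in "punycode" codec stands in for the punycode check.

```python
from typing import Callable


def is_a_label(
    label: str,
    is_u_label: Callable[[str], bool],  # hypothetical U-Label validity test
    max_len: int = 64,                  # upper bound as stated in the email
) -> bool:
    """Hypothetical helper: apply the proposed A-Label conditions in order."""
    # ASCII string within the stated length range.
    if not label.isascii() or not 5 <= len(label) <= max_len:
        return False
    # Starts with "xn--" in any case, and has no hyphen at the end.
    if label[:4].lower() != "xn--" or label.endswith("-"):
        return False
    # The remainder must be valid punycode.
    try:
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # The decoded result must itself be a valid U-Label.
    return is_u_label(decoded)
```

Because the U-Label predicate is supplied by the caller, the definition stays non-circular as long as that predicate never refers back to is_a_label.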
>
> Putative A-Label
> Any string that is all ASCII, but is neither LDH nor A-Label.
>
> U-Label
> This is difficult to make out. I believe the definition should be:
>
> 1. contains at least one non-ASCII character.
> 2. is in form NFC (P4.2)
> 3. contains neither DISALLOWED nor UNASSIGNED (P4.3.1)
> 4. no hyphens in both position 3 and 4 (P4.3.2.1) [implicitly no
>    hyphen at start or end]
> 5. no leading combining marks (P4.3.2.2)
> 6. obeys context constraints (P4.3.2.3)
> 7. obeys bidi constraints (P4.3.2.4)
> 8. converts to valid punycode of length < 60
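
A partial sketch of the U-Label conditions above, covering only those that can be tested without the IDNA2008 property tables: conditions 1, 2, 4, 5 and 8. The DISALLOWED/UNASSIGNED, context and bidi checks (conditions 3, 6 and 7) are deliberately left out, so passing this function does not make a string a U-Label. The function name is an assumption, and the results depend on the Unicode version shipped with the Python runtime.

```python
import unicodedata


def passes_basic_u_label_checks(label: str) -> bool:
    """Hypothetical helper: conditions 1, 2, 4, 5 and 8 only."""
    # Condition 1: at least one non-ASCII character.
    if not label or label.isascii():
        return False
    # Condition 2: the string is already in NFC.
    if unicodedata.normalize("NFC", label) != label:
        return False
    # Condition 4: no hyphens in both positions 3 and 4 (1-based),
    # and no hyphen at the start or end.
    if label[2:4] == "--" or label.startswith("-") or label.endswith("-"):
        return False
    # Condition 5: no leading combining mark (general category M*).
    if unicodedata.category(label[0]).startswith("M"):
        return False
    # Condition 8: the punycode form has length < 60.
    return len(label.encode("punycode")) < 60
```

A full check would combine this with table-driven tests for the omitted conditions.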
>
> Protocol:
>
> 4.3.3 says the following:
>
>    Strings that have been produced by the steps above, and whose
>    contents pass the above tests, are U-labels.
>
> However, this may not include condition 8 above; that is the
> test for mapping to A-Label (e.g., overly long punycode) in 4.5, not
> "above" 4.3.3. Condition #1 is also implicit.
>
> Defs:
>
> 2.3.1.1 says the following:
>
>       A "U-label" is an IDNA-valid string of Unicode characters,
>       including at least one non-ASCII character, expressed in a
>       standard Unicode Encoding Form -- in an Internet transmission
>       context this will normally be UTF-8 -- and subject to the
>       constraint below.
>
> This is inconsistent with 4.3.3, with the only constraints being  
> that U-Labels be NFC, be convertible to and from valid A-Labels,  
> and not be of the form xx--.. But the phrase in bullet 2 seems to  
> state that they must meet "all of the requirements of *these  
> specifications*". But it is not clear what those are: they should  
> be listed precisely.
>
> I can understand not wanting to complicate Defs by having  
> conditions 1-8 spelled out completely. It would be possible to  
> handle this without complicating Defs, *if* the specific sections  
> corresponding to the conditions were explicitly referenced in Defs.
>
> Putative U-Label
> Any Unicode string that contains at least one non-ASCII character,  
> but is not a U-Label.
>
> I can suggest some text fixes, if that would be helpful, but wanted  
> to get the principles right first.
>
> Mark
