Mappings

Tue Jul 21 02:00:06 CEST 2009

> The "to what" is absolutely unambiguous.  If you think
> otherwise, please point out the relevant text and, if possible,
> suggest changes.  The idea is that the registrant needs to
> understand what she is getting registered, and what she is
> getting is the U-label (or, if you prefer IDNA2003 terminology,
> the string recovered by applying ToUnicode to what is actually
> stored in the DNS). 

Sorry, you are right: it is completely unambiguous. It's not so much the 'to what' as the 'how' that I was seeking clarification on (which you have addressed further down).

> Now, what goes on between the registrar and the user to identify
> the relevant U-label is, well, between the registrar and the
> user.  My personal recommendation is that registrars stay as
> close to expecting U-labels from users as possible, but I expect
> that "as possible" would include NFC mapping and possibly width
> mapping in cases where the width form that is PVALID is hard to
> type.   But, if a registrar wants to accept something that
> requires either the mappings of the mapping document, or NFKC,
> or something else, that is fine as long as the would-be
> registrant is presented with the U-label and asked to verify
> that is what is being asked for before anything goes off to the
> registry.   The documents don't say more about this because it
> raises issues of registrant interfaces and business
> relationships that are far out of the WG's scope... and on which
> registrars ought to be able to compete if the relevant
> registrar-registry agreements permit that (more business
> relationships).   

> Again, if the documents aren't clear, please make suggestions.
> But I think, from your notes and maybe Shawn's, that you may be
> confused about one bit of intent: the "real" label is the
> canonicalized form, i.e., what IDNA2008 refers to as a U-label
> or its A-label equivalent.  Either of those can be obtained from
> the other by a transformation that does not lose _any_
> information.  That is pretty important and, if I've correctly
> understood what the WG has said several times, we have agreement
> on it.  It also isn't much different from IDNA2003 except for
> terminology and clarity: only the A-label can be stored in a
> zone (under either version) and applying ToUnicode to a valid
> A-label yields what is now called a U-label.   

I 100% agree and understand that the 'real' label is the U-label, which has a 1-1 correspondence with an A-label. Registries should only deal with U-labels/A-labels, and the rationale for all of that makes sense.
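To make the 1-1 relationship concrete, here is a minimal sketch in Python using the standard library's "punycode" codec. The function names are made up for illustration, and the sketch assumes the input is already a valid U-label containing at least one non-ASCII character; it performs none of the IDNA2008 validity checks.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Encode a U-label as an A-label (ACE prefix + Punycode).

    Hypothetical helper: assumes u_label is already valid, in NFC,
    and contains at least one non-ASCII character.
    """
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Recover the U-label from an A-label (what ToUnicode yields)."""
    if not a_label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an A-label: missing 'xn--' prefix")
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

u = "b\u00fccher"              # "bücher", already lowercase and in NFC
a = to_a_label(u)              # "xn--bcher-kva"
round_trip = to_u_label(a)     # same string as u
```

The round trip loses nothing in either direction, which is the property that lets either form stand in for the other.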

So I guess what you are saying is this: by stating that registries should only accept labels in NFC form with protocol-valid code points (PVALID or CONTEXTx), you are implicitly saying that someone (probably registrars) SHOULD apply NFC to any string before sending it to the registry. Then, by virtue of the fact that uppercase code points are not protocol valid, you are implying that 'someone' SHOULD lowercase / case fold names before sending them through. And you are further implying that, because all of this is done on registration, application developers SHOULD do the same thing before looking up names?

My concern is about the case where users cannot (or do not) input U-labels: I think we should describe how to make a U-label. I understand that this is broad and thus possibly impractical, but if we assume the starting point is a Unicode string, we should be able to describe something. We have algorithms that decide whether a code point is PVALID or not; surely the logic used there would let us come up with a way to turn a Unicode string into a U-label (I am not expecting that this will always be possible).

I think the process described in the mapping document should be sufficient for most cases, i.e. some form of case folding/lowercasing, then width mapping, followed by NFC. I am concerned about the ordering though. On looking into it more, the following concern comes up: regardless of where the mappings document places NFC, NFC will always need to be done as a last step anyway (as the registry expects the label to be in NFC form).

So the question to the Unicode experts is whether, for every string s:

toNFC(casefold(s)) == toNFC(casefold(toNFC(s)))

or in the case of lowercase

toNFC(lowercase(s)) == toNFC(lowercase(toNFC(s)))
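One way to probe this is to test candidate strings directly. The sketch below uses Python's unicodedata module; the helper name is made up, and str.casefold / str.lower are only stand-ins for whatever folding the mapping document ends up specifying (Python applies Unicode full case folding and its default lowercasing, which is an assumption here).

```python
import unicodedata

def nfc(s: str) -> str:
    """toNFC from the identities above."""
    return unicodedata.normalize("NFC", s)

def nfc_first_is_redundant(s: str, mapping) -> bool:
    """True if toNFC(mapping(s)) == toNFC(mapping(toNFC(s))) for this s.

    Hypothetical helper: `mapping` stands in for casefold or lowercase.
    """
    return nfc(mapping(s)) == nfc(mapping(nfc(s)))

# Sample input: alpha, COMBINING GREEK YPOGEGRAMMENI (U+0345),
# COMBINING ACUTE ACCENT (U+0301), deliberately not in NFC.
sample = "\u03b1\u0345\u0301"

casefold_ok = nfc_first_is_redundant(sample, str.casefold)
lower_ok = nfc_first_is_redundant(sample, str.lower)
```

With CPython's bundled Unicode data, this sample gives False for casefold and True for lower. Full case folding maps U+0345 to U+03B9, which is a starter, so the accent that follows composes differently depending on whether NFC ran first. That suggests the casefold identity does not hold for all strings; the sample says nothing about whether the lowercase identity holds in general.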

I have to admit that you now have me questioning my own position on mappings, and whether that process (i.e. the mappings) needs to be specified as a MUST as part of the lookup process; perhaps a SHOULD will be sufficient. I am going to think about this more...
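For concreteness, the mapping steps discussed above (lowercasing, width mapping, NFC as the last step) could be sketched as follows in Python. The function names are hypothetical, str.lower is an assumed stand-in for the case mapping, and nothing here checks that the result contains only PVALID code points, so the output is only a candidate to show the registrant for confirmation.

```python
import unicodedata

def width_map(s: str) -> str:
    """Replace fullwidth/halfwidth characters by their decomposition.

    Only code points whose compatibility decomposition is tagged
    <wide> or <narrow> are touched; everything else passes through.
    """
    out = []
    for ch in s:
        decomp = unicodedata.decomposition(ch)   # e.g. "<wide> 0065"
        if decomp.startswith(("<wide>", "<narrow>")):
            code_points = decomp.split()[1:]
            out.append("".join(chr(int(cp, 16)) for cp in code_points))
        else:
            out.append(ch)
    return "".join(out)

def candidate_u_label(s: str) -> str:
    """Lowercase, width-map, then apply NFC last (no PVALID check)."""
    return unicodedata.normalize("NFC", width_map(s.lower()))

# Fullwidth "EXAMPLE" and a decomposed "CAFE" + combining acute accent.
from_fullwidth = candidate_u_label("\uff25\uff38\uff21\uff2d\uff30\uff2c\uff25")
from_decomposed = candidate_u_label("CAFE\u0301")
```

Whether NFC also needs to run before the case mapping is exactly the ordering question raised above; this sketch only applies it at the end.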

> The registry requirement of the IDNA2008 spec is that you put
> only A-labels into zones (nothing new) and that you "register"
> only U-labels or A-labels (which, with adjusted terminology, is
> exactly how some registries read IDNA2003).  The change is only
> the explicit expectation that you not accept something that
> requires mapping for registration without performing the mapping
> and verifying it with the registrar or registrant.  Whether
> you require registrars to provide you only with U-labels,
> A-labels, or pairs of them is another business matter. I know
> what advice I'd give in order to head off complex error and race
> conditions, but it isn't something this WG thinks it can or
> should reasonably standardize.

No problem with the above statement

> Conversely, if a registrar or registrant decides to submit only
> U-labels and A-labels, there is no issue about mapping (or much
> of anything else unless you are also applying variant
> processing, which is also outside the WG's scope).

Agreed

>   john