MVALID (was Re: M-Label or MVALID, and dangers with mappings?)

Mark Davis mark at macchiato.com
Sat Apr 11 18:14:03 CEST 2009


We are thinking along very similar lines. Yes, I think what we want to do is
define MVALID as the set of characters that are subject to IDNA2003-style
mapping. I think it is best to give it a slightly different name, since it
covers only the characters subject to mapping, and we don't want people to
think it covers all the characters valid in an M-Label. I'll use the working
name MSUBJECT.
The process in Protocol would be along the following lines.

1. For any substring of the input whose characters are all in MSUBJECT,
convert that substring via the following mapping, and replace it in the
source.

substring = toNFKC(removeDI(toCaseFold(toNFKC(substring))))

// the "removeDI" step would be dropped if we decide not to remove them

2. Transform the entire string via NFC.

// we need to do this to make sure the result is NFC, because of possible
interactions between characters that are inside and outside MSUBJECT.

3. Proceed with the rest of Protocol
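
As a rough illustration of steps 1 and 2, here is a minimal Python sketch.
Several things in it are assumptions rather than anything settled above:
MSUBJECT is passed in as a hypothetical predicate in_msubject, "any
substring" is read as "each maximal run of MSUBJECT characters", and
removeDI is approximated with a small hand-picked sample of default
ignorable code points, since the Python standard library does not expose
that property. Python's str.casefold() is used for toCaseFold.

```python
import unicodedata
from itertools import groupby

# Illustrative sample only, NOT the full Default_Ignorable_Code_Point set:
# soft hyphen, combining grapheme joiner, zero width space, word joiner, BOM.
DEFAULT_IGNORABLE_SAMPLE = {"\u00ad", "\u034f", "\u200b", "\u2060", "\ufeff"}


def remove_di(s):
    """Drop the (sample) default ignorable code points from s."""
    return "".join(ch for ch in s if ch not in DEFAULT_IGNORABLE_SAMPLE)


def map_substring(s):
    """toNFKC(removeDI(toCaseFold(toNFKC(s))))"""
    folded = unicodedata.normalize("NFKC", s).casefold()
    return unicodedata.normalize("NFKC", remove_di(folded))


def map_input(text, in_msubject):
    """Step 1: map each maximal run of MSUBJECT characters.
    Step 2: normalize the whole string to NFC."""
    parts = []
    for is_subject, run in groupby(text, key=lambda ch: bool(in_msubject(ch))):
        chunk = "".join(run)
        parts.append(map_substring(chunk) if is_subject else chunk)
    return unicodedata.normalize("NFC", "".join(parts))
```

For example, if "O" is in MSUBJECT but U+0308 COMBINING DIAERESIS is not,
map_input("O\u0308", ...) lowercases the "O" in step 1 and leaves the
combining mark alone; it is step 2 that then composes the pair into a
single o-umlaut. That is the kind of inside/outside interaction the NFC
pass is there to catch.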


We would then define MSUBJECT in Tables using much the same process as you
have for PVALID, via combinations of properties and exceptions.

So, for example, if we have uppercase O-umlaut in MSUBJECT, it will be
converted to lowercase; if we don't, it won't.
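
To make the O-umlaut case concrete, here is a small Python sketch of what
"properties plus exceptions" could look like. The rule itself is purely
hypothetical (general category Lu or Lt, plus explicit add and remove
sets); it is not a proposal for the real MSUBJECT definition, and the
helper names are made up for the example. The removeDI step is left out
to keep it short.

```python
import unicodedata


def make_msubject(categories, add=frozenset(), remove=frozenset()):
    """Build a hypothetical MSUBJECT predicate from a set of general
    categories, with explicit exception sets taking precedence."""
    def in_msubject(ch):
        if ch in remove:
            return False
        if ch in add:
            return True
        return unicodedata.category(ch) in categories
    return in_msubject


def map_char(ch, in_msubject):
    """Apply NFKC + case folding + NFKC only if ch is in MSUBJECT."""
    if not in_msubject(ch):
        return ch
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded)


O_UMLAUT = "\u00d6"  # LATIN CAPITAL LETTER O WITH DIAERESIS

with_umlaut = make_msubject({"Lu", "Lt"})
without_umlaut = make_msubject({"Lu", "Lt"}, remove={O_UMLAUT})

mapped = map_char(O_UMLAUT, with_umlaut)       # lowercase o-umlaut
unmapped = map_char(O_UMLAUT, without_umlaut)  # left as uppercase
```

With O-umlaut in the set, the character comes out as U+00F6; with it
listed as an exception, it passes through unchanged and is left for the
rest of Protocol to deal with.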


Mark


2009/4/10 Patrik Fältström <patrik at frobbit.se>
...

> 2. M-Label
>
> I have seen the discussions regarding M-Label, and must say that, as an
> editor of the tables document, I think it might be "more interesting" to
> define MVALID as a property that is calculated in the tables document.
>
> Suggestion:
>
> MVALID would be a codepoint that is mapped, according to the standardized
> mapping function, to something that is not DISALLOWED.
>
> Next question would then be, what does the mapping function look like?
>
> I have seen roughly the following suggestions, but I might have
> misunderstood this, so please help, and my apologies if I am missing
> something:
>
> 1. Casefold (C+F)
> 2. Lowercase (C+S)
> 3. "IDNA2003 valid _input_ codepoints that are not mapped to DISALLOWED and
> themselves DISALLOWED"
>
> Questions:
>
> A. Should we define MVALID?
> B. What should the general rule be that defines it (we can have codepoints
> in Exceptions as before)?
>
>   Patrik
>
>
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>
>