[iucg] mappings-01 and the general procedure

Alessandro Vesely vesely at tana.it
Tue Jul 14 12:00:34 CEST 2009


John C Klensin wrote:
> For IDNs, there are the following possibilities (in theory at
> least) for Latin-based characters:
> 
> * "A" and "a" could match for strings that contained zero
> non-ASCII characters, but be different if even a single
> non-ASCII appeared in the string.   That would cause the worst
> sort of user astonishment, at least for users of most of the
> languages that use Latin-based scripts.
> 
> * "A" and "a" could match always, but "Á" and "á" would not
> match and neither would "Å" and "å".  That would be a very
> strange result globally, even though it might be desirable for
> French words (but not necessarily French mnemonics).
> 
> * If they are all permitted, for some sense of "permitted", "A"
> would match "a", "Á" would match "á", "Å" would match "å",
> and so on.  
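The three possibilities quoted above can be sketched as a minimal Python example made of comparison functions. The function names are hypothetical. The "everything matches" case is approximated here with Python's str.casefold() followed by NFC normalization. That is an assumption for illustration, not the mapping any IDNA document specifies.

```python
import string
import unicodedata

# Translation table touching only the 26 ASCII capital letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def match_if_all_ascii(a: str, b: str) -> bool:
    """Possibility 1: fold case only when neither string has non-ASCII."""
    if a.isascii() and b.isascii():
        return a.lower() == b.lower()
    # A single non-ASCII character anywhere disables case folding.
    return a == b


def match_ascii_letters(a: str, b: str) -> bool:
    """Possibility 2: "A" matches "a", but "Á"/"á" and "Å"/"å" stay distinct."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def full_fold(s: str) -> str:
    # Case-fold every character, then recompose so that "a" + U+030A
    # and the precomposed "å" compare equal (assumed policy).
    return unicodedata.normalize("NFC", s.casefold())


def match_all_cases(a: str, b: str) -> bool:
    """Possibility 3: "A"/"a", "Á"/"á", "Å"/"å", and so on all match."""
    return full_fold(a) == full_fold(b)
```

Under the first function, "Paris" matches "paris", but adding a single "é" to both strings makes them distinct. This is the astonishment described above.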

Since this is a theoretical enumeration, there are two other classes
of possibilities that may be worth mentioning, just for the sake of
completeness:

* Match all of "a", "A", "á", "Á", "å", "Å", and so on. Possibly also
match "l" and "1", as on old typewriters, so that, e.g., "paypal" and
"paypa1" match. A procedure could even do complicated phonetic processing.

* Define the mapping parametrically in _mapping.TLD, for each
IDN-enabled TLD. Even if this might answer the question of what TLDs
are good for, a global mapping would still be needed for the TLD label
itself.
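The two additional classes can be sketched in Python as well, under stated assumptions. skeleton() folds case, strips diacritics, and folds "1" to "l". It is a much cruder cousin of the confusables "skeleton" in Unicode UTS #39, not that algorithm. TLD_POLICIES is a hypothetical in-memory stand-in for whatever might be published under _mapping.TLD, since no record format is defined for it. global_fold() plays the role of the global mapping that the TLD label itself would still need.

```python
import unicodedata

# Hypothetical look-alike table: only the "1" -> "l" fold from the text.
LOOKALIKES = str.maketrans({"1": "l"})


def skeleton(label: str) -> str:
    """Fold case, strip diacritics, and fold look-alikes (hypothetical)."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", label.casefold())
    # Drop nonspacing marks (category Mn), e.g. U+0301 and U+030A.
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return bare.translate(LOOKALIKES)


def global_fold(label: str) -> str:
    """Stand-in for the global mapping still needed for the TLD label."""
    return unicodedata.normalize("NFC", label.casefold())


# Hypothetical per-TLD policies, standing in for data that a registry
# might publish under _mapping.TLD; no such record format is defined.
TLD_POLICIES = {
    "example": skeleton,
}


def fold_name(name: str) -> str:
    """Map every label below the TLD using that TLD's own policy."""
    labels = name.rstrip(".").split(".")
    # The TLD label is always mapped with the global policy.
    tld = global_fold(labels[-1])
    policy = TLD_POLICIES.get(tld, global_fold)
    return ".".join([policy(label) for label in labels[:-1]] + [tld])
```

With these assumptions, "PAYPA1.Example" folds to "paypal.example". A name under a TLD with no entry in the table, such as "paypa1.org", gets only the global fold and keeps its "1".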




More information about the Idna-update mailing list