Additional prefixes (was: Re: Final Sigma (was: RE: Esszett, Final Sigma, ZWJ and ZWNJ))

John C Klensin klensin at jck.com
Thu Feb 26 19:30:25 CET 2009



--On Wednesday, February 25, 2009 23:32 -0500 Vint Cerf
<vint at google.com> wrote:

> Before we go down the path of introducing a collection of
> prefixes, I think we have a lot to get done with the xn--
> version first.

Yes.  

But there are two more fundamental problems with additional
prefixes.  The first may or may not be significant, but the
other definitely is.

(1) Taking the language-specific prefix as an example, one
really cannot encode the language in the prefix --there are too
many languages and, in practice, too few prefixes.  Note that
"xn" was chosen by IANA out of a possible list, derived from
some tree-walking, that was much smaller than the entire range
of "aa" ... "zz".  So, if one wanted to do this, one would
actually have

	* A prefix that designates "language + ACE" encoding,
	rather than plain ACE encoding.
	
	* An encoding of the "language".  Given the discussions
	about the amount of localization that might be required
	by various applications, this would require a fairly
	long (multi-component) LTRU language tag, or the
	equivalent, or an encoding of one.  I haven't done the
	calculations, but I assume that would take up more than
	an octet or two, since it would still have to be a
	case-independent ASCII encoding.
	
	* An ACE for the actual name-string. 

We have already heard concerns about the 59-character
restriction that the DNS's 63-octet label limit imposes on the
Punycode portion of an "xn--" ACE, and the implied restriction
to significantly fewer characters in a U-label.  The further
reduction implied by the above might be considered very
problematic by some parties.
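To make the arithmetic concrete, here is a short Python sketch of the
label-budget problem.  The "xq--" prefix, the "--" separator, and the
embedded language tag are purely hypothetical, invented for
illustration; only the 63-octet label limit and the stdlib Punycode
codec are real:

```python
# Hypothetical "language + ACE" label layout: <prefix><lang-tag>--<punycode>.
# "xq--" and the "--" separator are invented for illustration only.
MAX_LABEL_OCTETS = 63  # DNS limit on a single label

def hypothetical_label(name: str, lang_tag: str, prefix: str = "xq--") -> str:
    ace = name.encode("punycode").decode("ascii")  # stdlib RFC 3492 codec
    return f"{prefix}{lang_tag}--{ace}"

label = hypothetical_label("bücher", "en-latn-us")
print(label)  # xq--en-latn-us--bcher-kva

# Budget left for the Punycode part after prefix + tag overhead:
overhead = len("xq--") + len("en-latn-us") + len("--")
print("octets left for ACE:", MAX_LABEL_OCTETS - overhead)  # 47, vs. 59 under plain "xn--"
```

Even a fairly short tag like "en-latn-us" eats a quarter of the label
budget before the name itself is encoded at all.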

(2) Encoding language information in a label is certainly no
problem for a registry that is told the language, but consider
users or applications that see a native-character string and
want to convert it to an ACE and look it up.   They must deduce
the language and locale information, and do so very precisely,
or must know it from some external source.   Since, barring an
LTRU-sensitive matching mechanism, <prefix><US-English>abcd and
<prefix><UK-English>abcd would end up being different labels and
not matching, the deduction about the language would presumably
have to be very exact (more exact than my example use of
"UK-English" would imply).  That is hard, especially if we
continue to permit labels like "od-in-f147", for which it would
be very difficult to argue that they are part of some particular
language.

One could, of course, shift to identifiers in which the language
was explicit at the user level, e.g., "en-UK:od-in-f147" or
"fr-FR:morphin", and with which en-US:motherhood would not
compare equal to en-UK:motherhood.  But, the last I checked, the
whole reason for IDNs was improved user-friendliness across the
globe, and requirements for identifiers of that sort don't look
user-friendly to me.

This is obviously different from the situations we would
encounter if we tried to _change_ the prefix, replacing "xn--"
with something else over time and via some transition process.
Not pleasant and probably not very efficient, but we have done
it once before and presumably could figure out how to do it
again.  But switching to a world in which many prefixes (or
prefix-languageTag pairs) are expected to coexist just does not
feel plausible to me from a user perspective.

      john



