Additional prefixes (was: Re: Final Sigma (was: RE: Esszett, Final Sigma, ZWJ and ZWNJ))

Thu Feb 26 20:05:04 CET 2009

Hi John and Vint,

I certainly agree that we should do as much as possible with xn--.
However, at the same time, I am sympathetic to the Greek concern about
tonos (and the German wish for Eszett). I was simply exploring the
xo-- idea.

Note that we may not need a new prefix for each language. For example,
xo-- could start out as a prefix for the Greek script only (if that
was the first script to have rules under the xo-- scheme). The Arabic
script might be the 2nd one to be added to the xo-- spec, presumably
to cope with ZWNJ and maybe even ZWJ.

However, I agree with you that it would be difficult for a piece of
software to decide which spec to use (xn-- vs xo--) when a user is
typing on a keyboard. We certainly don't want the user to type "xn--"
or "xo--". They should just be able to type characters from their own
language, as long as they are letters or digits.
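
As a rough illustration of that last point, software can derive the
ACE form from whatever the user typed, so the prefix never has to
appear on the keyboard. The sketch below uses Python's built-in "idna"
codec, which implements the older IDNA2003 rules; the helper name
to_ace and the sample labels are illustrative assumptions, not
anything from this thread. It also hints at why Eszett is contentious:
under those rules it is mapped to "ss" before encoding.

```python
# Minimal sketch using Python's standard "idna" codec (IDNA2003 rules).
# The helper name and sample labels are illustrative assumptions.

def to_ace(label: str) -> str:
    """Convert a label typed by the user into its ASCII-compatible form."""
    return label.encode("idna").decode("ascii")

print(to_ace("bücher"))   # non-ASCII label: the codec adds the xn-- prefix
print(to_ace("example"))  # plain ASCII label: passed through unchanged
print(to_ace("straße"))   # IDNA2003 maps Eszett to "ss", so no prefix is needed
```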

So, I'm going to drop the idea. If someone else wants to run with it, go ahead.

Erik

On Thu, Feb 26, 2009 at 10:30 AM, John C Klensin <klensin at jck.com> wrote:
>
>
> --On Wednesday, February 25, 2009 23:32 -0500 Vint Cerf
> <vint at google.com> wrote:
>
>> Before we go down the path of introducing a collection of
>> prefixes, I think we have a lot to get done with the xn--
>> version first.
>
> Yes.
>
> But there are two more fundamental problems with additional
> prefixes.  The first may or may not be significant, but the
> other definitely is.
>
> (1) Taking the language-specific prefix as an example, one
> really cannot encode the language in the prefix: there are too
> many languages and, in practice, too few prefixes.  Note that
> "xn" was chosen by IANA out of a possible list, derived from
> some tree-walking, that was much smaller than the entire range
> of "aa" ... "zz".  So, if one wanted to do this, one would
> actually have
>
>        * A prefix that designates "language + ACE" encoding,
>        rather than ACE encoding.
>
>        * An encoding of the "language".  Given the discussions
>        about the amount of localization that might be required
>        by various applications, this would require a fairly
>        long (multi-component) LTRU code or the equivalent or an
>        encoding of it.  I haven't done the calculations, but I
>        assume that would take up more than an octet or two,
>        since it would still have to be a case-independent ASCII
>        encoding.
>
>        * An ACE for the actual name-string.
>
> We have already heard concerns about the DNS-imposed 59
> character restriction on a Punycode-created ACE and the implied
> restriction of significantly fewer characters in a U-label.  The
> further reduction implied by the above might be considered very
> problematic by some parties.
>
> (2) Encoding language information in a label is certainly no
> problem for a registry that is told the language, but consider
> users or applications that see a native-character string and
> want to convert it to an ACE and look it up.  They must deduce
> the language and locale information and do so very precisely, or
> must know it from some external information.  Since, barring an
> LTRU-sensitive matching mechanism, <prefix><US-English>abcd and
> <prefix><UK-English>abcd would end up being different labels and
> not matching, the deduction about the language would presumably
> have to be very exact (more exact than my example use of
> "UK-English" would imply).  That is hard, especially if we
> continue to permit labels like "od-in-f147", for which it would
> be very difficult to argue that they are part of some particular
> language.
>
> One could, of course, shift to identifiers in which the language
> was explicit at the user level, e.g., "en-UK:od-in-f147" or
> "fr-FR:morphin" and with which en-UK:motherhood would not
> compare equal to en-US:motherhood  but, the last I checked, the
> whole reason for IDNs was improved user-friendliness across the
> globe and requirements for identifiers of that sort don't look
> user-friendly to me.
>
> This is obviously different from the situations we would
> encounter if we tried to _change_ the prefix, replacing "xn--"
> with something else over time and via some transition process.
> Not pleasant and probably not very efficient, but we have done
> it once before and presumably could figure out how to do it
> again.  But switching to a world in which many prefixes (or
> prefix-languageTag pairs) are expected to coexist just does not
> feel plausible to me from a user perspective.
>
>      john
>
>
>
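
The two objections in the quoted message can be made concrete with a
little arithmetic. The sketch below is only an illustration under
stated assumptions: the "xo--" prefix, the <prefix><tag>-<punycode>
layout, and the helper names are all hypothetical. It shows how a
language tag eats into the 63-octet DNS label limit, and that the same
name under two tags yields two labels that do not match.

```python
# Sketch of the label-length arithmetic. The "xo--" prefix and the
# <prefix><tag>-<punycode> layout are hypothetical, for illustration only.

MAX_LABEL_OCTETS = 63  # DNS limit on a single label


def normalise_tag(lang_tag: str) -> str:
    """Lowercase the tag and drop hyphens so the separator stays unambiguous."""
    return lang_tag.lower().replace("-", "")


def ace_budget(prefix: str, lang_tag: str = "") -> int:
    """Octets left for the Punycode output after the prefix and optional tag."""
    overhead = len(prefix)
    if lang_tag:
        overhead += len(normalise_tag(lang_tag)) + 1  # +1 for the separator
    return MAX_LABEL_OCTETS - overhead


def tagged_label(prefix: str, lang_tag: str, name: str) -> str:
    """Build a hypothetical language-tagged ACE label."""
    encoded = name.encode("punycode").decode("ascii")
    return f"{prefix}{normalise_tag(lang_tag)}-{encoded}"


print(ace_budget("xn--"))           # budget with today's prefix alone
print(ace_budget("xo--", "en-US"))  # budget once a short tag is added
print(tagged_label("xo--", "en-US", "abcd") == tagged_label("xo--", "en-GB", "abcd"))
```

Even a two-part tag costs five octets in this layout, and a resolver
that guesses "en-GB" where the registrant used "en-US" looks up a
different label.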