[OT] Re: support of metadata

John C Klensin klensin at jck.com
Thu Sep 17 16:42:56 CEST 2009



--On Thursday, September 17, 2009 14:35 +0200 jean-michel
bernier de portzamparc <jmabdp at gmail.com> wrote:

> Dear John,
> there is seemingly a fundamental misunderstanding between
> multilinguists and you: we do _not_ care about natural
> languages. We are only interested in a complete support of
> Unicode that you did remove when lowercasing punycode entries,
> without providing a mechanism to restore the Unicode
> information on an end to end basis.

This may be just a vocabulary problem, but I suggest that there
is no such thing as "complete support of Unicode" as you are
using the term.  From one perspective, we are completely
supporting Unicode, although we are specifying a normalization
form (NFC) and a particular model of permitted characters
(different from the Unicode Consortium's model for identifiers,
but not more or less "complete").  Their identifier model, if I
recall, _requires_ the use of toCaseFold, which is even more
problematic for some of what you want to do than what IDNA2008
specifies.   
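
To illustrate the difference between normalizing and case folding,
here is a minimal sketch using Python's standard library.  It assumes
that str.casefold() is an adequate stand-in for toCaseFold; it is not
an implementation of IDNA2008 or of the Unicode identifier rules, and
the function names are hypothetical.

```python
import unicodedata


def nfc(label: str) -> str:
    """Normalize to NFC only; case is left as the user typed it."""
    return unicodedata.normalize("NFC", label)


def case_fold(label: str) -> str:
    """Case-fold, then normalize to NFC; the original case is lost."""
    return unicodedata.normalize("NFC", label.casefold())


decomposed = "Cafe\u0301"          # 'e' followed by COMBINING ACUTE ACCENT
print(nfc(decomposed) == "Caf\u00e9")   # composed form, capital C kept
print(nfc("Stra\u00dfe"))               # sharp s and capital S survive NFC
print(case_fold("Stra\u00dfe"))         # folding rewrites both
```

NFC changes only how a character sequence is encoded, while folding
discards distinctions that cannot be recovered from the folded string
alone.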

You are proposing a third model, one that uses a few characters
as a special type of markup.  Leaving aside the many questions of
whether that is a good idea and how it would interoperate with
IDNA and/or the DNS, it is just a different profile of Unicode,
not more or less "complete" Unicode support.

My personal free advice is that, if I wanted to do such a thing
in an environment that might be intermixed with IDNA strings, I
would not try to make any subtle use of DISALLOWED letters or
characters that would be invalid or unusual in context.  I would
also avoid Private-Use characters unless you anticipate a very
controlled and closed environment (often referred to as a
"walled garden" in Internet contexts and probably the same as
Harald's use of "local intranet") -- not because there is a
Unicode prohibition against the use of such characters for
private use, but because there is always a risk of someone else
using the same code points for a different purpose, causing
confusion for users.  
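
As a rough sketch of how a gateway at the edge of such a closed
environment might notice Private-Use code points before a string
leaks out, the following uses the Unicode general category "Co"
(Private Use) from Python's unicodedata module.  The function name is
hypothetical and the check says nothing about what the code points
were meant to signify.

```python
import unicodedata
from typing import List


def private_use_code_points(text: str) -> List[str]:
    """Return the Private-Use code points in text, in U+XXXX notation."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Co"  # "Co" is Private Use
    ]


print(private_use_code_points("ab\ue000c"))
print(private_use_code_points("abc"))
```

An empty result means only that no Private-Use code points are
present, not that the string is valid under IDNA.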

I would, instead, either 

	(i) make a clean and unambiguous break from IDNA by
	selecting a different prefix or 
	
	(ii) look to the control codes, annotation characters,
	or tag characters discussed in Chapter 16 of The Unicode
	Standard (5.0) to clearly identify the need for
	higher-level protocol processing.

I believe the former would involve less chance of
interoperability problems should your strings "leak" into the
public Internet, but that may be just a matter of taste.
However, you, and then Harald, wrote...

>>  If I take 
>> the protocol and table document and just change these two
>> points and  call it IDNAPLUS, explaining how these two points
>> can be blocked in  order to only be IDNA conformant, I can
>> publish them as IDNAPLUS RFC?  This then should not be a
>> problem for you?

> Not a problem at all. What you do in your local intranet is
> entirely your business.

With the understanding that I speak for no one --this is just my
opinion at the moment-- if you intend the usual RFC publication
process as that term is used in the IETF and related contexts,
getting such a document published would be much easier, and the
document much more likely to be accepted, if you could demonstrate
extremely clear differentiation from the Internet-standard
protocol work, e.g. that there was no possibility of confusion
that would lead to interoperability problems, even with partial
or rather sloppy implementations of either version of IDNA.
That, too, would probably be better accommodated with a prefix
change than by trying to do anything subtle or in-line.  
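
A minimal sketch of why a prefix change gives clear differentiation:
a receiver can classify a label by its first four characters without
decoding anything.  "xn--" is the IDNA ACE prefix; "zq--" is a purely
hypothetical prefix invented for this example, as are the function and
constant names.

```python
IDNA_ACE_PREFIX = "xn--"
OTHER_PREFIX = "zq--"  # hypothetical, not assigned to any protocol


def classify_label(label: str) -> str:
    """Classify a DNS label by prefix alone, ignoring ASCII case."""
    lowered = label.lower()
    if lowered.startswith(IDNA_ACE_PREFIX):
        return "idna"
    if lowered.startswith(OTHER_PREFIX):
        return "other"
    return "plain"


print(classify_label("xn--bcher-kva"))
print(classify_label("ZQ--example"))
print(classify_label("example"))
```

Even a partial or sloppy IDNA implementation that looks only for
"xn--" would leave labels with the other prefix alone, which is the
kind of separation described above.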

On the other hand, if what you intend is a document published in
the IDNAPLUS series, or some other specific series, then, as Harald
says, it is entirely your business.

    john



More information about the Idna-update mailing list