emoji (was Re: I-D Action: draft-klensin-idna-rfc5891bis-00.txt)

Shawn Steele Shawn.Steele at microsoft.com
Sat Mar 18 03:18:05 CET 2017

I just got an escalated mail from a customer suggesting that if Windows didn't start supporting emoji IDN better, then they would need to move to a different platform.

Any more restrictive desires are going to have a difficult time overcoming those forces.

-----Original Message-----
From: Patrik Fältström [mailto:paf at frobbit.se] 
Sent: Friday, March 17, 2017 12:56 AM
To: Shawn Steele <Shawn.Steele at microsoft.com>
Cc: John C Klensin <klensin at jck.com>; idna-update at alvestrand.no; Andrew Sullivan <ajs at anvilwalrusden.com>
Subject: Re: emoji (was Re: I-D Action: draft-klensin-idna-rfc5891bis-00.txt)

What the marketing side does not include in its "this is good stuff" pitch is the risk that the user of that friendly, cool "thing" ends up with a false positive and destructive events. That risk is only included after the event has happened.


On 16 Mar 2017, at 22:05, Shawn Steele wrote:

> A few of us were thinking maybe identifiers almost need (might be too strong a word) two layers... One to provide a fairly strict version of providing "identifiers" in a way that tries to reduce confusion, and another layer that provides "friendly labels" that help people get to those identifiers in a way that makes them feel good.
> Most (AFAICT) of the more interesting uses of IDN already resolve to a name that would fit in that "stricter identifier" bucket, so, in practice, we kinda already have two layers.  The "marketing thought it would be good if this linked to us" which goes to the "this is the label that doesn't scare the IT department."
> Regardless of where you draw the line of what characters are appropriate, this is already happening somewhat naturally, especially when it's "easy".  E.g., an umlauted domain resolving to a pure-ASCII variant spelling.  Of course that's tougher for other languages, but the same idea seems to happen a lot.
> -Shawn
> -----Original Message-----
> From: John C Klensin [mailto:klensin at jck.com]
> Sent: Thursday, March 16, 2017 1:16 PM
> To: Shawn Steele <Shawn.Steele at microsoft.com>; Patrik Fältström 
> <paf at frobbit.se>
> Cc: idna-update at alvestrand.no; Andrew Sullivan 
> <ajs at anvilwalrusden.com>
> Subject: RE: emoji (was Re: I-D Action: 
> draft-klensin-idna-rfc5891bis-00.txt)
> Just to respond to this one issue...
> --On Monday, March 13, 2017 06:57 +0000 Shawn Steele <Shawn.Steele at microsoft.com> wrote:
>> ...
>> I did not ignore the Unicode categories of Emoji.  I indicated  that 
>> despite their classification, Unicode (not me) has been  including 
>> new emoji characters in their updated tables.  The
>> 2003 emoji did not surprise me, but I was surprised that they  were 
>> extending it.
> Shawn,
> First, while I'm not sure how much difference it makes in general, the smiling faces, etc., of earlier years were "emoticons" (or just typographic symbols) and not emoji, which are a newer invention and addition to Unicode.  While there might have been other issues (and probably were), had the Unicode Consortium been convinced that emoji were a new script and kind of letter, they could easily have defined such a script and, I believe, even a new "Letter" property (or assigned them to "Lo") without any damage to stability rules or other important principles.   Worst case, they could have coded new forms of the emoticons into the emoji script, done something appropriate with NFKC if they thought that was necessary, and moved on.
> They didn't.  Which brings me to...
> Second, while UTR#46 allows emoji in Unicode Domain Names,
> UAX#31 (Identifier and Pattern Syntax) does not allow them in Unicode-recommended identifiers.  That creates an interesting situation.  Certainly we looked at UAX#31 in designing IDNA2008.
> While the results, in terms of what was and was not considered acceptable for an identifier, were not identical -- the DNS has some special needs and constraints which are the reason the rules of IDNA2008 are not identical to the PRECIS recommendations about more general-purpose identifiers either -- major areas of difference (i.e., beyond some special considerations and edge cases) between IDNA2008 and UAX#31 should be surprising to all concerned. As far as I know, there are, those cases and a difference in style aside, no significant differences between UAX#31 and IDNA2008.   (The difference in style is that UAX#31 defines equivalence rules for, e.g., case and normalization while IDNA2008, in part in order to assure that labels could be converted from U-label to A-label form and back without loss of information, avoids that by imposing restrictions on its inputs.)
> However, introduction of emoji in UTR#46, if used instead of IDNA2008, creates the interesting situation in which strings that are valid for use in domain names labels are not valid as Unicode-recommended identifiers.   I trust you can see why I think that is a problem even if we need to agree to disagree about everything else.
> best,
>     john
> _______________________________________________
> Idna-update mailing list
> Idna-update at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
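
A rough illustration of the mismatch John describes in the quoted message above: a label can be mechanically encodable for the DNS while failing a UAX#31-style identifier test. This is only a sketch under stated assumptions. Python's str.isidentifier() (XID_Start/XID_Continue plus "_") is used as a stand-in for the UAX#31 default identifier rules, and the raw "punycode" codec is used only to show that encoding round-trips. Neither is an implementation of UTR#46 or IDNA2008, and the function names are hypothetical.

```python
import unicodedata


def is_uax31_style_identifier(label: str) -> bool:
    """Approximate the UAX#31 default-identifier test.

    Assumption: Python's own identifier rules stand in for UAX#31 here.
    They are not the DNS rules and not the IDNA2008 rules.
    """
    return label.isidentifier()


def punycode_roundtrip(label: str) -> str:
    """Encode a label with raw Punycode, then decode it again."""
    return label.encode("punycode").decode("punycode")


if __name__ == "__main__":
    # "bücher" (letters only) versus a single emoji, U+1F600.
    for label in ("b\u00fccher", "\U0001F600"):
        categories = [unicodedata.category(ch) for ch in label]
        print(
            ascii(label),
            categories,
            "identifier-like:", is_uax31_style_identifier(label),
            "round-trips:", punycode_roundtrip(label) == label,
        )
```

Both labels survive the Punycode round trip, but only the first passes the identifier-style check; the emoji has general category "So" (Symbol, other), which is outside the letter and mark categories that identifier rules build on.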
