emoji (was Re: I-D Action: draft-klensin-idna-rfc5891bis-00.txt)

John C Klensin klensin at jck.com
Thu Mar 16 21:15:53 CET 2017

Just to respond to this one issue...

--On Monday, March 13, 2017 06:57 +0000 Shawn Steele
<Shawn.Steele at microsoft.com> wrote:

> I did not ignore the Unicode categories of Emoji.  I indicated
> that despite their classification, Unicode (not me) has been
> including new emoji characters in their updated tables.  The
> 2003 emoji did not surprise me, but I was surprised that they
> were extending it.


First, while I'm not sure how much difference it makes in
general, the smiling faces, etc., of earlier years were
"emoticons" (or just typographic symbols) and not emoji, which
are a newer invention and addition to Unicode.  While there
might have been other issues (and probably were), had the
Unicode Consortium been convinced that emoji were a new script
and kind of letter, they could easily have defined such a script
and, I believe, even a new "Letter" property (or assigned them
to "Lo") without any damage to stability rules or other
important principles.   Worst case, they could have coded new
forms of the emoticons into the emoji script, done something
appropriate with NFKC if they thought that was necessary, and
moved on.
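
To make the classification point concrete, here is a minimal Python
sketch using the standard unicodedata module.  The sample code points
are chosen purely for illustration, and the results depend on the
Unicode version bundled with the interpreter.  Under those
assumptions, both an older typographic smiley and a newer emoji
report General_Category "So" (Symbol, other) rather than any Letter
category:

```python
import unicodedata

# Sample code points picked only for illustration; they are not taken
# from the discussion itself.
SAMPLES = {
    "WHITE SMILING FACE (older typographic symbol)": "\u263A",
    "GRINNING FACE (emoji)": "\U0001F600",
    "LATIN SMALL LETTER A (for contrast)": "a",
}


def general_category(ch: str) -> str:
    """Return the Unicode General_Category of a single character."""
    return unicodedata.category(ch)


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {name}: {general_category(ch)}")
```

The standard library does not expose the Script property, so this
sketch covers only the General_Category side of the argument.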

They didn't.  Which brings me to...

Second, while UTR#46 allows emoji in Unicode Domain Names,
UAX#31 (Identifier and Pattern Syntax) does not allow them in
Unicode-recommended identifiers.  That creates an interesting
situation.  Certainly we looked at UAX#31 in designing IDNA2008.
While the results, in terms of what was and was not considered
acceptable for an identifier, were not identical -- the DNS has
some special needs and constraints which are the reason the
rules of IDNA2008 are not identical to the PRECIS
recommendations about more general-purpose identifiers either --
major areas of difference (i.e., beyond some special
considerations and edge cases) between IDNA2008 and UAX#31
should be surprising to all concerned. As far as I know, there
are, those cases and a difference in style aside, no significant
differences between UAX#31 and IDNA2008.   (The difference in
style is that UAX#31 defines equivalence rules for, e.g., case
and normalization while IDNA2008, in part in order to assure
that labels could be converted from U-label to A-label form and
back without loss of information, avoids that by imposing
restrictions on its inputs.)

However, allowing emoji in UTR#46, if it is used instead of
IDNA2008, creates the interesting situation in which strings
that are valid for use in domain name labels are not valid as
Unicode-recommended identifiers.   I trust you can see why I
think that is a problem even if we need to agree to disagree
about everything else.
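
A rough way to see the mismatch, sketched in Python under some loudly
stated assumptions: Python's own identifier syntax is built on the
XID_Start/XID_Continue properties from UAX#31 (with a few
Python-specific additions such as a leading underscore), so
str.isidentifier() is used here only as an approximate stand-in for
"Unicode-recommended identifier".  The second helper checks just the
General_Category list of the LetterDigits rule in RFC 5892, section
2.1; it is not the full IDNA2008 derived-property calculation, which
also has exceptions, contextual rules, and stability checks.  The
function names are hypothetical.

```python
import unicodedata

# General_Category values listed by the LetterDigits rule of
# RFC 5892, section 2.1.  This is one rule only, not the whole
# IDNA2008 algorithm.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def in_letter_digits(ch: str) -> bool:
    """True if ch falls in one of the LetterDigits categories."""
    return unicodedata.category(ch) in LETTER_DIGITS


def approx_uax31_identifier(s: str) -> bool:
    """Approximate UAX#31 check via Python's identifier rules."""
    return s.isidentifier()


if __name__ == "__main__":
    for label in ("caf\u00e9", "\U0001F600", "caf\U0001F600"):
        print(
            ascii(label),
            "identifier-like:", approx_uax31_identifier(label),
            "all LetterDigits:", all(in_letter_digits(c) for c in label),
        )
```

On both of these simplified checks a label containing an emoji fails,
which is the sense in which a UTR#46-valid emoji label would fall
outside what UAX#31 recommends for identifiers.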

