emoji (was Re: I-D Action: draft-klensin-idna-rfc5891bis-00.txt)

Asmus Freytag asmusf at ix.netcom.com
Fri Mar 17 11:14:58 CET 2017

On 3/16/2017 1:15 PM, John C Klensin wrote:
> Just to respond to this one issue...
> --On Monday, March 13, 2017 06:57 +0000 Shawn Steele
> <Shawn.Steele at microsoft.com> wrote:
>> ...
>> I did not ignore the Unicode categories of Emoji.  I indicated
>> that despite their classification, Unicode (not me) has been
>> including new emoji characters in their updated tables.  The
>> 2003 emoji did not surprise me, but I was surprised that they
>> were extending it.
> Shawn,
> First, while I'm not sure how much difference it makes in
> general, the smiling faces, etc., of earlier years were
> "emoticons" (or just typographic symbols) and not emoji, which
> are a newer invention and addition to Unicode.  While there
> might have been other issues (and probably were), had the
> Unicode Consortium been convinced that emoji were a new script
> and kind of letter, they could easily have defined such a script
> and, I believe, even a new "Letter" property (or assigned them
> to "Lo") without any damage to stability rules or other
> important principles.   Worst case, they could have coded new
> forms of the emoticons into the emoji script, done something
> appropriate with NFKC if they thought that was necessary, and
> moved on.
> They didn't.  Which brings me to...

Actually, .... no.

The emoji are clearly not "letters". The word emoji means "picture
character"; they share some similarities in classification with logographs
(signs for words), but the picture is in the foreground, not the word.

As pictorial representations, the closest classification is "symbol", but
that is arguing from the logic inherent in such classification in Unicode.

The use of emoji in text is somewhat different. They can be used entirely
on their own, but it remains common to use them with text, or alternating
with text statements. The surrounding text can be of any script, making
the emoji less like something that embodies its own script and more akin to
characters that have the "Common" script (those used with any script).
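
As a rough illustration of the classification discussed above, here is a
minimal Python sketch that looks up the General Category (GC) of a few sample
characters with the standard `unicodedata` module. The sample set and the
helper names (`general_category`, `is_letter`) are assumptions chosen for this
example, not anything taken from the IDNA documents. The standard library does
not expose the Script property, so the "Common" script value is not shown, and
results depend on the Unicode version bundled with the Python in use.

```python
import unicodedata

# Hypothetical sample set for illustration: two letters, an older
# typographic smiley, and a newer emoji.
SAMPLES = [
    "a",           # LATIN SMALL LETTER A
    "\u4e2d",      # a CJK ideograph (a logograph)
    "\u263a",      # WHITE SMILING FACE
    "\U0001F600",  # GRINNING FACE
]


def general_category(ch):
    """Return the two-letter Unicode General Category of one character."""
    return unicodedata.category(ch)


def is_letter(ch):
    """True if the General Category is one of the Letter categories (L*)."""
    return general_category(ch).startswith("L")


if __name__ == "__main__":
    for ch in SAMPLES:
        # Print code point and name rather than the character itself,
        # so the output stays ASCII-only.
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"GC={general_category(ch)} letter={is_letter(ch)}")
```

With a reasonably recent Unicode database, the ideograph reports a Letter
category ("Lo") while both the smiley and the emoji report "So" (Symbol,
other), which is the coarse distinction any GC-based rule would see.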

Despite the precursors (smile), they represent something that's effectively
novel, with novel and evolving usage conventions.

I'm not surprised to see that they manage, unlike many other additions to
Unicode, to create problems in extending existing classifications and usage
rules.

While I argue here that Unicode had little choice in the way GC and script
values were assigned, I would equally argue that those GC and script values
do not do a good job of capturing the essence of these beasts in the best

I'm fully cognizant of many of the issues that these represent, but
I think that I like the more nuanced reply by Andrew better than any
argument that is based on the results of implementing such a blunt
instrument as the GC.

