[OT] Client display of languages (was: Re: What rules have been used...)

John C Klensin klensin at jck.com
Sat Dec 16 16:51:06 CET 2006



--On Friday, 15 December, 2006 15:37 -0800 Kenneth Whistler
<kenw at sybase.com> wrote:

> 
>> > Most email clients have a user interface in a single
>> > language. That would be the default language to use for IDN
>> > display. Some clients may even allow the user to specify
>> > multiple languages, as browsers do.
>> 
>> I find it very convenient that my email client displays
>> Russian spam in  Russian characters. I can instantly
>> recognize that as Russian, and know  that the sender has no
>> interest in communicating with me.
> 
> But do you mean as mojibake Russian (i.e. KOI-8 misinterpreted
> and displayed as Latin-1 letters in your email client) or
> actually as Russian?

Ken, I can't speak for Harald, but, for me, it is Russian,
displayed in Cyrillic characters.

> It seems to me that in either case, if you don't read Russian,
> you would immediately know that the sender has no interest
> in communicating with you.

While that would be a logical assumption, it is not always the
case.  The discussions leading up to the MIME design included
treatment of this situation at some length, with great attention
to the edge cases.  It is precisely these cases that create
interesting problems and opportunities for Internet-based
communication that may or may not exist in more conventional
communications.

I'm going to use Chinese as an example below because I read no
Chinese at all.  I can still read enough Russian to create an
extra case or two.  So, assume I receive a message in Chinese.
It is possible that...

(i) the sender had no interest in communicating with me.

(ii) the sender believed that I could more easily obtain an
accurate translation from Chinese to English than she could
obtain an accurate translation before sending.

(iii) the sender made a mistake and accidentally transmitted the
Chinese rather than the English.

(iv) the sender didn't know that I am an ignorant barbarian who
does not read Chinese or was trying to flatter me by assuming
that I _could_ read Chinese.

Note that, in any of the last three cases, the sender has a
presumptive desire to communicate with me.  It is then my
problem to figure out whether I'm sufficiently interested in
communication with the sender to extract the document and go
through a translation process to read the note and, potentially,
to respond to it in Chinese.

And there is machinery in MIME that, if properly implemented in
an MUA, would permit me to extract Chinese text to a file for
handoff to a translator, even if I can't render (much less read)
that text on my own machine.
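
A minimal sketch of that extraction step, assuming Python's
standard "email" package stands in for the MUA.  The function name
extract_text_parts is hypothetical.  The point it illustrates is
that only the content-transfer-encoding has to be undone: the bytes
can be handed to a translator along with the declared charset
without the local machine ever decoding or rendering them.

```python
from email import message_from_bytes


def extract_text_parts(raw: bytes) -> list[tuple[str, bytes]]:
    """Return (declared charset, undecoded body bytes) for each text/plain part.

    Hypothetical helper: the charset label is reported as declared and is
    never used to decode the text, so no local support for it is required.
    """
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # Fall back to the MIME default when no charset parameter is given.
        charset = part.get_content_charset() or "us-ascii"
        # decode=True reverses base64 / quoted-printable only; the result is
        # still bytes in the sender's charset.
        payload = part.get_payload(decode=True) or b""
        parts.append((charset, payload))
    return parts
```

Writing each payload to a file in binary mode, with the charset
noted alongside it, would be enough for the handoff.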

With Russian (possibly unlike Harald), my problem is worse.
There was a time when I was fairly competent in the language and
could read it at about the same speed at which I could read
English.  But I haven't used the language intensely or regularly
for more than 40 years and it is pretty much gone.   So, if
someone sends me a note in Russian, I have to use other clues to
determine whether the other party intends to communicate and, if
so, whether I want to go to the trouble to translate the message
and how much of it to try to translate (e.g., if the first
sentences contain references to games of chance or good
opportunities to make money, the odds that I will look at the
rest of the message are slight, even more slight than they
would be with an English-text message, and I don't read those).
But a subject line or author might get me to skim a first
sentence and that sentence might, in principle, send me off
looking for a dictionary or a friend.

We still have analogies in postal mail.  It is reasonably likely
that, if I receive what is obviously a mass advertising or
solicitation message in CJK characters, it will go straight to
the trash.   But, if something arrives that appears to be sent
to me as an individual with content I might care about, I'm
likely to go looking for translation assistance.

These are important capabilities to preserve, even if Harald,
and the vast majority of others, see Cyrillic and conclude
"doesn't want to communicate with me and I don't want to try to
communicate with him".  Note that, in the pre-spam-scourge
period when we assumed that almost every note was a serious
attempt to communicate, even Harald might have responded to
several notes in Russian from the same apparent source by
finding a friend to translate "I don't read Russian, if you want
to get a message to me, you must send it in English or
Norwegian" into Russian and put it into a file that could be
sent back, possibly automatically.

>> I prefer that form of display. I may not be the only one.
> 
> I certainly use the fact that my 8859-1 email client can't
> display Russian or Chinese or Korean to make it easy to delete
> lots of spam in a hurry that makes it through corporate
> filters. But now I've started getting lots of Turkish spam as
> well, and that arrives almost legible, even though I don't
> read Turkish. Ah well. ;-)

And while some of us are now using Unicode-enabled mail clients,
rather than those that are stuck in 8859-1, the difference helps
identify the very tricky nature of this business.  Prior to the
broadening use of UTF-8 in email, if one was willing to treat
anything in Russian as trash, filtering that would discard
messages that specified KOI-8 or ISO 8859-5 was fairly easy (one
could also do it on "language=", but the language parameter is
optional; for anything non-ASCII, "charset=" is not, for obvious
reasons).  But now, that parameter often just says "UTF-8".
Whatever its other advantages (which are considerable), it
provides no filtering clues.  For filtering clues, one has to
examine enough characters of the text to make an inference about
scripts.  That almost certainly produces better-quality
filtering, but it is a _lot_ more complicated in terms of the
burdens on spam filters and MUAs.
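
To make the contrast concrete, here is a rough sketch in Python of
the two approaches.  Everything in it is an assumption made for
illustration: the function names, the two-entry charset set, and
especially the shortcut of taking the first word of a character's
Unicode name as its script.  That shortcut is a heuristic, not the
Unicode Script property, which the standard library does not expose.

```python
import unicodedata
from collections import Counter

# Old-style check: the charset label alone gave the script away.
# Illustrative set only; real filters would list more labels.
LEGACY_CYRILLIC_CHARSETS = {"koi8-r", "iso-8859-5"}


def charset_says_cyrillic(charset: str) -> bool:
    """True when the declared charset is one of the legacy Cyrillic ones."""
    return charset.lower() in LEGACY_CYRILLIC_CHARSETS


def dominant_script(text: str, sample: int = 200) -> str:
    """Guess the dominant script from up to `sample` letters of decoded text.

    Heuristic: Unicode character names start with a script-like word,
    e.g. "CYRILLIC SMALL LETTER PE" or "CJK UNIFIED IDEOGRAPH-4F60".
    """
    counts = Counter()
    seen = 0
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation and spaces carry no script clue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
        seen += 1
        if seen >= sample:
            break
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"
```

With a "UTF-8" label the first function always answers False, so
the filter has to decode the body and call something like the
second one, which is where the extra burden comes from.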

> P.S. I *do* read Chinese, but I'm pretty sure that some fluid
> engineering firm in Shanghai doesn't *really* want to
> communicate with me -- they just want lots of people to buy
> their services.

Right, but this puts you in the same position I'm in with
Russian although presumably with greater competency and fluency.
Because someone might want to communicate with you in Chinese,
you need to read far enough into the message to determine that
it is coming from some "fluid engineering firm in Shanghai" before
discarding it as trash.  Harald and I presumably do not.

regards,
   john


