[OT] Client display of languages (was: Re: What rules have been used...)

Kenneth Whistler kenw at sybase.com
Tue Dec 19 03:48:00 CET 2006


John,

> > It seems to me that in either case, if you don't read Russian,
> > you would immediately know that the sender has no interest
> > in communicating with you.
> 
> While that would be a logical assumption, it is not always the
> case.  The discussions leading up to the MIME design included
> treatment of this situation at some length, with great attention
> to the edge cases.    It is precisely these cases that create
> interesting problems and opportunities for Internet-based cases
> that may or may not exist in more conventional communications.

O.k.
  
> 
> I'm going to use Chinese as an example below because I read no
> Chinese at all.  I can still read enough Russian to create an
> extra case or two.  So, assume I receive a message in Chinese.
> It is possible that...
> 
> (i) the sender had no interest in communicating with me.
> 
> (ii) the sender believed that I could more easily obtain an
> accurate translation from Chinese to English than she could
> obtain an accurate translation before sending.
> 
> (iii) the sender made a mistake and accidentally transmitted the
> Chinese rather than the English.
> 
> (iv) the sender didn't know that I am an ignorant barbarian who
> does not read Chinese or was trying to flatter me by assuming
> that I _could_ read Chinese.
> 
> Note that, in any of the last three cases, the sender has a
> presumptive desire to communicate with me.

Or might not. You don't actually know. Because there are all
kinds of other scenarios.

(v) the sender made a mistake and intended to communicate
    with someone *else*, but mistyped the address
    
(vi) the sender assumed you were an ignorant barbarian who does
    not read Chinese and was trying to insult you by rubbing
    your ignorance in your face
    
(vii) ... etc.

> It is then my
> problem to figure out whether I'm sufficiently interested in
> communication with the sender to extract the document and go
> through a translation process to read the note and, potentially,
> to respond to it in Chinese.

Sure. And everybody familiar with email now, with the possible
exception of the rankest newbs, knows that there is a 99.99%
chance that the email is merely spam.

> So, if
> someone sends me a note in Russian, I have to use other clues to
> determine whether the other party intends to communicate and, if
> so, whether I want to go to the trouble to translate the message
> and how much of it to try to translate (e.g., if the first
> sentences contain references to games of chance or good
> opportunities to make money, the odds that I will look at the
> rest of the message are slight... even more slight than they
> would be with an English-text message, and I don't read those).
> But a subject line or author might get me to skim a first
> sentence and that sentence might, in principle, send me off
> looking for a dictionary or a friend.

Yeah. But if they *really* intended to communicate,
they would know to put a subject line:

Subject: Personal for Klensin (get Russian translator for content)

Of course, nobody can guarantee anything here, and you *might*
get almost anything from someone.


> These are important capabilities to preserve,

I don't think anybody is arguing against that, actually. Nothing
in either the Unicode Standard or the protocols would
prevent any of these scenarios.

What an end user might want to have available as an option
in a user agent is quite another thing. For example, I
would be quite content to have an option in my email
agent that said GB2312-->trash, because even though I *can*
read some Chinese, and even though I know Chinese who can and
do communicate with me, and even though they *know* I can
read some Chinese, they would email me in English and
put the Chinese someplace to look at instead.
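
A "GB2312-->trash" option of that kind could be sketched roughly as
follows. This is only an illustration, assuming the agent can hand the
raw message to Python's standard email package; the blocklist contents
and the function names are hypothetical, not taken from any real MUA.

```python
from email import message_from_string

# Hypothetical blocklist; the entries are only examples from this thread.
TRASH_CHARSETS = {"gb2312", "koi8-r", "iso-8859-5"}

def declared_charsets(msg):
    """Collect the lowercased charset= values declared on all text parts."""
    charsets = set()
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            cs = part.get_content_charset()  # None if no charset is declared
            if cs:
                charsets.add(cs)
    return charsets

def should_trash_by_charset(raw_message):
    """True if any text part declares a charset on the blocklist."""
    msg = message_from_string(raw_message)
    return bool(declared_charsets(msg) & TRASH_CHARSETS)
```

Note that a message labelled "UTF-8" passes straight through a filter
like this, whatever script its text is actually in.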

> even if Harald,
> and the vast majority of others, see Cyrillic and conclude
> "doesn't want to communicate with me and I don't want to try to
> communicate with him".  Note that, in the pre-spam-scourge
> period when we assumed that almost every note was a serious
> attempt to communicate, even Harald might have responded to
> several notes in Russian from the same apparent source by
> finding a friend to translate "I don't read Russian, if you want
> to get a message to me, you must send it in English or
> Norwegian" into Russian and put it into a file that could be
> sent back, possibly automatically.

Unfortunately, we are in the full-blown post-spam-scourge
period, and the assumptions, by everyone, are completely 
different now.

> Prior to the
> broadening use of UTF-8 in email, if one was willing to treat
> anything in Russian as trash, filtering that would discard
> messages that specified KOI-8 or ISO 8859-5 was fairly easy (one
> could also do it on "language=", but the language parameter is
> optional; for anything non-ASCII, "charset=" is not, for obvious
> reasons).  But now, that parameter often just says "UTF-8".
> Whatever its other advantages (which are considerable), it
> provides no filtering clues.  For filtering clues, one has to
> examine enough characters of the text to make an inference about
> scripts.  That almost certainly produces better-quality
> filtering, but it is a _lot_ more complicated in terms of the
> burdens on spam filters and MUAs.

Well, more complicated, yes. But the heuristics needed to
filter on a per-script basis are pretty easy, actually,
and would do a decent job if available. For example, I
definitely *don't* want a language filter on top of Latin,
because of the nature of the legitimate material I do get
in email.
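
To give a rough idea of how simple a per-script heuristic can be, here
is a minimal sketch. It assumes that the first word of a character's
Unicode name (CYRILLIC, CJK, LATIN, ...) is a good enough stand-in for
its script, because Python's standard library does not expose the
Script property directly. The function names, the 200-character sample
size and the default set of unwanted scripts are all hypothetical.

```python
import unicodedata
from collections import Counter

def script_counts(text):
    """Count letters by the first word of their Unicode character name.

    This is an approximation of the Unicode Script property, not the
    property itself.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation and whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts

def dominant_script(text, sample=200):
    """Return the most common script label in the first `sample` characters."""
    counts = script_counts(text[:sample])
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def should_trash_by_script(text, unwanted=frozenset({"CYRILLIC", "CJK"})):
    """True if the dominant script of the sampled text is unwanted."""
    return dominant_script(text) in unwanted
```

Because this looks only at scripts, everything written in Latin script
(English, Norwegian, Turkish, ...) is treated alike and gets through.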


> Right, but this puts you in the same position I'm in with
> Russian although presumably with greater competency and fluency.
> Because someone might want to communicate with you in Chinese,
> you need to read far enough into the message to determine that
> it is coming from some "fluid engineering firm" in Shanghai before
> discarding it as trash.

Actually, I assure you I do not, either. >devnull

Oh, actually, maybe the general manager at the Daihatsu
dealership in Istanbul who just sent me his personal Christmas
greetings and nice pictures of all his cars for sale really
does want to communicate with me. I guess I should go get
a Turkish translator to find out. kardaihatsu.com, I kid
you not. ;-)

Regards,

--Ken

>  Harald and I presumably do not.



More information about the Idna-update mailing list