idna-mapping update

John C Klensin klensin at jck.com
Tue Dec 22 20:17:43 CET 2009


--On Tuesday, December 22, 2009 15:33 +0000 Gervase Markham
<gerv at mozilla.org> wrote:

> On 22/12/09 03:50, John C Klensin wrote:
>> position you characterize as "why do we need these".  Of
>> course, that position gets even stronger when we can see
>> problems, as in the superscript and ligature examples I
>> discussed in my note to Michel.
> 
> It seems to me that an important factor would be keyboards. If
> there are no widely-used keyboards where pressing a letter key
> gives you a "fi" ligature, then it seems rather less important
> to have a mapping for it than if there are (e.g) 50 million
> Antarcticans out there who have a key for that and are very
> used to pressing it when they want to type the letters "f" and
> "i" one after another. The Principle of Least Surprise and all
> that.

Gerv,

While I agree that looking at [standardized] keyboards could be
very useful, I see the way you stated the question as
interesting.  The odds that a standardized keyboard would have
the "fi" ligature as a letter key when people intended the
letters "f" and "i" to be represented is actually pretty low --
at least from what I've observed sitting on the relevant
standards committees.  User-programmed shortcut keys are another
matter: there is no accounting for taste and, back when
keystrokes were interpreted by servers rather than the local
machine, I used to have lots of keys that would type out whole
words or commands and knew others who did too.

Even in more recent years, I've seen keyboards set up to treat
"www" or "www." as a composite character. 

But therein lies the problem with your "fi" ligature example.
In most cases, if folks have put that character on a keyboard,
it is because they think of it as "a character", just as surely
as the "ae" and "oe" ligatures, which have code points assigned
to them, are thought of as characters by the populations that
put them on keyboards.  I've never explicitly heard the UTC
explanation
as to why the "ae" and "oe" forms map to themselves but "fi" is
considered strictly as a compatibility character for "f"
followed by "i", but I'm almost positive that any area and
language that puts it on a keyboard as a single character
wouldn't find the explanation persuasive.   If we are trying to
be responsive to the needs of that hypothetical population, it
is at least as reasonable to have a discussion about making the
"fi" ligature PVALID as it is to recommend its decomposition
into "f" and "i".   Remember that "w" has its origins in a
ligature too and that no one has proposed treating it as a
compatibility character that should be mapped into "uu".    But
the reasons for not doing so are rooted more in sensitivity to
coding in ASCII and its predecessor than in the needs of
populations who came along later and who might, hypothetically,
use the "fi" ligature much more often than the "uu" one.
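
To make the asymmetry concrete, here is a minimal sketch using
Python's standard unicodedata module.  It shows only how Unicode
compatibility normalization (NFKC) treats these characters; it is
not the IDNA mapping or the PVALID derivation itself.  The helper
name nfkc is made up for this illustration.

```python
import unicodedata

def nfkc(text):
    # Hypothetical helper: apply Unicode compatibility normalization (NFKC).
    return unicodedata.normalize("NFKC", text)

FI = "\uFB01"  # LATIN SMALL LIGATURE FI
AE = "\u00E6"  # LATIN SMALL LETTER AE
OE = "\u0153"  # LATIN SMALL LIGATURE OE

for ch in (FI, AE, OE, "w"):
    # decomposition() returns an empty string when the character
    # has no decomposition mapping in the Unicode Character Database.
    print(
        unicodedata.name(ch),
        "| NFKC:", ascii(nfkc(ch)),
        "| decomposition:", repr(unicodedata.decomposition(ch)),
    )
```

Under NFKC the "fi" ligature becomes the two letters "f" and "i",
while the "ae" and "oe" characters and "w" come back unchanged,
which is exactly the distinction being questioned above.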

   john


