"This case isn't the important one" (was Re: Visually confusable characters (8))

Cary Karp cary at karp.org
Tue Aug 12 10:37:56 CEST 2014


Quoting Kent Karlsson:

> Not sure I should get into this hot air... But...

+1

> On a slightly different point, in the Latin script: Andrew wrote "in
> the case of (e.g.) ö (in Swedish) and o-umlaut (in German). They're
> clearly different letters linguistically too."
> 
> How?

The answer to that depends on the significance one ascribes to the
following:

The German alphabet includes ö as an umlauted form of o, the former
sorting directly after the latter. It can be represented alternatively
as oe in a variety of contexts but never as o. In the Swedish alphabet ö
is an atomic letter -- the last in a 29-letter alphabet and sorted
accordingly -- and does not correctly decompose to oe (see below).
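
To make the sorting difference concrete, here is a rough Python sketch. It
is an illustration only, not a real collation implementation: the function
names and the simplified rules are my own assumptions, and it ignores ß,
uppercase tailoring and everything else a proper locale-aware collator
would handle.

```python
import unicodedata

# Assumption for illustration: the 29-letter Swedish alphabet, with
# å, ä, ö as separate letters at the very end.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# Assumption for illustration: German umlauted vowels sort with their
# base vowel, the umlauted form coming after the plain one on a tie.
GERMAN_BASE = {"ä": "a", "ö": "o", "ü": "u"}


def swedish_key(word):
    """Sort key treating å, ä, ö as atomic letters after z."""
    w = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet are simply ranked last here.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in w]


def german_key(word):
    """Sort key treating ä, ö, ü as variants of a, o, u."""
    w = unicodedata.normalize("NFC", word.lower())
    primary = "".join(GERMAN_BASE.get(ch, ch) for ch in w)
    # Tie-breaker: plain vowel (0) sorts before its umlauted form (1).
    secondary = [1 if ch in GERMAN_BASE else 0 for ch in w]
    return (primary, secondary)


words = ["öl", "ost", "zebra", "ofen"]
print(sorted(words, key=swedish_key))
print(sorted(words, key=german_key))
```

With the Swedish key "öl" lands after "zebra"; with the German key it
lands between "ofen" and "ost".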

> Nit: "Faeltroem" is a major typo in German as well, even though that 
> *fallback* seems to be more common in German than for Swedish (where
> it has been used, huh, back in "pure ASCII" times, or when some
> people use a keyboard without the "local" letters).

The need for a fallback Swedish form was formally recognized in the 2nd
ed. of "Svenska skrivregler" (Swedish Orthographic Rules), in 2000. It
prescribed dropping the diacritical marks from any character bearing
them, in any context where the proper form cannot be represented, rather
than replacing the atomic letter with a digraph: å > a, ä > a, ö > o,
and not å > aa, ä > ae, ö > oe. The digraph fallback was, indeed, in
common use at the time and the rule was introduced to counter it.
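
The two fallback strategies can be sketched in a few lines of Python. The
lowercase mappings are the ones listed above; the uppercase entries and
the function names are my own assumptions for the sake of a runnable
example, not anything taken from "Svenska skrivregler".

```python
import unicodedata

# Single-letter fallback as described above: å > a, ä > a, ö > o.
# Uppercase entries are an assumption added for completeness.
SINGLE_LETTER = str.maketrans({
    "å": "a", "ä": "a", "ö": "o",
    "Å": "A", "Ä": "A", "Ö": "O",
})

# Digraph fallback that the rule was meant to counter:
# å > aa, ä > ae, ö > oe. Uppercase entries are again an assumption.
DIGRAPH = str.maketrans({
    "å": "aa", "ä": "ae", "ö": "oe",
    "Å": "Aa", "Ä": "Ae", "Ö": "Oe",
})


def single_letter_fallback(text):
    """Replace å, ä, ö with their undecorated base letters."""
    # Normalize first so decomposed input (o + combining diaeresis)
    # is handled the same way as the precomposed letter.
    return unicodedata.normalize("NFC", text).translate(SINGLE_LETTER)


def digraph_fallback(text):
    """Replace å, ä, ö with the digraphs aa, ae, oe."""
    return unicodedata.normalize("NFC", text).translate(DIGRAPH)


print(single_letter_fallback("smörgåsbord"))
print(digraph_fallback("smörgåsbord"))
```

The first call yields "smorgasbord" and the second "smoergaasbord".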

There's a bit of bibliographic glue between this digression and the
broader topic of this thread. The Swedish rule articulating the
single-character fallback explicitly notes "Internet addresses" as a
context where it might be required and continues, "Letters beyond a-z
will likely be available for use in Internet addresses soon enough."
(The 3rd edition, 2008, describes IDNs as a matter of course but notes
the remaining constraint in the local part of e-mail addresses.)

/Cary


More information about the Idna-update mailing list