[Fwd: Response to Mark's message]

John Cowan cowan at mercury.ccil.org
Thu Apr 10 08:37:43 CEST 2003


Jon Hanna scripsit:

> The first is that it allows information about scripts to be completely
> orthogonal to information about languages. It's easy to create unusual
> combinations (English in Cyrillic etc.). Unusual combinations aren't
> actually that unusual, if I was to write some Russian, Hebrew or Japanese
> inline with English text I would generally transliterate it into Latin
> script (especially since I don't actually know any of those languages).

Transliteration is one thing.  What we are attempting to handle here
is the case of languages which are, in practice and by their users,
written in more than one script.  The practical limit for this is about
three scripts, although in principle any language can be transliterated
into any script more or less lossily, even tough cases like English
transliterated into Han.

> The use of a script subtag makes this a multiple-inheritance hierarchy.
> en-Latn-IE can be considered a "child" of en-IE, en-Latn, en or Latn. Like
> someone porting C++ to Java en-Latn-IE squashes this multiple-inheritance
> flat. In particular the connection between en-Latn-IE and en-IE is no longer
> as clear as it was before.

If you look at my proposal, you can see that it's possible to unambiguously
parse any conforming tag (and I do recognize that not all tags will conform)
using a regular expression into the components language, script, nation.
Given that, it's easy to construct a higher-level matcher that will
decide how to match codes against one another such that en-IE matches
en-Latn because both are en.
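
To illustrate the parse-then-match idea, here is a minimal sketch in Python.
It is not the grammar from the proposal: it assumes a simplified tag shape
of a 2-3 letter language code, an optional 4-letter script code, and an
optional 2-letter nation code. The names TAG_RE, parse_tag, and
same_language are hypothetical and chosen only for this example.

```python
import re

# Assumed, simplified shape: language[-Script][-NATION].
# This is an illustration only, not the actual grammar of the proposal.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"   # e.g. "en"
    r"(?:-(?P<script>[A-Za-z]{4}))?"  # e.g. "Latn" (optional)
    r"(?:-(?P<nation>[A-Za-z]{2}))?$" # e.g. "IE" (optional)
)


def parse_tag(tag):
    """Split a tag into (language, script, nation); None if it does not conform."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    language, script, nation = m.group("language", "script", "nation")
    # Normalize case so comparisons are case-insensitive.
    return (
        language.lower(),
        script.title() if script else None,
        nation.upper() if nation else None,
    )


def same_language(a, b):
    """Higher-level matcher: two conforming tags match if their language parts agree."""
    pa, pb = parse_tag(a), parse_tag(b)
    return pa is not None and pb is not None and pa[0] == pb[0]
```

Because the script and nation components differ in length under this
assumption, the parse is unambiguous, and a matcher such as same_language
can treat en-IE and en-Latn as matching simply because both parse to the
language en. A stricter or looser matching policy could be layered on the
same parsed triple.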

> In particular while spoken language has been spoken of as some kind of bogey
> of late we do have a need to handle it correctly. The connection between
> en-Latn-IE and spoken en-IE is a lot stronger IMHO than that between
> en-Latn-IE and en-Latn-US.

I find that impossible to believe.

As a literate speaker of American English, I have never had more than the
occasional word stand between me and any Irish English text whatsoever,
whereas some of the remarks made by my Hiberno-English-speaking colleagues
defeat me entirely (until they repeat them more slowly), and I think it
quite certain that there are dialects of American English which they would
by no means understand without considerable effort and assistance (I
myself barely understand some of them).  The differences between American
and Irish phonology, syntax, and lexis are much larger than the comparatively
trivial differences in spelling.

-- 
John Cowan           http://www.ccil.org/~cowan              cowan at ccil.org
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_

