LANGUAGE SUBTAG REGISTRATION FORM: pinyin

John Cowan cowan at ccil.org
Tue Aug 5 17:35:21 CEST 2008


Tracey, Niall scripsit:

> The point of "zh" is that a text written in Chinese logograms is
> not necessarily Mandarin. As I understand it, there are many Chinese
> languages that share a mutually comprehensible written mode -- it's
> pretty much impossible to point to a Chinese text and identify it
> unambiguously as Mandarin, unless the writer uses a lot of slang or
> colloquial idioms.

Whole books have been written on how wrong this is, so what follows is
an oversimplification.

Until about 100 years ago, the modern Sinitic languages were not
written at all.  There was only wenyan (Classical Chinese), a highly
compact written-only "language" which had evolved from a variety of
Chinese spoken about 2500 years earlier.  One may loosely compare the
situation to a Europe in which a tiny minority can read and write Latin,
but everyone including that minority is completely illiterate in their
home languages -- English, French, German, Italian, etc.  (The Sinitic
languages are about as similar as the Romance languages, but wenyan
writing spread far beyond the Sinitic-speaking area.)  So the Sinitic
languages shared a "mutually comprehensible written mode" only in the
sense that the written mode had little to do with any of them.

At the turn of the 20th century, the characters of wenyan were repurposed
(including the invention of some new ones) to write modern Mandarin
directly.  This had occasionally been done before, but was looked down on.
Since then, this written Mandarin (called baihua 'plain language' as
distinct from wenyan) has become the standard written language of China,
with occasional bits of embedded wenyan -- the more elevated the style,
the more such bits are used.  So although there is a continuum from pure
baihua to pure wenyan, it is usually perfectly straightforward to look
at a text and say whether it is written Mandarin or not.

A similar process can be, and has been, applied to the other Sinitic
languages to create written forms for them as well.  Unlike baihua,
these have never been standardized, and all Chinese governments have
strongly discouraged them as tending to fragment the single written
Mandarin standard.  In particular, the morphemes in those languages which
have neither wenyan nor baihua counterparts wind up being written in
a variety of ways.  An analogy would be writing Swiss German -- it is
mostly not done at all (Standard German is written instead), and when
it must be done, there is no one orthography that all recognize.

Given all these facts, it was reasonable for the people who created
the ISO 639-1 and 639-2 lists, who were concerned almost entirely with
written languages, to have only a single code for all Chinese forms.
The bulk of all documents would be baihua, with an important minority of
wenyan ones and a negligible number of others.  The ISO 639-3 group,
who were concerned with spoken languages, created a code for each
Sinitic language without regard to how, or whether, it was written.
The resulting mismatch is what we continue to struggle with.
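
As a rough illustration of that mismatch (a sketch only, not taken from
any registry tooling), the relationship between the two lists can be
modeled as a many-to-one mapping.  The table below is a deliberately
partial, hand-picked subset of ISO 639-3 codes, and the names
SINITIC_639_3, to_639_1 and candidates_for are hypothetical helpers
invented for this example.

```python
# Illustrative subset only; the ISO 639-3 code tables are the authority.
SINITIC_639_3 = {
    "cmn": "Mandarin Chinese",
    "yue": "Yue Chinese (Cantonese)",
    "wuu": "Wu Chinese",
    "hak": "Hakka Chinese",
    "nan": "Min Nan Chinese",
    "lzh": "Literary Chinese (wenyan)",
}


def to_639_1(code_639_3):
    """Collapse a per-language 639-3 code to the single 639-1 code.

    Returns None for codes outside this illustrative table.
    """
    return "zh" if code_639_3 in SINITIC_639_3 else None


def candidates_for(code_639_1):
    """Go the other way: one 639-1 code fans out to many 639-3 codes."""
    return sorted(SINITIC_639_3) if code_639_1 == "zh" else []
```

Going from 639-3 to 639-1 loses information (every entry collapses to
"zh"), while going from "zh" back to 639-3 yields a list of candidates
rather than a single answer.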

-- 
John Cowan       http://www.ccil.org/~cowan        <cowan at ccil.org>
        You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
        You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
                Clear all so!  `Tis a Jute.... (Finnegans Wake 16.5)

