cowan at mercury.ccil.org
Wed Jun 16 20:04:02 CEST 2010
Milos Rancic scripsit:
> If I could dream for a little bit, a proper tag of, let's say, Serbian
> language could be for example:
> indo_european-slavic-balkan-synthetic-shtokavian-serbian and for
> Pidgin (a creole language, not a type of languages)
> or similar. Probably, with some numbers in one repository which would
> describe closeness inside of the hierarchical group or, better, with
> more descriptive explanations inside of the repository. But such
> notification is not supported by any system.
ISO 639-6 (which is still in draft, and is not part of the evolving
BCP 47 standard) is based on exactly such a system. However,
explicitly putting such relationships into tags is not only politically
sensitive, but also unstable from a strictly scientific viewpoint.
Indo-European relationships (except at the topmost level) are pretty
well accepted, but the same is not true in general around the world.
Even such a question as "How many top-level language families exist?"
does not have a definite answer.
> The problem with "macrolanguage" tag is its ambiguity. Is it (a)
> genetically related dialects; (b) genetically related standard
> languages; (c) genetically related language groups; (d) genetically
> relatively close languages; (e) genetically relatively close languages
> with the same cultural background; (f) ... with different cultural
> background; (g) ...?
A macrolanguage is a group of language varieties which are treated as
a single language for some purposes and as multiple languages for
other purposes. That definition is intentionally broad so that it
can cover many particular cases.
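
To make that definition concrete, here is a minimal Python sketch. It is
only an illustration, not anything specified by ISO 639-3 or BCP 47: the
table is a small, non-exhaustive subset of macrolanguage membership, and
the names MACROLANGUAGES, macrolanguage_of, and same_language are
hypothetical.

```python
# Illustrative, non-exhaustive subset of ISO 639-3 macrolanguage membership.
# Assumption: a real application would load the full published mapping.
MACROLANGUAGES = {
    "hbs": {"bos", "hrv", "srp"},   # Serbo-Croatian
    "zho": {"cmn", "yue", "wuu"},   # Chinese
}


def macrolanguage_of(code):
    """Return the macrolanguage code that contains `code`, or None."""
    for macro, members in MACROLANGUAGES.items():
        if code in members:
            return macro
    return None


def same_language(a, b, collapse):
    """Compare two language codes.

    With collapse=True, member varieties are treated as their
    macrolanguage (the "single language" purpose); with collapse=False,
    they are treated as distinct (the "multiple languages" purpose).
    """
    if a == b:
        return True
    if not collapse:
        return False
    # Fall back to the code itself when it belongs to no macrolanguage.
    return (macrolanguage_of(a) or a) == (macrolanguage_of(b) or b)
```

The same pair of codes, say "srp" and "hrv", compares equal or unequal
depending only on the purpose passed in as `collapse`, which is all the
definition claims.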
John Cowan
cowan at ccil.org
http://www.ccil.org/~cowan

XQuery Blueberry DOM
Entity parser dot-com
Abstract schemata
Infoset Unicode BOM
        --Richard Tobin