Criteria for languages?

Wed Dec 2 00:43:06 CET 2009

Consider it in terms of putative character disunifications. If someone asked to have a character “Blort” encoded and we had no reason to suspect a connection to any already encoded character, then we’d probably treat that differently than if we knew that lots of content was already representing this as an already-encoded character “Flort”. When the decision is taken, we use the information available; if we had no knowledge of a connection with an already encoded character, shipped a new version of Unicode including “Blort” and then someone came along with additional info about the connection with “Flort”, that wouldn’t substantively change anything.

So, I guess your question, as it pertains to Walliserdeutsch in relation to “de” and “gsw”, is whether it was reasonable to expect anyone had used “de” or “gsw” to tag Walliserdeutsch content.

This rationale comes to mind: by a vast, vast margin, content tagged “de” is in Standard “High” German, and with a few exceptions most other Germanic varieties are not particularly developed in terms of literature. So, it isn’t unreasonable to assume that there is no significant use of “de” for those varieties _unless given evidence otherwise_.

A further principle supporting this rationale is that there can be definite _dis_advantages to using macrolanguage entities: they are useful so long as a certain distinction is not particularly interesting, but once that distinction becomes interesting then the macrolanguage becomes a burden. Consider, for instance, the inconvenience of having “lav” for the Latvian macrolanguage and also “lvs” for Standard Latvian / “ltg” for Latgalian. So, from this perspective, it seems to me like macrolanguage entities are things we would rather avoid whenever possible, implying that we create them only when effectively forced to. In the case of “lav”, established and documented usage in MARC may force us to change “lav” into a macrolanguage; but I don’t know of anything compelling us to do so for “de” or “gsw” as a result of coding Wallisertitsch.

Peter

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Mark Davis
Sent: Tuesday, December 01, 2009 11:23 AM
To: Randy Presuhn
Cc: ietf-languages at iana.org
Subject: Re: Criteria for languages?

It is the lack of a uniform policy that bothers me more than the particular case. Anything that is Walliserdeutsch right now would be either tagged "de" (because that was all that was available until gsw was encoded) or "gsw". Let's take precisely your wording and apply it to that case:

If an application requires standard Swiss German and Walliserdeutsch to be treated as distinct
languages, then clearly it would need to use the new subtag to identify standard
Swiss German, since "gsw" would mean "any kind of Swiss German, including Walliserdeutsch".   This
is a natural consequence of our "no narrowing" rules - all of the data which is
currently precisely and accurately tagged as Swiss German would remain accurately
tagged, though most would no longer be precisely tagged.  (Data for which the
tagger was unable to make a determination whether it was Swiss German or Walliserdeutsch
would remain precisely tagged.)  The assumption is that it is better to introduce
a (potentially lingering) imprecision in the tagging of legacy data, rather than to
cause any once-accurate tags on legacy data to become incorrect.

If your reasoning is correct for Latvian, then it is also correct for Swiss German! If it is not correct for Swiss German, then it is not correct for Latvian.

Mark

On Tue, Dec 1, 2009 at 09:46, Randy Presuhn <randy_presuhn at mindspring.com> wrote:
Hi -

> From: "John Cowan" <cowan at ccil.org>
> To: "Peter Constable" <petercon at microsoft.com>
> Cc: <ietf-languages at iana.org>; "Doug Ewell" <doug at ewellic.org>
> Sent: Tuesday, December 01, 2009 9:29 AM
> Subject: Re: Criteria for languages?
...

> Peter Constable scripsit:
>
> > If the denotation of "lav" were changed to explicitly exclude Latgalian
> > (which would be necessary if its scope is not set to macrolanguage),
> > then an unknown number of librarians will have broken data. It would
> > be irresponsible of the ISO 639-RA/JAC to do such a thing, IMO.
>
> Quite so.
>
> In that case, the issue for us is: do we recommend that people continue
> to use "lav" for Latvian proper, or that they adopt the new subtag?

If an application requires standard Latvian and Latgalian to be treated as distinct
languages, then clearly it would need to use the new subtag to identify standard
Latvian, since "lav" would mean "any kind of Latvian, including Latgalian".   This
is a natural consequence of our "no narrowing" rules - all of the data which is
currently precisely and accurately tagged as Latvian would remain accurately
tagged, though most would no longer be precisely tagged.  (Data for which the
tagger was unable to make a determination whether it was Latvian or Latgalian
would remain precisely tagged.)  The assumption is that it is better to introduce
a (potentially lingering) imprecision in the tagging of legacy data, rather than to
cause any once-accurate tags on legacy data to become incorrect.

Randy
