FW: LANGUAGE TAG REGISTRATION FORMS

Tue May 20 11:31:46 CEST 2003

"Matching" and "lookup" or "choice" are very different functions.

As you and many other people have said, the ISO code representing
Serbian (for example) is *entirely* divorced from written form. That
means that when I am *matching* languages, it matches any Serbian, no
matter what script it is written in. When one is matching, there is NO
notion of "default".

But that means that if I do want to match documents with only Serbian
written in Cyrillic, and not all possible Serbian documents, then I
need to have a separate code for that. Thus there are three distinct
things:

Serbian (meaning any Serbian, no matter how it is written)
Serbian-Cyrillic (meaning only Serbian written in Cyrillic, not Latin,
not Arabic, not Greek,...)
Serbian-Latin (meaning only Serbian written in Latin, not Cyrillic,
not Arabic, not Greek,...)
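
To make the three cases concrete, here is a minimal sketch in Python.
It is illustrative only: the tags "sr", "sr-cyrl" and "sr-latn" and the
document names are made up for the example, and the prefix rule is my
reading of the matching construct in RFC 3066 (a range matches a tag if
it equals the tag, or equals a prefix of it that is followed by "-").

```python
def matches(language_range: str, tag: str) -> bool:
    """True if the range equals the tag, or is a prefix of it followed by '-'."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")

# Hypothetical documents, each labelled with a hypothetical tag.
docs = {
    "news.txt": "sr-cyrl",   # Serbian written in Cyrillic
    "blog.txt": "sr-latn",   # Serbian written in Latin
    "note.txt": "sr",        # Serbian, script not stated
}

def select(language_range: str) -> list:
    """All documents whose tag is matched by the given range, sorted by name."""
    return sorted(name for name, tag in docs.items()
                  if matches(language_range, tag))

print(select("sr"))        # every Serbian document
print(select("sr-cyrl"))   # only the Cyrillic one
print(select("sr-latn"))   # only the Latin one
```

The three calls return three different sets, which is the sense in
which the three codes are distinct things.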

Think of it this way. Suppose in the Animal Shelter business there are
a set of codes for pets. If I wanted to match all dogs, I would use a
code for "dog". If I wanted to match specific breeds, I would use a
code for "dog-German_Shepherd", "dog-Dachshund", "dog-Mastiff", etc.

Now it may be the case that the vast majority of dogs are German
Shepherds. In that case, if someone purchases a dog just by specifying
"dog", you might say: Hmm. The default is German Shepherd, that's what
we give him. So in that case, a default is reasonable.

But if someone is searching the database of pets (i.e. a match
operation), then you *don't* want to say that "dog" =
"dog-German_Shepherd". They will get the wrong results if they assume
*any* default.
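
The difference can be sketched in a few lines of Python. Everything
here is hypothetical: the pet names, the DEFAULTS table and the two
function names are assumptions made for the illustration, not anything
taken from a real system.

```python
# Hypothetical shelter database: (name, code) pairs.
pets = [
    ("Rex", "dog-German_Shepherd"),
    ("Fritz", "dog-Dachshund"),
    ("Bruno", "dog-Mastiff"),
]

# Assumed default, consulted only when a single result must be chosen.
DEFAULTS = {"dog": "dog-German_Shepherd"}

def lookup(code):
    """Choice: return one pet, falling back to the default for a bare code."""
    wanted = DEFAULTS.get(code, code)
    for name, pet_code in pets:
        if pet_code == wanted:
            return name
    return None

def match(code):
    """Matching: return every pet whose code equals or extends the code."""
    return [name for name, pet_code in pets
            if pet_code == code or pet_code.startswith(code + "-")]

print(lookup("dog"))                 # one dog, picked via the default
print(match("dog"))                  # all dogs, no default involved
print(match("dog-German_Shepherd"))  # only that breed
```

If match("dog") applied the default the way lookup("dog") does, the
search would silently drop Fritz and Bruno.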

Does this make more sense to you?

Mark

=============================
previous messages

> The RFC does not specify *any* script for, say, "az". That means
> that in language matching, it will pick up *any* Azeri; Cyrillic,
> Latin, Arabic, whatever. If you want to be able to select out only
> Cyrillic Azeri, then there has to be a code for that.
>
> For resource lookup, it makes sense for an ISO-639 code to have a
> "default" script. But for language matching, one of the principal
> functions of the RFC, you need to have both the "overall" tag, plus
> each of the variants.

> > Then we are screwed and there will be no end of duplicate
> > referents, and I really dislike that. :-( I do not think we should
> > have duplicate referents.
>
> Nobody is screwed (that I know of). These are not duplicates. Think
> again in terms of matches:
>
> "az" matches all and only documents that are in Azeri, no matter how
> they are written.
> "az-latn" matches all and only documents that are in Azeri AND
> written in Latin script
> "az-cyrl" matches all and only documents that are in Azeri AND
> written in Cyrillic script
>
> The sets of documents matched by each of these are distinct. These
> are *not* duplicates: the sets matched by each of these are
> different. They *are* different entities.
>
> Remember that there are *multiple* functions of RFC 3066 codes:
>
>    "This document describes a language tag for use in cases where it
>    is desired to indicate the language used in an information object,
>    how to register values for use in this language tag, and a
>    construct for matching such language tags."
>
> As I said before, defaults only make sense -- or are needed! -- when
> one is *accessing* data (producing a single result), not when one is
> *matching* (producing multiple possible results). Both are valid
> functions for the RFC. Thus it would be perfectly reasonable to
> document that the default for "yi" is "Hebr" when accessing, but
> "yi-Hebr" does mean something different than "yi", because "yi-Hebr"
> excludes those documents that are Yiddish but written in some other
> script, and "yi" doesn't. Farshtey?

Märk Davis
________
mark.davis at jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message ----- 
From: "Michael Everson" <everson at evertype.com>
To: <ietf-languages at iana.org>
Sent: Tuesday, May 20, 2003 01:08
Subject: RE: FW: LANGUAGE TAG REGISTRATION FORMS

> At 13:42 -0700 2003-05-01, Addison Phillips [wM] wrote:
>
> >And I guess I have a question here: there are *nine* proposals on
> >the table. Have some of these "passed over the bar"? If so, which
> >ones and why? If none, why not?
>
> So far I consider Serbian-in-Latin and Azeri-in-Arabic to be OK, so
> long as Mark supplies references which are OK. (Roozbeh might help
> with the latter.)
>
> Serbian-in-Cyrillic seems to me to be the same as Yiddish-in-Hebrew,
> i.e. the default.
>
> Also Mark please do not attach files, but format them in the mail
> message, in ASCII and not UTF-8, since that's what we always use.
> Otherwise I have to do all the reformatting myself, and it's better
> if your submissions are the same as everyone else's.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>