FW: LANGUAGE TAG REGISTRATION FORMS

Wed May 21 16:05:53 CEST 2003

Michael,

One also wonders who defines which script is the default. Azerbaijan's
Azeri (Latin), with about 6.9 million Azeri speakers, or Iran's (Arabic),
with about 16 million? (Data based on CIA's World Factbook 2002.)

Other questions come to mind: What's the default script for Kurdish? If
somebody proposes ku-Latn, ku-Arab, and ku-Cyrl, which one will you
reject because of being the default?

roozbeh

On Tue, 20 May 2003, Mark Davis wrote:

> "Matching" and "lookup" or "choice" are very different functions.
> 
> As you and many other people have said, the ISO code representing
> Serbian (for example) is *entirely* divorced from written form. That
> means that when I am *matching* languages, it matches any Serbian, no
> matter what script it is written in. When one is matching, there is NO
> notion of "default".
> 
> But that means that if I do want to match documents with only Serbian
> written in Cyrillic, and not all possible Serbian documents, then I
> need to have a separate code for that. Thus there are three distinct
> things:
> 
> Serbian (meaning any Serbian, no matter how it is written)
> Serbian-Cyrillic (meaning only Serbian written in Cyrillic, not Latin,
> not Arabic, not Greek,...)
> Serbian-Latin (meaning only Serbian written in Latin, not Cyrillic,
> not Arabic, not Greek,...)
> 
> Think of it this way. Suppose in the Animal Shelter business there are
> a set of codes for pets. If I wanted to match all dogs, I would use a
> code for "dog". If I wanted to match specific breeds, I would use a
> code for "dog-German_Shepherd", "dog-Dachshund", "dog-Mastiff", etc.
> 
> Now it may be the case that the vast majority of dogs are German
> Shepherds. In that case, if someone purchases a dog just by specifying
> "dog", you might say: Hmm. The default is German Shepherd, that's what
> we give him. So in that case, a default is reasonable.
> 
> But if someone is searching the database of pets (e.g. a match
> operation), then you *don't* want to say that "dog" =
> "dog-German_Shepherd". They will get the wrong results if they assume
> *any* default.
> 
> Does this make more sense to you?
> 
> Mark
> 
> =============================
> previous messages
> 
> > The RFC does not specify *any* script for, say, "az". That means that
> > in language matching, it will pick up *any* Azeri; Cyrillic, Latin,
> > Arabic, whatever. If you want to be able to select out only Cyrillic
> > Azeri, then there has to be a code for that.
> >
> > For resource lookup, it makes sense for an ISO-639 code to have a
> > "default" script. But for language matching, one of the principal
> > functions of the RFC, you need to have both the "overall" tag, plus
> > each of the variants.
> 
> > > Then we are screwed and there will be no end of duplicate referents,
> > > and I really dislike that. :-( I do not think we should have
> > > duplicate referents.
> >
> > Nobody is screwed (that I know of). These are not duplicates. Think
> > again in terms of matches:
> >
> > "az" matches all and only documents that are in Azeri, no matter how
> > they are written.
> > "az-latn" matches all and only documents that are in Azeri AND written
> > in Latin script
> > "az-cyrl" matches all and only documents that are in Azeri AND written
> > in Cyrillic script
> >
> > The sets of documents matched by each of these are distinct. These are
> > *not* duplicates: the sets matched by each of these are different. They
> > *are* different entities.
> >
> > Remember that there are *multiple* functions of RFC 3066 codes:
> >
> >    "This document describes a language tag for use in cases where it is
> >    desired to indicate the language used in an information object, how
> >    to register values for use in this language tag, and a construct for
> >    matching such language tags."
> >
> > As I said before, defaults only make sense -- or are needed! -- when
> > one is *accessing* data (producing a single result), not when one is
> > *matching* (producing multiple possible results). Both are valid
> > functions for the RFC. Thus it would be perfectly reasonable to
> > document that the default for "yi" is "Hebr" when accessing, but
> > "yi-Hebr" does mean something different than "yi", because "yi-Hebr"
> > excludes those documents that are Yiddish but written in Latin, and
> > "yi" doesn't. Farshtey?
> 
> 
> Märk Davis
> ________
> mark.davis at jtcsv.com
> IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
> (408) 256-3148
> fax: (408) 256-0799
> 
> ----- Original Message ----- 
> From: "Michael Everson" <everson at evertype.com>
> To: <ietf-languages at iana.org>
> Sent: Tuesday, May 20, 2003 01:08
> Subject: RE: FW: LANGUAGE TAG REGISTRATION FORMS
> 
> 
> > At 13:42 -0700 2003-05-01, Addison Phillips [wM] wrote:
> >
> > >And I guess I have a question here: there are *nine* proposals on the
> > >table. Have some of these "passed over the bar"? If so, which ones and
> > >why? If none, why not?
> >
> > So far I consider Serbian-in-Latin and Azeri-in-Arabic to be OK, so
> > long as Mark supplies references which are OK. (Roozbeh might help
> > with the latter.)
> >
> > Serbian-in-Cyrillic seems to me to be the same as Yiddish-in-Hebrew,
> > i.e. the default.
> >
> > Also Mark please do not attach files, but format them in the mail
> > message, in ASCII and not UTF-8, since that's what we always use.
> > Otherwise I have to do all the reformatting myself, and it's better
> > if your submissions are the same as everyone else's.
> > -- 
> > Michael Everson * * Everson Typography *  * http://www.evertype.com
> > _______________________________________________
> > Ietf-languages mailing list
> > Ietf-languages at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/ietf-languages
> >
> 
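
The distinction drawn in the quoted messages between matching (many
results, no default) and lookup (one result, where a default script can
make sense) can be sketched roughly as follows. This is only an
illustration, not anything defined by the RFC or proposed on the list:
the function names, the sample documents, and the default-script table
are all assumptions, and the only default taken from the thread is
"yi" -> "Hebr". The prefix rule in matches() follows the language-range
matching described in RFC 3066.

```python
# Hypothetical sample data: document name -> language tag.
DOCS = {
    "a.txt": "az-Latn",
    "b.txt": "az-Cyrl",
    "c.txt": "az-Arab",
    "d.txt": "az",
    "e.txt": "sr-Cyrl",
}

# Assumed table of default scripts, consulted only for lookup.
# "yi" -> "Hebr" is the example from the thread; nothing else is implied.
DEFAULT_SCRIPT = {"yi": "Hebr"}


def matches(language_range, tag):
    """Prefix match: the range equals the tag, or equals a prefix of the
    tag that is immediately followed by "-". Case is ignored."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


def match_all(language_range, docs):
    """Matching: return every document whose tag matches. No default."""
    return sorted(name for name, tag in docs.items()
                  if matches(language_range, tag))


def lookup(tag, resources, default_script):
    """Lookup: return a single resource. Try the exact tag first, then
    fall back to the tag plus its assumed default script, if any."""
    wanted = tag.lower()
    by_tag = {k.lower(): v for k, v in resources.items()}
    if wanted in by_tag:
        return by_tag[wanted]
    script = default_script.get(wanted)
    if script is not None:
        return by_tag.get(f"{wanted}-{script}".lower())
    return None


if __name__ == "__main__":
    print(match_all("az", DOCS))       # every Azeri document, any script
    print(match_all("az-Cyrl", DOCS))  # only Azeri written in Cyrillic
    print(lookup("yi", {"yi-Hebr": "yiddish-hebrew.res"}, DEFAULT_SCRIPT))
```

In this sketch the default table is never consulted by match_all(), so
"az" and "az-Cyrl" select different sets of documents, while lookup()
may still resolve a bare "yi" to the "yi-Hebr" resource.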

-- 
Roozbeh Pournader               | Sometimes I forget to reply to emails.
Sharif University of Technology | Some other times I don't find the time.
roozbeh <at> sharif <dot> edu   | So kindly remind me if it's important,
http://sina.sharif.edu/~roozbeh | and use other methods if it's urgent.