ID for language-invariant strings

Tue Mar 18 01:37:15 CET 2008

I think that would be a reasonable change.

Mark

On Mon, Mar 17, 2008 at 5:05 PM, Peter Constable <petercon at microsoft.com>
wrote:

>  It seems to me that changing from "no linguistic content" to "not
> applicable" isn't a huge degree of broadening, and broadening is not
> prohibited. So, if you wanted to push for broadening, that might be
> possible. But I think there should be some consensus here before taking it
> to the JAC.
>
>
>
> Peter
>
>
>
> *From:* ietf-languages-bounces at alvestrand.no [mailto:
> ietf-languages-bounces at alvestrand.no] *On Behalf Of *Peter Constable
> *Sent:* Monday, March 17, 2008 3:26 PM
> *To:* Karen_Broome at spe.sony.com
>
> *Cc:* ietf-languages at iana.org
> *Subject:* RE: ID for language-invariant strings
>
>
>
> Karen: I suggested "no linguistic content" on the understanding that the
> audio and subtitle streams were all tagged separately, and that it would be
> an audio stream that was declared "no linguistic content", not the
> film as a whole.
>
>
>
>
>
> Peter
>
>
>
> *From:* Karen_Broome at spe.sony.com [mailto:Karen_Broome at spe.sony.com]
> *Sent:* Monday, March 17, 2008 2:25 PM
> *To:* Peter Constable
> *Cc:* ietf-languages at iana.org
> *Subject:* RE: ID for language-invariant strings
>
>
>
>
> The "zxx" tag started with my query into how I should classify the "audio
> content" of a silent film in a system designed to serve non-silent films
> where a language code is required. Peter suggested "zxx = no linguistic
> content" and registered it.
>
> I felt that it might be better to use the industry terminology "silent"
> and employ a free tag in the "Q" space of ISO 639-2. While there was "no
> linguistic content" on that audio channel, there was certainly a plot that
> could be determined from watching the film even if the title cards were
> removed (a "title card" is an interstitial used to display the text in a
> silent film). To describe our wonderful heritage of silent films as having
> no linguistic content just seemed a bit cruel. I was willing to go with "not
> applicable" but could not recommend the use of "zxx = no linguistic content"
> for this purpose.
>
> When it was later suggested that "zxx" should be used to mark up code
> fragments appearing in a tutorial written in English, I was even more
> opposed to the "non-linguistic" semantic. I wasn't the only one who
> complained that code -- especially in the context of a technical tutorial --
> is primarily meant to be read by humans, not machines. An assistive device
> such as a Braille screenreader would want to represent that text as
> language, not skip over it because it's non-linguistic in nature. Binary
> junk data is the only thing I can think of that is truly non-linguistic.
>
> Any chance we could broaden the semantic of the "zxx" tag? I still think
> we did the wrong thing here and the "non-applicable" tag is more appropriate
> for all the use cases mentioned.
>
>
> http://lists.w3.org/Archives/Public/www-international/2007AprJun/0187.html -- one previous post on the topic
>
> Side note: I find the IETF archives very hard to search or I could have
> produced a better example. Am I missing a search interface somewhere? (Reply
> offlist.)
>
> Regards,
>
> Karen Broome
>
> Peter Constable <petercon at microsoft.com> wrote on 03/14/2008 01:37:30 PM:
>
> > If "zxx" were "not applicable", I would not have any reservation
> > about semantic overloading for the application scenarios I have in
> > mind now. Funny, I really have no recollection of you suggesting
> > that at that time. (Sorry.)
> >
> >
> > Peter
> >
> > From: Karen_Broome at spe.sony.com [mailto:Karen_Broome at spe.sony.com]
> > Sent: Friday, March 14, 2008 12:51 PM
> > To: Peter Constable
> > Cc: ietf-languages at iana.org
> > Subject: RE: ID for language-invariant strings
> >
> >
> > I can keep restating the point I've made from the beginning. The
> > semantic for "zxx" should have been defined as "not applicable"
> > which was the use case presented at the time it was created. Since
> > it was not expressed in this way, now we need another tag, I think.
> >
> > Regards,
> >
> > Karen Broome
> > Metadata Systems Designer
> > Sony Pictures Entertainment
> > 310.244.4384
> >
> > ietf-languages-bounces at alvestrand.no wrote on 03/14/2008 08:49:31 AM:
> >
> > > > From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> > > > bounces at alvestrand.no] On Behalf Of Doug Ewell
> > > > Sent: Thursday, March 13, 2008 11:16 PM
> > > > To: ietf-languages at iana.org
> > > > Subject: Re: ID for language-invariant strings
> > >
> > > > ["zxx" is] a "less bad" fit than the other choices:
> > > >
> > > > zxx - content is not linguistic in nature
> > > > und - content is in an undetermined language
> > > > mis - content is in an otherwise uncoded language
> > > > i-default - content is in a default, fallback language intelligible
> > > > to anglophones
> > > >
> > > > I agree that inventing a new code element/subtag for this situation
> > > > would be undesirable.
> > >
> > > If it's less bad, I still think it's kind of bad.
> > >
> > > For instance, suppose I need to apply language tags to each of the
> > > data elements in the main ISO 639-3 code table. For data in columns
> > > like the 639-3 ID, clearly "zxx" applies: the alpha-3 identifiers
> > > have no linguistic content. But what about the reference names?
> > > "zxx" would be a decidedly bad choice for that column, IMO, since
> > > every single data element is definitely linguistic in nature.
> > >
> > > I don't know why people are so averse to new special-purpose code
> > > elements when there is a reasonable need. It's not like there are a
> > > lot of different special-case semantics that are needed in language-
> > > tagging application scenarios; I think the set is very small,
> > > perhaps even that this is the only important gap. I am *far* more
> > > concerned about overloading tags with distinct, orthogonal semantics
> > > for particular application scenarios ("und" means X in this
> > > application but Y in that application): *that* can lead to serious
> > > trouble.
> > >
> > > As I think about this, I'm inclined to propose a new special-purpose
> > > ID "zxn" in ISO 639:
> > >
> > > ID: zxn
> > > Reference name: language-neutral content
> > > Comment: This ID is provided primarily for application scenarios
> > >          in which a language identifier must be declared for
> > >          content that may be linguistic in nature but that is
> > >          used as a language-neutral identifier to reference or
> > >          index other information objects.
> > >
> > >          Uses of this code element do not make any declaration
> > >          regarding the actual language of a given data element
> > >          or of whether a given data element is, in fact,
> > >          linguistic in nature.
> > >
> > >          Note: for application scenarios in which an identifier
> > >          string is unambiguously non-linguistic in nature, "zxx"
> > >          should be used rather than "zxn".
> > >
> > >          For example, in a database of coding elements for
> > >          cultural objects that includes for each such object a
> > >          code element such as an alpha-3 string (e.g., "abc")
> > >          and a reference name (e.g., "PIANO", "GUQIN"), the
> > >          language identifier applied to the code element
> > >          should be "zxx", but "zxn" may be applied to the
> > >          reference names.
> > >
> > >          Applications may also use "zxn" for content that is
> > >          linguistic in nature but that is represented in a
> > >          language-neutral form. For example, the concept 'ten'
> > >          is linguistic in nature but can be expressed in the
> > >          language-neutral form "10". Such use of "zxn" should
> > >          be considered only for application scenarios that
> > >          have a particular need; this usage is not recommended
> > >          in general. For instance, if a software application
> > >          needs to segment the strings in a document into items
> > >          that get passed to various language-specific processes
> > >          and it must apply a language identifier to language-
> > >          neutral content such as numbers represented as digits,
> > >          then "zxn" may be used within that application; but it
> > >          is not expected that content authors would apply "zxn"
> > >          to numbers in their documents in general.
> > >
> > >
> > >
> > > Peter
> > > _______________________________________________
> > > Ietf-languages mailing list
> > > Ietf-languages at alvestrand.no
> > > http://www.alvestrand.no/mailman/listinfo/ietf-languages

-- 
Mark
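
For illustration only, here is a minimal Python sketch of the tagging scenario described in the quoted proposal: a table of code elements where the alpha-3 ID column is tagged "zxx" and the reference-name column is tagged with the proposed "zxn". The column names, the helper functions, and the fallback to "und" for other columns are assumptions made for this sketch; "zxn" is only the ID proposed in this thread, not a registered code element.

```python
# Special-purpose identifiers as characterized in the thread.
# "zxn" is a proposal from this discussion only (hypothetical).
SPECIAL_TAGS = {
    "zxx": "content is not linguistic in nature",
    "und": "content is in an undetermined language",
    "mis": "content is in an otherwise uncoded language",
    "zxn": "language-neutral content (proposed in this thread only)",
}


def tag_for_column(column: str) -> str:
    """Choose a language tag for a column of a hypothetical code table."""
    if column == "id":
        # Alpha-3 identifiers are unambiguously non-linguistic.
        return "zxx"
    if column == "reference_name":
        # Names may be linguistic but serve as language-neutral identifiers.
        return "zxn"
    # Assumption for this sketch: anything else is left undetermined.
    return "und"


def tag_row(row: dict) -> dict:
    """Pair every value in a row with the tag chosen for its column."""
    return {col: (value, tag_for_column(col)) for col, value in row.items()}


# Example row modeled on the cultural-objects example in the proposal.
tagged = tag_row({"id": "abc", "reference_name": "PIANO"})
for column, (value, tag) in tagged.items():
    print(f"{column}: {value!r} -> {tag} ({SPECIAL_TAGS[tag]})")
```

The point of the sketch is that the tag is chosen per data element by its role in the table, not by inspecting the string itself, which is the distinction the proposal draws between "zxx" and "zxn".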