ID for language-invariant strings

Mon Mar 17 22:25:05 CET 2008

The "zxx" tag started with my query into how I should classify the "audio 
content" of a silent film in a system designed to serve non-silent films 
where a language code is required. Peter suggested "zxx = no linguistic 
content" and registered it. 

I felt that it might be better to use the industry terminology "silent" 
and employ a free tag in the "Q" space of ISO 639-2. While there was "no 
linguistic content" on that audio channel, there was certainly a plot that 
could be determined from watching the film even if the title cards were 
removed (a "title card" is an interstitial used to display the text in a 
silent film). To describe our wonderful heritage of silent films as having 
no linguistic content just seemed a bit cruel. I was willing to go with 
"not applicable" but could not recommend the use of "zxx = no linguistic 
content" for this purpose.

When it was later suggested that "zxx" should be used to mark up code 
fragments appearing in a tutorial written in English, I was even more 
opposed to the "non-linguistic" semantic. I wasn't the only one who 
complained that code -- especially in the context of a technical tutorial 
-- is primarily meant to be read by humans, not machines. An assistive 
device such as a Braille screenreader would want to represent that text 
as language, not skip over it because it's non-linguistic in nature. 
Binary junk data is the only thing I can think of that is truly 
non-linguistic.

Any chance we could broaden the semantic of the "zxx" tag? I still think 
we did the wrong thing here and the "not applicable" tag is more 
appropriate for all the use cases mentioned.
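
For anyone trying to picture the application scenario, here is a minimal 
Python sketch of the database example from Peter's quoted message below. 
The function name and the field kinds are made up for illustration, and 
"zxn" is only the ID proposed in that message, not a registered code.

```python
# Special-purpose language identifiers discussed in this thread.
# "zxn" is only the proposal from the quoted message; it is NOT registered.
SPECIAL_TAGS = {
    "zxx": "content is not linguistic in nature",
    "und": "content is in an undetermined language",
    "mis": "content is in an otherwise uncoded language",
    "zxn": "language-neutral content (proposed, hypothetical)",
}


def tag_for_field(field_kind: str) -> str:
    """Pick a tag for one field of a code table (hypothetical field kinds)."""
    if field_kind == "code_element":    # e.g. an alpha-3 string such as "abc"
        return "zxx"
    if field_kind == "reference_name":  # e.g. "PIANO", "GUQIN"
        return "zxn"
    return "und"  # fallback when nothing is known about the field


# One row of the cultural-objects table from the quoted example.
row = {"code_element": "abc", "reference_name": "GUQIN"}
tagged = {kind: (value, tag_for_field(kind)) for kind, value in row.items()}
print(tagged)
```

If "zxx" had been defined as "not applicable", both branches could return 
the same tag and the second identifier would be unnecessary.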

http://lists.w3.org/Archives/Public/www-international/2007AprJun/0187.html 
-- one previous post on the topic

Side note: I find the IETF archives very hard to search or I could have 
produced a better example. Am I missing a search interface somewhere? 
(Reply offlist.)

Regards,

Karen Broome

Peter Constable <petercon at microsoft.com> wrote on 03/14/2008 01:37:30 PM:

> If “zxx” were “not applicable”, I would not have any reservation 
> about semantic overloading for the application scenarios I have in 
> mind now. Funny, I really have no recollection of you suggesting 
> that at that time. (Sorry.)
> 
> 
> Peter
> 
> From: Karen_Broome at spe.sony.com [mailto:Karen_Broome at spe.sony.com] 
> Sent: Friday, March 14, 2008 12:51 PM
> To: Peter Constable
> Cc: ietf-languages at iana.org
> Subject: RE: ID for language-invariant strings
> 
> 
> I can keep restating the point I've made from the beginning. The 
> semantic for "zxx" should have been defined as "not applicable" 
> which was the use case presented at the time it was created. Since 
> it was not expressed in this way, now we need another tag, I think. 
> 
> Regards, 
> 
> Karen Broome
> Metadata Systems Designer
> Sony Pictures Entertainment
> 310.244.4384 
> 
> ietf-languages-bounces at alvestrand.no wrote on 03/14/2008 08:49:31 AM:
> 
> > > From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> > > bounces at alvestrand.no] On Behalf Of Doug Ewell
> > > Sent: Thursday, March 13, 2008 11:16 PM
> > > To: ietf-languages at iana.org
> > > Subject: Re: ID for language-invariant strings
> > 
> > > ["zxx" is] a "less bad" fit than the other choices:
> > >
> > > zxx - content is not linguistic in nature
> > > und - content is in an undetermined language
> > > mis - content is in an otherwise uncoded language
> > > i-default - content is in a default, fallback language intelligible to
> > > anglophones
> > >
> > > I agree that inventing a new code element/subtag for this situation
> > > would be undesirable.
> > 
> > If it's less bad, I still think it's kind of bad.
> > 
> > For instance, suppose I need to apply language tags to each of the 
> > data elements in the main ISO 639-3 code table. For data in columns 
> > like the 639-3 ID, clearly "zxx" applies: the alpha-3 identifiers 
> > have no linguistic content. But what about the reference names? 
> > "zxx" would be a decidedly bad choice for that column, IMO, since 
> > every single data element is definitely linguistic in nature.
> > 
> > I don't know why people are so averse to new special-purpose code 
> > elements when there is a reasonable need. It's not like there are a 
> > lot of different special-case semantics that are needed in language-
> > tagging application scenarios; I think the set is very small, 
> > perhaps even that this is the only important gap. I am *far* more 
> > concerned about overloading tags with distinct, orthogonal semantics
> > for particular application scenarios ("und" means X in this 
> > application but Y in that application): *that* can lead to serious 
> > trouble.
> > 
> > As I think about this, I'm inclined to propose a new special-purpose
> > ID "zxn" in ISO 639:
> > 
> > ID: zxn
> > Reference name: language-neutral content
> > Comment: This ID is provided primarily for application scenarios
> >          in which a language identifier must be declared for
> >          content that may be linguistic in nature but that is
> >          used as a language-neutral identifier to reference or
> >          index other information objects.
> > 
> >          Uses of this code element do not make any declaration
> >          regarding the actual language of a given data element
> >          or of whether a given data element is, in fact,
> >          linguistic in nature.
> > 
> >          Note: for application scenarios in which an identifier
> >          string is unambiguously non-linguistic in nature, "zxx"
> >          should be used rather than "zxn".
> > 
> >          For example, in a database of coding elements for
> >          cultural objects that includes for each such object a
> >          code element such as an alpha-3 string (e.g., "abc")
> >          and a reference name (e.g., "PIANO", "GUQIN"), the
> >          language identifier applied to the code element
> >          should be "zxx", but "zxn" may be applied to the
> >          reference names.
> > 
> >          Applications may also use "zxn" for content that is
> >          linguistic in nature but that is represented in a
> >          language-neutral form. For example, the concept 'ten'
> >          is linguistic in nature but can be expressed in the
> >          Language-neutral form "10". Such use of "zxn" should
> >          be considered only for application scenarios that
> >          have a particular need; this usage is not recommended
> >          in general. For instance, if a software application
> >          needs to segment the strings in a document into items
> >          that get passed to various language-specific processes
> >          and it must apply a language identifier to language-
> >          neutral content such as numbers represented as digits,
> >          then "zxn" may be used within that application; but it
> >          is not expected that content authors would apply "zxn"
> >          to numbers in their documents in general.
> > 
> > 
> > 
> > Peter
> > _______________________________________________
> > Ietf-languages mailing list
> > Ietf-languages at alvestrand.no
> > http://www.alvestrand.no/mailman/listinfo/ietf-languages