ID for language-invariant strings
Karen_Broome at spe.sony.com
Mon Mar 17 22:25:05 CET 2008
The "zxx" tag started with my query into how I should classify the "audio
content" of a silent film in a system designed to serve non-silent films
where a language code is required. Peter suggested "zxx = no linguistic
content" and registered it.
I felt that it might be better to use the industry terminology "silent"
and employ a free tag in the "Q" space of ISO 639-2. While there was "no
linguistic content" on that audio channel, there was certainly a plot that
could be determined from watching the film even if the title cards were
removed (a "title card" is an interstitial used to display the text in a
silent film). To describe our wonderful heritage of silent films as having
no linguistic content just seemed a bit cruel. I was willing to go with
"not applicable" but could not recommend the use of "zxx = no linguistic
content" for this purpose.
When it was later suggested that "zxx" should be used to mark up code
fragments appearing in a tutorial written in English, I was even more
opposed to the "non-linguistic" semantic. I wasn't the only one who
complained that code -- especially in the context of a technical tutorial
-- is primarily meant to be read by humans, not machines. An assistive
device such as a Braille screenreader would want to represent that text
as language, not skip over it because it's non-linguistic in nature.
Binary junk data is the only thing I can think of that is truly
non-linguistic.
Any chance we could broaden the semantic of the "zxx" tag? I still think
we did the wrong thing here and the "non-applicable" tag is more
appropriate for all the use cases mentioned.
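
To illustrate the screenreader concern above, here is a minimal sketch of a tool that skips anything tagged "zxx". The function name and the sample spans are hypothetical, and no real assistive-technology API is assumed.

```python
# Hypothetical sketch: how a naive assistive tool might act on language tags.
# Nothing here reflects a real screenreader API.

NON_LINGUISTIC = "zxx"  # "no linguistic content"


def spans_to_render(spans):
    """Return the text a tool would render if it skips anything tagged zxx."""
    return [text for tag, text in spans if tag.lower() != NON_LINGUISTIC]


tutorial = [
    ("en", "Call the function like this:"),
    ("zxx", "print(total)"),  # code fragment tagged as non-linguistic
    ("en", "It prints the running total."),
]

rendered = spans_to_render(tutorial)
```

Under these assumptions the code fragment never reaches the reader, which is the outcome I am objecting to.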
-- one previous post on the topic
Side note: I find the IETF archives very hard to search or I could have
produced a better example. Am I missing a search interface somewhere?
Peter Constable <petercon at microsoft.com> wrote on 03/14/2008 01:37:30 PM:
> If “zxx” were “not applicable”, I would not have any reservation
> about semantic overloading for the application scenarios I have in
> mind now. Funny, I really have no recollection of you suggesting
> that at that time. (Sorry.)
> From: Karen_Broome at spe.sony.com [mailto:Karen_Broome at spe.sony.com]
> Sent: Friday, March 14, 2008 12:51 PM
> To: Peter Constable
> Cc: ietf-languages at iana.org
> Subject: RE: ID for language-invariant strings
> I can keep restating the point I've made from the beginning. The
> semantic for "zxx" should have been defined as "not applicable"
> which was the use case presented at the time it was created. Since
> it was not expressed in this way, now we need another tag, I think.
> Karen Broome
> Metadata Systems Designer
> Sony Pictures Entertainment
> ietf-languages-bounces at alvestrand.no wrote on 03/14/2008 08:49:31 AM:
> > > From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> > > bounces at alvestrand.no] On Behalf Of Doug Ewell
> > > Sent: Thursday, March 13, 2008 11:16 PM
> > > To: ietf-languages at iana.org
> > > Subject: Re: ID for language-invariant strings
> > > ["zxx" is] a "less bad" fit than the other choices:
> > >
> > > zxx - content is not linguistic in nature
> > > und - content is in an undetermined language
> > > mis - content is in an otherwise uncoded language
> > > i-default - content is in a default, fallback language intelligible to
> > > anglophones
> > >
> > > I agree that inventing a new code element/subtag for this situation
> > > would be undesirable.
> > If it's less bad, I still think it's kind of bad.
> > For instance, suppose I need to apply language tags to each of the
> > data elements in the main ISO 639-3 code table. For data in columns
> > like the 639-3 ID, clearly "zxx" applies: the alpha-3 identifiers
> > have no linguistic content. But what about the reference names?
> > "zxx" would be a decidedly bad choice for that column, IMO, since
> > every single data element is definitely linguistic in nature.
> > I don't know why people are so averse to new special-purpose code
> > elements when there is a reasonable need. It's not like there are a
> > lot of different special-case semantics that are needed in language-
> > tagging application scenarios; I think the set is very small,
> > perhaps even that this is the only important gap. I am *far* more
> > concerned about overloading tags with distinct, orthogonal semantics
> > for particular application scenarios ("und" means X in this
> > application but Y in that application): *that* can lead to serious
> > As I think about this, I'm inclined to propose a new special-purpose
> > ID "zxn" in ISO 639:
> > ID: zxn
> > Reference name: language-neutral content
> > Comment: This ID is provided primarily for application scenarios
> > in which a language identifier must be declared for
> > content that may be linguistic in nature but that is
> > used as a language-neutral identifier to reference or
> > index other information objects.
> > Uses of this code element do not make any declaration
> > regarding the actual language of a given data element
> > or of whether a given data element is, in fact,
> > linguistic in nature.
> > Note: for application scenarios in which an identifier
> > string is unambiguously non-linguistic in nature, "zxx"
> > should be used rather than "zxn".
> > For example, in a database of coding elements for
> > cultural objects that includes for each such object a
> > code element such as an alpha-3 string (e.g., "abc")
> > and a reference name (e.g., "PIANO", "GUQIN"), the
> > language identifier applied to the code element
> > should be "zxx", but "zxn" may be applied to the
> > reference names.
> > Applications may also use "zxn" for content that is
> > linguistic in nature but that is represented in a
> > language-neutral form. For example, the concept 'ten'
> > is linguistic in nature but can be expressed in the
> > language-neutral form "10". Such use of "zxn" should
> > be considered only for application scenarios that
> > have a particular need; this usage is not recommended
> > in general. For instance, if a software application
> > needs to segment the strings in a document into items
> > that get passed to various language-specific processes
> > and it must apply a language identifier to language-
> > neutral content such as numbers represented as digits,
> > then "zxn" may be used within that application; but it
> > is not expected that content authors would apply "zxn"
> > to numbers in their documents in general.
> > Peter
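
The tagging rule in the proposal quoted above could be sketched roughly as follows. This is an illustration only: "zxn" is the identifier proposed in this thread, not a registered code, and the field names and the "und" fallback are assumptions made for the example.

```python
# Sketch of the proposed rule. "zxn" is only the identifier proposed in this
# thread, not a registered code; the field names are hypothetical.


def tag_for_field(field_name):
    """Pick a language tag for one field of a code-table record."""
    if field_name == "code":            # alpha-3 identifier such as "abc"
        return "zxx"                    # unambiguously non-linguistic
    if field_name == "reference_name":  # e.g. "PIANO", "GUQIN"
        return "zxn"                    # may be linguistic, used language-neutrally
    return "und"                        # assumed fallback: undetermined language


record = {"code": "abc", "reference_name": "GUQIN"}
tagged = {name: (tag_for_field(name), value) for name, value in record.items()}
```

The point of the split is that the identifier column and the reference-name column get different tags, so neither "zxx" nor "und" has to carry two meanings.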