The "not-language" identifier (was: RE: Mandarin Chinese, Simplified Script)

John.Cowan jcowan at
Thu Jun 16 20:15:38 CEST 2005

Caoimhin O Donnaile scripsit:

> But according to section 2.12 at least, 
> (1) doesn't mean that there is or there isn't a language.  It just 
> unsets xml:lang

Quite right.  The way to say "This is linguistic content, but we don't know
what it is" is "und".

> Anyone know whether elements which are tagged as
>      <tag xml:lang="">
> are actually stored internally by XML processing software in an 
> identical fashion to elements which are tagged simply as
>      <tag>
> in the absence of any inherited xml:lang value?
> i.e. Is xml:lang="" actually processed as an "unset" command?

XML processors typically leave it up to the application how to process
xml:lang, including maintaining the inheritance from parent to child elements.
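As an illustration only, here is a minimal sketch of how an application might resolve the effective language itself. It uses Python's standard xml.etree.ElementTree; the function name effective_langs and the sample document are hypothetical. It assumes the application treats xml:lang="" as "no language information", in which case an element with xml:lang="" and an untagged element with nothing to inherit end up in the same state. Another application could choose to keep the two cases distinct.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined "xml" prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_langs(root):
    """Map each element to its effective language, or None if unset.

    Hypothetical helper: the inheritance policy here is the application's
    choice, not something the XML parser does for you.
    """
    result = {}

    def walk(elem, inherited):
        declared = elem.get(XML_LANG)
        if declared is None:
            lang = inherited      # no attribute: inherit from the parent
        elif declared == "":
            lang = None           # xml:lang="" unsets any inherited value
        else:
            lang = declared       # explicit value overrides the parent
        result[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result

# Made-up sample document for demonstration.
doc = ET.fromstring(
    '<doc xml:lang="gd">'
    '<p>one</p>'
    '<p xml:lang=""><q>two</q></p>'
    '<p xml:lang="en">three</p>'
    '</doc>'
)
langs = effective_langs(doc)
```

In this sketch the first paragraph inherits "gd", the second paragraph and its child have no language, and the third is "en".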

> And talking about sets, is the likes of:
>      xml:lang=en,gd
> allowed?

No, it isn't, and AFAIK no one has ever asked for it.
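To make the syntax point concrete, here is a rough sketch of a well-formedness check based on the RFC 3066 grammar (a primary subtag of 1 to 8 letters, followed by hyphen-separated subtags of 1 to 8 letters or digits). The name is_wellformed_tag is hypothetical, and the check covers syntax only; it does not consult the ISO 639 code lists or the IANA registry.

```python
import re

# RFC 3066 syntax: primary subtag of 1-8 letters, then zero or more
# subtags of 1-8 letters or digits, each preceded by a hyphen.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_wellformed_tag(value):
    """Return True if value matches the RFC 3066 tag syntax (syntax only)."""
    return RFC3066_TAG.fullmatch(value) is not None

for candidate in ["en", "gd", "und", "en-GB", "en,gd"]:
    print(candidate, is_wellformed_tag(candidate))
```

A comma is not part of the tag syntax, so "en,gd" is rejected, while "en", "gd" and "und" each pass on their own.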

> - For example, to tag a film as having mixed Gaelic and English 
> dialogue.

That's not an appropriate use of xml:lang, though it is an appropriate use
of RFC 3066 tags.  xml:lang should only be used to tag content that is
directly enclosed in XML markup, not as metadata about content stored
elsewhere.

> Or for a document containing mixed Gaelic and English, to say 
> "Allow both Gaelic and English in spell-checking" without the chore of 
> labelling every word for language.

Unless the content genuinely includes code-switching, markup generally operates
at the level of sentences or paragraphs.

What asininity could I have uttered     John Cowan <jcowan at>
that they applaud me thus?    
        --Phocion, Greek orator
