The "not-language" identifier (was: RE: Mandarin Chinese, Simplified Script)

Caoimhin O Donnaile caoimhin at smo.uhi.ac.uk
Thu Jun 16 19:49:08 CEST 2005


> <tag xml:lang="en">
> 	<tag xml:lang="fr">
> 		<tag xml:lang=""> (1)
> 		<tag xml:lang="xnl"> (2)
> 
> (1) appears to say that there is a language, but we're not telling.
> (2) would suggest that there is no language to be had

(2) would mean what you say all right.

But according to section 2.12 of http://www.w3.org/TR/REC-xml/ at least, 
(1) doesn't say that there is or that there isn't a language.  It just 
unsets xml:lang:

   "Within [the element] it is considered that there is no language
    information available, just as if xml:lang had not been
    specified on [the element] or any of its ancestors."

Anyone know whether elements which are tagged as
     <tag xml:lang="">
are actually stored internally by XML processing software in an 
identical fashion to elements which are tagged simply as
     <tag>
in the absence of any inherited xml:lang value?
In other words, is xml:lang="" actually processed as an "unset" command?
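
One way to probe this question for a single parser is sketched below, 
using Python's standard xml.etree.ElementTree.  It says nothing about 
other XML software.  The helper name effective_lang and the sample 
document are made up for illustration.  In this parser the attribute 
appears to be kept as an empty string rather than dropped, so treating it 
as "unset" is left to the code that resolves the inherited value.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(elem, parents):
    """Return the xml:lang in effect for elem, or None if there is none.

    Hypothetical helper: walks from elem up through its ancestors and
    stops at the first element that carries an xml:lang attribute.
    """
    node = elem
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:
            # An empty value overrides the ancestors and means
            # "no language information available".
            return value or None
        node = parents.get(node)
    return None


doc = ET.fromstring(
    '<a xml:lang="en"><b xml:lang="fr"><c xml:lang=""/><d/></b></a>'
)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in doc.iter() for child in parent}

c = doc.find("./b/c")        # explicitly tagged xml:lang=""
d = doc.find("./b/d")        # no attribute, inherits from <b>
bare = ET.fromstring("<c/>")  # no attribute and no ancestors

stored_on_c = c.get(XML_LANG)        # what the parser kept for xml:lang=""
stored_on_bare = bare.get(XML_LANG)  # what the parser kept for a bare tag
```

Comparing stored_on_c with stored_on_bare shows whether the two cases are 
stored identically, while effective_lang(c, parents) and 
effective_lang(bare, {}) show whether they resolve to the same thing.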

And talking about sets, is the likes of:
     xml:lang=en,gd
allowed? - For example, to tag a film as having mixed Gaelic and English 
dialogue.  Or for a document containing mixed Gaelic and English, to say 
"Allow both Gaelic and English in spell-checking" without the chore of 
labelling every word for language.  (It looks from 
http://www.w3.org/TR/REC-xml/ as if it isn't allowed.)
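
A rough syntax check along these lines is sketched below.  It assumes 
that xml:lang values must be either the empty string or a single language 
tag in the RFC 3066 shape (a primary subtag of 1 to 8 letters, followed 
by optional subtags of 1 to 8 letters or digits, separated by hyphens).  
The function name is_valid_xml_lang is invented for the example, and the 
check looks only at syntax, not at whether the subtags are registered.

```python
import re

# RFC 3066 shape: 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) )
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_valid_xml_lang(value):
    """Return True if value is "" or one syntactically valid language tag."""
    return value == "" or RFC3066_TAG.fullmatch(value) is not None


single_tag_ok = is_valid_xml_lang("gd")       # one tag
empty_ok = is_valid_xml_lang("")              # the "unset" value
comma_list_ok = is_valid_xml_lang("en,gd")    # comma-separated list
```

Under that assumption a comma is not a permitted character in a tag, so a 
list such as en,gd fails the check, and mixed-language content would have 
to be marked up element by element.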

Forgive my ignorance.  I am new to XML.

Caoimhín Ó Donnaíle


More information about the Ietf-languages mailing list