The "not-language" identifier (was: RE: Mandarin Chinese, Simplified Script)
John.Cowan
jcowan at reutershealth.com
Thu Jun 16 20:15:38 CEST 2005
Caoimhin O Donnaile scripsit:
> But according to section 2.12 http://www.w3.org/TR/REC-xml/ at least,
> (1) doesn't mean that there is or there isn't a language. It just
> unsets xml:lang
Quite right. The way to say "This is linguistic content, but we don't know
what it is" is "und".
> Anyone know whether elements which are tagged as
> <tag xml:lang="">
> are actually stored internally by XML processing software in an
> identical fashion to elements which are tagged simply as
> <tag>
> in the absence of any inherited xml:lang value?
> i.e. Is xml:lang="" actually processed as an "unset" command?
XML processors typically leave it up to the application how to process
xml:lang, including maintaining the inheritance from parent to child elements.
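Below is a rough sketch of what that application-side handling might look like, written in Python with the standard-library ElementTree. It is not what any particular processor does. The policy that an explicit xml:lang="" resets the inherited value to "no language information" is an assumption that follows the XML spec's section 2.12 wording. The function name effective_langs is hypothetical. Note that "und" is kept as an ordinary tag: it says "linguistic content, language unknown", which is different from having no language information at all.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is permanently bound to this namespace URI, so
# ElementTree reports xml:lang under this expanded attribute name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_langs(elem, inherited=None):
    """Yield (element, effective language) pairs in document order.

    Hypothetical application policy (an assumption, not processor behavior):
    - no xml:lang attribute: inherit the parent's value
    - xml:lang="": reset to None, i.e. no language information
    - any other value (including "und"): use it as given
    """
    lang = elem.get(XML_LANG, inherited)
    if lang == "":
        lang = None
    yield elem, lang
    for child in elem:
        yield from effective_langs(child, lang)

doc = ET.fromstring(
    '<doc xml:lang="gd"><a/><b xml:lang="en"><c/></b>'
    '<d xml:lang=""/><e xml:lang="und"/></doc>'
)
for element, lang in effective_langs(doc):
    print(element.tag, lang)
```

Here <c> inherits "en" from <b>, <d> has its inherited "gd" unset, and <e> is explicitly marked as undetermined.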
> And talking about sets, is the likes of:
> xml:lang=en,gd
> allowed?
No, it isn't, and AFAIK no one has ever asked for it.
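To illustrate why "en,gd" is rejected: an xml:lang value is a single RFC 3066 language tag or the empty string. The Python sketch below is a simplified syntax check, assuming only the RFC 3066 shape: a primary subtag of 1 to 8 letters, followed by "-"-separated subtags of 1 to 8 letters or digits. It does not consult the registry, and is_valid_xml_lang is a hypothetical helper name.

```python
import re

# Simplified RFC 3066 syntax only; registry lookups are not performed.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_valid_xml_lang(value: str) -> bool:
    """True if value is one language tag or the empty string."""
    return value == "" or RFC3066_TAG.fullmatch(value) is not None

for v in ["en", "gd", "en-GB", "und", "", "en,gd", "en gd"]:
    print(repr(v), is_valid_xml_lang(v))
```

A comma or space separated list fails the match, because neither "," nor " " can appear inside a single tag.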
> - For example, to tag a film as having mixed Gaelic and English
> dialogue.
That's not an appropriate use of xml:lang, though it is an appropriate use
of RFC 3066 tags. xml:lang should only be used to tag content that is
directly enclosed in XML markup, not as metadata about content stored
elsewhere.
> Or for a document containing mixed Gaelic and English, to say
> "Allow both Gaelic and English in spell-checking" without the chore of
> labelling every word for language.
Unless the content genuinely includes code-switching, markup generally operates
at the level of sentences or paragraphs.
--
John Cowan <jcowan at reutershealth.com>
http://www.reutershealth.com
http://www.ccil.org/~cowan
What asininity could I have uttered that they applaud me thus?
        --Phocion, Greek orator