The "not-language" identifier (was: RE: Mandarin Chinese, Simplified Script)

John.Cowan jcowan at reutershealth.com
Thu Jun 16 20:15:38 CEST 2005


Caoimhin O Donnaile scripsit:

> But according to section 2.12 http://www.w3.org/TR/REC-xml/ at least, 
> (1) doesn't mean that there is or there isn't a language.  It just 
> unsets xml:lang

Quite right.  The way to say "This is linguistic content, but we don't know
what it is" is "und".

> Anyone know whether elements which are tagged as
>      <tag xml:lang="">
> are actually stored internally by XML processing software in an 
> identical fashion to elements which are tagged simply as
>      <tag>
> in the absence of any inherited xml:lang value?
> i.e. Is xml:lang="" actually processed as an "unset" command?

XML processors typically leave it up to the application how to process
xml:lang, including maintaining the inheritance from parent to child elements.
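As an illustration of what "up to the application" can mean, here is a minimal Python sketch. It assumes the application parses with the standard-library ElementTree module, and the function name effective_languages is hypothetical. At the parser level, ElementTree does not store xml:lang="" and an absent attribute identically: one yields "" and the other yields None. The application therefore has to fold them together itself while it walks the tree and carries the inherited value down. "und" passes through as an ordinary tag.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(elem, inherited=None):
    """Yield (element, effective_lang) pairs in document order.

    effective_lang is None when no language applies: either no xml:lang is
    in scope, or xml:lang="" has unset an inherited value. "und" is kept as
    an ordinary tag meaning "linguistic content, language undetermined".
    """
    value = elem.get(XML_LANG)
    if value is not None:
        # An explicit attribute overrides the parent; "" means "no language".
        inherited = value or None
    yield elem, inherited
    for child in elem:
        yield from effective_languages(child, inherited)


doc = ET.fromstring(
    '<doc xml:lang="gd">'
    '<p>Tha e fuar</p>'
    '<p xml:lang="">42</p>'
    '<p xml:lang="und">???</p>'
    '</doc>'
)

# The raw attribute differs between "absent" and "empty".
print(repr(doc[0].get(XML_LANG)), repr(doc[1].get(XML_LANG)))  # None ''

for elem, lang in effective_languages(doc):
    print(elem.tag, lang)
```

Other parsers and APIs may expose the attribute differently, so this shows one application's choice rather than required behaviour.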

> And talking about sets, is the likes of:
>      xml:lang=en,gd
> allowed?

No, it isn't, and AFAIK no one has ever asked for it.
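To see why a list such as "en,gd" is rejected, it helps to check the value against the RFC 3066 tag syntax. In that syntax a primary subtag of 1 to 8 letters is followed by optional hyphen-separated subtags of 1 to 8 letters or digits. The Python sketch below uses a hypothetical helper, is_valid_xml_lang, and treats the empty string as the "no language" value discussed above. It checks syntax only. It does not verify that a subtag is actually registered.

```python
import re

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT),
# joined by hyphens. A comma-separated list is not a single tag.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_valid_xml_lang(value):
    """Return True if value is "" (no language) or one syntactic RFC 3066 tag."""
    return value == "" or RFC3066_TAG.fullmatch(value) is not None


for candidate in ["en", "gd", "en-GB", "und", "", "en,gd"]:
    print(repr(candidate), is_valid_xml_lang(candidate))
```

Every candidate except 'en,gd' passes, because the comma cannot appear anywhere in the tag grammar.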

> - For example, to tag a film as having mixed Gaelic and English 
> dialogue.

That's not an appropriate use of xml:lang, though it is an appropriate use
of RFC 3066 tags.  xml:lang should only be used to tag content that is
directly enclosed in XML markup, not as metadata about content stored
elsewhere.
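One way to record the film's languages without misusing xml:lang is to put the tags in element content. The Python sketch below uses Dublin Core's dc:language element, repeated once per language, as an example. The choice of Dublin Core, the "record" root element, and the placeholder title are assumptions for illustration and do not come from the message. It also keeps xml:lang where it belongs, on text that the markup directly encloses (the title).

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("dc", DC)

# Hypothetical record for a film whose dialogue mixes Gaelic and English.
record = ET.Element("record")

# The title text is enclosed in the markup, so xml:lang describes it.
title = ET.SubElement(record, f"{{{DC}}}title")
title.text = "Example Title"
title.set(XML_LANG, "en")

# The dialogue lives elsewhere (in the film), so its languages are metadata:
# one dc:language element per tag, not a list inside xml:lang.
for tag in ["gd", "en"]:
    ET.SubElement(record, f"{{{DC}}}language").text = tag

serialized = ET.tostring(record, encoding="unicode")
print(serialized)
```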

> Or for a document containing mixed Gaelic and English, to say 
> "Allow both Gaelic and English in spell-checking" without the chore of 
> labelling every word for language.

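There's no need to label every word.  Unless the content genuinely includes
code-switching within a sentence, language markup generally operates at the
level of sentences or paragraphs, and an xml:lang on an enclosing element
covers everything inside it.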
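The Python sketch below shows how a spell-checker might use that scoping. A single document-level xml:lang="gd" covers most of the text, and only the English span and the English paragraph are marked. The function language_runs is a hypothetical helper built on ElementTree. It splits the text into (language, text) runs, so a checker could pick one dictionary per run. Text that follows a child element's end tag (ElementTree's "tail") takes the parent's language.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_runs(elem, inherited=None):
    """Yield (lang, text) runs in document order, honoring xml:lang scope."""
    value = elem.get(XML_LANG)
    if value is None:
        lang = inherited        # no attribute: inherit from the parent
    else:
        lang = value or None    # "" unsets; anything else overrides
    if elem.text:
        yield lang, elem.text
    for child in elem:
        yield from language_runs(child, lang)
        if child.tail:
            # Tail text follows the child's end tag, so it is back in the
            # parent's scope and takes the parent's language.
            yield lang, child.tail


doc = ET.fromstring(
    '<doc xml:lang="gd">'
    '<p>Madainn mhath! <span xml:lang="en">Good morning!</span>'
    ' Ciamar a tha thu?</p>'
    '<p xml:lang="en">This paragraph is English.</p>'
    '</doc>'
)
runs = list(language_runs(doc))
for lang, text in runs:
    print(lang, repr(text))
```

Only three xml:lang attributes were needed for four runs of text in two languages.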

-- 
What asininity could I have uttered     John Cowan <jcowan at reutershealth.com>
that they applaud me thus?              http://www.reutershealth.com
        --Phocion, Greek orator         http://www.ccil.org/~cowan


More information about the Ietf-languages mailing list