pascal.vaillant at guyane.univ-ag.fr
Mon Dec 19 19:08:25 CET 2016
On Sunday 18 December 2016 03:47:42 Mats Blakstad wrote:
> We do have the subtag 'mul' (Multiple languages) - would be great if it was
> possible to use that code and also specify which languages it contains,
> something like 'mul-t-en-es'
In a project where we had to build a corpus of language contact
phenomena (CLAPOTY), we had to deal with that problem
(annotating code-switching or language ambiguity).
Our decision was that the BCP-47 language tag proper was not meant
to express that, but that we could embed this kind of information
in the XML.
So for mixed text units (at whatever level), we simply
used the all-purpose ISO 639-3 code "mul" and listed the
languages actually used in a separate XML element.
Things look like this (in this example, French-based Creole
vs. French, so the language assignment is ambiguous):
<segment lang="fra">Et de fait</segment> kon tout
<langues><langue lang="fra"/><langue lang="acf-MQ"/></langues>
<alt_transcription lang="fra">les études</alt_transcription>
<langues><langue lang="acf-MQ"/><langue lang="fra"/></langues>
Note the "<langues>" element, giving the "list of alternating
languages" of the multilingual segment.
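A minimal Python sketch of how a consumer might read such annotations back, using only the standard library's xml.etree.ElementTree. It assumes the mixed stretch sits in a segment carrying lang="mul" with the <langues> list as a child. The example above does not show that wrapper, so both it and the <unit> root element are hypothetical. A segment without a <langues> child is treated as monolingual.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <unit> root and the lang="mul" wrapper are
# assumptions; <segment>, <langues> and <langue> follow the example above.
SAMPLE = """
<unit>
  <segment lang="fra">Et de fait</segment>
  <segment lang="mul">kon tout
    <langues><langue lang="fra"/><langue lang="acf-MQ"/></langues>
  </segment>
</unit>
"""


def alternating_languages(segment: ET.Element) -> list[str]:
    """Return the language tags of a segment, in document order.

    If the segment has a <langues> child, its <langue> entries are used;
    otherwise the segment's own lang attribute (if any) is returned.
    """
    langues = segment.find("langues")
    if langues is None:
        lang = segment.get("lang")
        return [lang] if lang else []
    return [l.get("lang") for l in langues.findall("langue") if l.get("lang")]


def summarize(xml_text: str) -> list[tuple[str, list[str]]]:
    """Pair each segment's leading text with its list of languages."""
    root = ET.fromstring(xml_text)
    result = []
    for seg in root.iter("segment"):
        # seg.text is only the text before the first child element,
        # i.e. the words preceding any <langues> list.
        text = (seg.text or "").strip()
        result.append((text, alternating_languages(seg)))
    return result


if __name__ == "__main__":
    for text, langs in summarize(SAMPLE):
        print(f"{text!r}: {langs}")
```

The list order is kept exactly as written, since the two <langues> elements in the example give the same languages in different orders.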