Spanglish

Pascal Vaillant pascal.vaillant at guyane.univ-ag.fr
Mon Dec 19 19:08:25 CET 2016



On Sunday 18 December 2016 03:47:42 Mats Blakstad wrote:
> We do have the subtag 'mul' (Multiple languages) - would be great if it was
> possible to use that code and also specify which languages it contains,
> something like 'mul-t-en-es'
> 

In a project where we had to build a corpus of language contact
phenomena (CLAPOTY), we had to deal with that problem
(annotating code-switching or language ambiguity).

We decided that the BCP 47 language tag itself was not meant
to express that, but that we could embed this kind of information
in the XML.

So for mixed text units (at whatever level) we simply
used the all-purpose ISO 639-3 code "mul", and listed the languages
actually used in a dedicated XML element.

The result looks like this (in this example, the language assignment
is ambiguous between French-based Creole and French):

<line lang="acf-MQ">
<segment lang="fra">Et de fait</segment> kon tout
<segment lang="mul">
  <langues><langue lang="fra"/><langue lang="acf-MQ"/></langues>
  <alt_transcription lang="fra">les études</alt_transcription>
  <alt_transcription lang="acf-MQ">lézétud</alt_transcription>
</segment>
ka
<segment lang="mul">
  <langues><langue lang="acf-MQ"/><langue lang="fra"/></langues>
  <alt_transcription lang="acf-MQ">montré</alt_transcription>
  <alt_transcription lang="fra">montrer</alt_transcription>
</segment>
</line>

Note the "<langues>" element, giving the "list of alternating
languages" of the multilingual segment.
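
For readers who want to consume this kind of annotation, here is a
minimal sketch in Python using the standard xml.etree.ElementTree
module. It is not part of the CLAPOTY tooling; the function name
multilingual_segments and the returned dictionary layout are
assumptions made for illustration. It collects, for every segment
tagged "mul", the languages listed under <langues> and the alternative
transcription given for each language.

```python
import xml.etree.ElementTree as ET

# The sample line from the message above, reproduced as-is.
SAMPLE = """<line lang="acf-MQ">
<segment lang="fra">Et de fait</segment> kon tout
<segment lang="mul">
  <langues><langue lang="fra"/><langue lang="acf-MQ"/></langues>
  <alt_transcription lang="fra">les études</alt_transcription>
  <alt_transcription lang="acf-MQ">lézétud</alt_transcription>
</segment>
ka
<segment lang="mul">
  <langues><langue lang="acf-MQ"/><langue lang="fra"/></langues>
  <alt_transcription lang="acf-MQ">montré</alt_transcription>
  <alt_transcription lang="fra">montrer</alt_transcription>
</segment>
</line>"""


def multilingual_segments(xml_text):
    """Return one dict per <segment lang="mul"> (hypothetical helper).

    Each dict holds the ordered list of candidate languages and a
    mapping from language tag to its alternative transcription.
    """
    line = ET.fromstring(xml_text)
    result = []
    for seg in line.iter("segment"):
        if seg.get("lang") != "mul":
            continue  # monolingual segment: nothing to disambiguate
        languages = [el.get("lang") for el in seg.findall("langues/langue")]
        readings = {
            alt.get("lang"): alt.text
            for alt in seg.findall("alt_transcription")
        }
        result.append({"languages": languages, "readings": readings})
    return result


segments = multilingual_segments(SAMPLE)
for seg in segments:
    print(seg["languages"], seg["readings"])
```

The order of the <langue> elements is preserved in the "languages"
list, in case the corpus uses that order to carry information.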

Reference:
https://www.academia.edu/6351935/%C3%80_la_crois%C3%A9e_des_langues._Annotation_et_fouille_de_corpus_plurilingues_2014_

Pascal Vaillant


