Spanglish

Luc Pardon lucp at skopos.be
Sun Dec 18 14:07:50 CET 2016



On 18-12-16 10:37, David Starner wrote:
> Lesson #1 in databases is that you do not put multiple things in one
> field. 

    Yes, yes and yes.

    Actually, it is not a lesson but a rule (by Codd), and the
corresponding lesson is "violate rule #1 and suffer the consequences".

    BCP47 violates that rule big time, by packing all kinds of things
(script, orthography, ...) in a field that was originally intended (in
HTML) to contain only the language.

   I've beaten that horse many times before, so I'll leave it at that
and return to the topic at hand.


   What it means to me as a technician is that the proposed solution of
packing multiple languages in the lang tag (lang="en es") is not (much)
worse than "ru-Latn". Both violate basic database design rules. We have
to live with the latter, so we could probably live with the former as well.
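
   To make the "multiple things in one field" point concrete, here is a
minimal Python sketch that unpacks a tag such as "ru-Latn" into separate
columns. The name split_tag is made up for illustration, and the function
deliberately handles only the language and script subtags, not the full
BCP47 grammar.

```python
def split_tag(tag):
    """Split a simple language tag into (language, script).

    Simplified on purpose: only the primary language subtag and an
    optional four-letter script subtag are recognised. Everything
    else in the tag (region, variants, extensions) is ignored.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    for sub in subtags[1:]:
        if len(sub) == 1:
            # A single-character subtag starts an extension or
            # private-use sequence; stop looking for a script.
            break
        if len(sub) == 4 and sub.isalpha():
            script = sub.title()
            break
    return language, script
```

   With the pieces in separate fields, a query for "everything in
Russian" no longer has to pattern-match inside the tag.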

   However, from a software vendor's perspective, it _is_ much worse,
because it would break just about _all_ language processing
applications. Screen readers would not know how to pronounce the
individual words; they would have to be reprogrammed to deal with
multiple languages. Likewise, spell checkers would have to be told to
look in multiple dictionaries, and so on. Sure, that software can all be
fixed, but that costs money, and vendors don't like to invest money
unless it brings in some return.

   That being said,


> There's going to be no way to store the fact that you have a copy
> of Crime and Punishment in Russian and retrieve that quickly if you
> combine that information with the language of the notes or citations.
> And presumably the language tags for Crime and Punishment in Russian
> with French notes should be different from War and Peace in Russian and
> French. 

   There sure is a way, simply by tagging at multiple levels. Read on.

> I'm pretty sure the language tag is not the level to try and
> deal with this.

   Not the language tag at the top of the document, no.

   But you can tag the document as a whole with "lang=ru" and then
proceed to tag the notes or citations separately as "lang=fr". Problem
solved. Your search will bring up the annotated copy along with all the
other Russian-language copies/editions of the book.
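
   As a rough illustration of tagging at multiple levels, the Python
sketch below keeps the language of the document as a whole apart from the
languages of its parts. The class names, the function names and the
catalog entries are all hypothetical sample data, not an existing schema.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    kind: str   # e.g. "notes" or "citations"
    lang: str   # language of this part only


@dataclass
class Book:
    title: str
    lang: str   # language of the document as a whole
    parts: list = field(default_factory=list)


def find_by_language(catalog, lang):
    """Titles whose document-level language is `lang`."""
    return [book.title for book in catalog if book.lang == lang]


def find_by_part_language(catalog, kind, lang):
    """Titles that have a part of the given kind in `lang`."""
    return [
        book.title
        for book in catalog
        if any(p.kind == kind and p.lang == lang for p in book.parts)
    ]


# Sample data only, to show the two queries side by side.
catalog = [
    Book("Crime and Punishment", "ru"),
    Book("Crime and Punishment (annotated)", "ru", [Part("notes", "fr")]),
    Book("Alice in Wonderland", "en"),
]
```

   Here find_by_language(catalog, "ru") returns both Russian copies,
annotated or not, while find_by_part_language(catalog, "notes", "fr")
picks out only the one with French notes.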

   In fact, this solution is _required_ by the rules for accessibility
design. Even at the basic (minimal) level, it is mandatory to tag
language changes _within_ the document separately.

   For good reason, because without those additional tags, a screen
reader for the blind would try to pronounce the French notes as if they
were in Russian, making it pretty incomprehensible to the listeners
(even to those who do master both languages).


   Applied to the topic under discussion, "Alice in Spanglishland" would
have to be tagged with "en" at the top of the document, and the Spanish
words inside the text would have to be marked up separately (e.g.
<span lang="es">caramba</span> in HTML syntax). Or the other way around, if
the majority of the words are Spanish.
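
   To show what a consumer of such markup could do with it, here is a
small Python sketch built on the standard html.parser module. It splits a
fragment into (language, text) runs, which is roughly what a screen
reader or spell checker needs in order to pick a voice or a dictionary.
The names LangRuns and language_runs are invented for this example, and
the code assumes well-formed markup in which every start tag has a
matching end tag.

```python
from html.parser import HTMLParser


class LangRuns(HTMLParser):
    """Collect (language, text) runs, honouring nested lang attributes."""

    def __init__(self, default_lang):
        super().__init__()
        self.stack = [default_lang]   # innermost language is last
        self.runs = []

    def handle_starttag(self, tag, attrs):
        # An element without its own lang inherits the enclosing one.
        lang = dict(attrs).get("lang") or self.stack[-1]
        self.stack.append(lang)

    def handle_endtag(self, tag):
        # Never pop the document-level default.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self.stack[-1], text))


def language_runs(html, default_lang="und"):
    """Return the list of (language, text) runs found in `html`."""
    parser = LangRuns(default_lang)
    parser.feed(html)
    parser.close()
    return parser.runs
```

   Fed the Queen's line as
'<p lang="en"><span lang="es">Caramba</span>, off with their heads</p>',
language_runs yields one run tagged "es" for "Caramba" and one run tagged
"en" for the rest.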

   It means additional work, yes, but most word processors I know of
already have an option to set a separate language for selected text or
for an entire paragraph, as well as for the entire document.


   Having a separate tag in the repository for "Spanglish" would
certainly bring (even) less blood, sweat and tears for the publisher,
but if we take that road, we should apply the usual criteria before
approving it. For example, are there published references for this
"language", with a good definition, so that we can hold up an unknown
text against it and decide if it is "Spanglish" indeed?

   Or would it be enough if the Queen keeps saying "Caramba, off with
their heads" to make it Spanglish, even if that is the only Spanish word
in the entire book? Or would that then be "Englanish"?

   And yes, we would be opening the floodgates for an infinite number of
separate tags for texts that contain any number of words and notes and
citations in other languages than the main text, if for no other reason
than that the author is "blasé".

   Bottom line: since there is an established way to deal with language
changes inside a document, I'd suggest that we use that.


   Luc