Spanglish

Martin J. Dürst duerst at it.aoyama.ac.jp
Mon Dec 19 10:03:55 CET 2016


Hello Luc, others,

On 2016/12/18 22:07, Luc Pardon wrote:
>
>
> On 18-12-16 10:37, David Starner wrote:
>> Lesson #1 in databases is that you do not put multiple things in one
>> field.
>
>     Yes, yes and yes.
>
>     Actually, it is not a lesson but a rule (by Codd), and the
> corresponding lesson is "violate rule #1 and suffer the consequences".
>
>     BCP47 violates that rule big time, by packing all kinds of things
> (script, orthography, ...) in a field that was originally intended (in
> HTML) to contain only the language.

That's how it looks if you think about language tags as a field value. 
It's better seen as a foreign key pointing to another table.

>    I've beaten that horse many times before, so I'll leave it at that
> and return to the topic at hand.
>
>
>    What it means to me as a technician is that the proposed solution of
> packing multiple languages in the lang tag (lang="en es"), is not (much)
> worse than "ru-Latn". Both violate basic database design rules.

I strongly disagree. In very simplified terms, we have three items of 
data: 1) the document, 2) the language, 3) the script.

Now let's take another example, a lecture, where we have:
1) student, 2) title (name of lecture), 3) teacher

What you are suggesting is that we build a table with three columns. In 
the lecture example, this is clearly wrong, because the teacher is 
determined by the title, and the same title/teacher pair appears many times. 
Therefore, the correct way to normalize this is to create two tables, 
one that connects students to lectures, and another that lists title and 
teacher for each lecture. If I remember correctly, this is what 2NF is 
about.
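To make the lecture example concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names, column names, and sample rows are hypothetical, chosen only for illustration. Each lecture's teacher is stored once, and the original three-column view is recovered with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    -- One row per lecture: the title determines the teacher.
    CREATE TABLE lecture (
        title   TEXT PRIMARY KEY,
        teacher TEXT NOT NULL
    );
    -- One row per enrollment: which student attends which lecture.
    CREATE TABLE enrollment (
        student TEXT NOT NULL,
        title   TEXT NOT NULL REFERENCES lecture(title),
        PRIMARY KEY (student, title)
    );
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO lecture VALUES (?, ?)",
                 [("Databases", "Tanaka"), ("Linguistics", "Smith")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [("alice", "Databases"), ("bob", "Databases"),
                  ("alice", "Linguistics")])

def student_view(conn):
    """Rebuild the three-column (student, title, teacher) view via a join."""
    return conn.execute("""
        SELECT e.student, e.title, l.teacher
        FROM enrollment AS e JOIN lecture AS l ON e.title = l.title
        ORDER BY e.student, e.title
    """).fetchall()
```

Because the teacher lives in a single row of `lecture`, changing it is one UPDATE rather than one per enrolled student, which is exactly the redundancy the decomposition removes.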

Now in the case at hand, things are very similar. Language and script 
are intimately related. Therefore, having a single table with three 
columns would be a mistake. We separate this into a table that relates 
documents to language tags, and another (implicit) table that gives 
language and script (and potentially some other stuff) for each language 
tag.
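The same decomposition for language tags can be sketched in the same way. This is a hypothetical illustration: the `language_tag` table stands in for the implicit table described above, and its three rows (including the choice of Cyrl as the script for plain "ru") are assumptions made for the example, not a copy of any registry. Documents carry only a tag, which acts as a foreign key into that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clause below
conn.executescript("""
    -- The (normally implicit) table: each tag determines language and script.
    CREATE TABLE language_tag (
        tag      TEXT PRIMARY KEY,
        language TEXT NOT NULL,
        script   TEXT NOT NULL
    );
    -- Documents store only the tag, i.e. a foreign key into language_tag.
    CREATE TABLE document (
        doc_id   TEXT PRIMARY KEY,
        lang_tag TEXT NOT NULL REFERENCES language_tag(tag)
    );
""")
# Hypothetical, illustrative rows only.
conn.executemany("INSERT INTO language_tag VALUES (?, ?, ?)", [
    ("ru", "ru", "Cyrl"),
    ("ru-Latn", "ru", "Latn"),
    ("en", "en", "Latn"),
])
conn.executemany("INSERT INTO document VALUES (?, ?)", [
    ("doc1", "ru"), ("doc2", "ru-Latn"), ("doc3", "en"),
])

def documents_in_script(conn, script):
    """Return ids of documents whose language tag denotes the given script."""
    rows = conn.execute("""
        SELECT d.doc_id
        FROM document AS d JOIN language_tag AS t ON d.lang_tag = t.tag
        WHERE t.script = ?
        ORDER BY d.doc_id
    """, (script,)).fetchall()
    return [doc_id for (doc_id,) in rows]
```

With foreign-key enforcement switched on, a value such as "es en" that does not name a row in `language_tag` is rejected with `sqlite3.IntegrityError`.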


> We have
> to live with the latter, so we could probably live with the former as well.
>
>    However, from a software vendor's perspective, it _is_ much worse,
> because it would break just about _all_ language processing
> applications.

I agree that introducing something like lang="es en" would be a mistake.

Regards,   Martin.

