The limit of language codes

Gerard Meijssen gerardm at wiktionaryz.org
Thu Feb 15 21:17:10 CET 2007


Hoi,
The modern languages have a big advantage. They are the languages that 
are spoken today. It is therefore relatively easy to treat them with 
broad strokes. When you start drilling down, you can have linguistic 
entities that are considered "dialects", and you can have different 
orthographies. All well and good.

For me languages are a living thing, and new words make their appearance 
continuously. This happens quite apart from how we would like to mark 
language usage using meta tags. As words make their appearance, they do 
differentiate the language. When we want to mark such content with 
metadata, the language would still be "Dutch", i.e. nl, but the metadata 
for a movie or a documentary would also need to include the moment when 
this particular recording was created.

In OmegaWiki, we need to tag linguistic entities. For English, we have 
decided that when a word is spelled the same in contemporary en-GB and 
en-US, we only record it as en. This is satisfactory for us. Many words 
are in use only for a limited time; who still thinks and talks of the 
Internet as the "digital super highway" nowadays? For a dictionary you 
identify the dates when words made their appearance and when they were 
last seen. When you want to divide a language into time slots, it is 
really arbitrary where you draw the lines. Italian is a constructed 
language; this is also true for German. Orthographies are a relatively 
recent invention, and consequently it is not really feasible to create 
spell checkers for text from before a certain age. An age that differs 
per language...
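
As an illustration only, the English rule and the first/last seen dates 
described above could be sketched as follows. This is not OmegaWiki's 
actual data model; the Entry class, its field names and both functions 
are hypothetical, and en-GB is used as the registered tag for British 
English.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical dictionary record: a spelling plus its language tag."""
    spelling: str
    tag: str                          # e.g. "en", "en-GB", "en-US"
    first_seen: Optional[int] = None  # year of first attestation, if known
    last_seen: Optional[int] = None   # year last seen; None = still in use


def english_entries(spelling_gb: str, spelling_us: str) -> List[Entry]:
    """Record one 'en' entry when both spellings agree, else one per variant."""
    if spelling_gb == spelling_us:
        return [Entry(spelling_gb, "en")]
    return [Entry(spelling_gb, "en-GB"), Entry(spelling_us, "en-US")]


def in_use(entry: Entry, year: int) -> bool:
    """True if the given year falls inside the entry's recorded lifetime."""
    if entry.first_seen is not None and year < entry.first_seen:
        return False
    if entry.last_seen is not None and year > entry.last_seen:
        return False
    return True
```

The point of the sketch is that the dates live on the entry itself, 
next to the language tag, so no line has to be drawn through the 
language as a whole.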

The notion of having tags for historical languages makes sense when 
these languages are dead. Tagging any other way is at best imprecise. So 
please do create a gazillion new tags for historical "languages"; I am 
not sure that they are worth the paper they are written on. I am also 
afraid that they detract from what we have to achieve first: the correct 
tagging of contemporary material. With only 15% of the material on the 
Internet tagged, there is plenty of convincing that we need to do. 
Convincing that using our tags /is/ relevant.

Thanks,
     Gerard

