Phonetic orthographies

Mon Nov 27 17:49:02 CET 2006

Don Osborn wrote:
> John Cowan wrote:
>   
>> Gerard Meijssen scripsit:
>>
>>     
>>> Correct tagging implies that it is precise.
>>>       
>> This may be true in your particular application of language tagging,
>> but not in general.  Often precision is unobtainable or even
>> undesirable.  RFC 4646 says "tag wisely", *not* "tag exhaustively".
>>     
>
> Thanks for this remark re tagging. The way I read it, the appropriate or
> "wise" precision of the tagging depends on the context and need. In some
> cases, imprecision might accommodate reality more appropriately, and in
> others more precision would be indicated.
>
> In returning to the Wikipedia & WiktionaryZ discussion earlier, I would
> think that Wikipedia would not always want to be too precise (ISO-639-3 has
> 20-some codes for Arabic, but you'd only want to use ISO-639-1's code ar for
> ar.wikipedia.org [with only a rare possible exception]). 
>
> On the other hand, WiktionaryZ might want to rely more systematically on
> ISO-639-3 (and eventually perhaps -6) which can specify the origins and use
> of words that may be particular in form, pronunciation or meaning according
> to dialect.
Hi,
I agree with the sentiment for Wikipedia. It is however problematic that 
the arguments for the creation of new editions of Wikipedia are not 
always linguistic but often political. The latest such request is for a 
Montenegrin edition. The Wikipedia article on Montenegrin is dismissive 
and explains that it is part of a language continuum, where some want to 
sharpen the differences by including extra characters, while the 
language used to be part of what is considered Serbian or even 
Serbo-Croatian.

WiktionaryZ does use ISO-639-3 to indicate languages. Portals have now 
been set up for the vast majority of these, and they already contain a 
bare minimum of information. For languages like Mandarin, Hausa and 
Serbian we allow for alternate scripts. Having a connection to what is 
going to be ISO-639-6 would indeed be beneficial; WiktionaryZ allows 
for the modelling of the hierarchical relations that are part of the 
ISO-639-6 set-up. It being a wiki would allow people to come up with 
their arguments for why "their" language is not how it is perceived by 
those scientist types :) I am happy to say that we already ask people 
to create Swadesh lists for their language/dialect. We already /have/ 
people interested in languages like Ripuarian, or is it Koelsch in 
Wikipedia (this being a big continuum of hard-to-define dialects). We 
have already done some thinking on how to handle these types of 
non-standardised orthographies.

The problem with "tag wisely" is that what counts as wise is not 
necessarily clear to all. When content is available on the Internet, one 
group's need for very specific tagging would be deemed unwise and 
unwanted by others. ISO-639-3 does provide much more, and clearly 
needed, granularity, and it is not easily translated into the IANA 
language subtags. Given that ISO-639-3 will soon be official, I hope 
this will coincide with an authorised list that maps these ISO codes to 
what the IETF wants us to use.
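
To illustrate the kind of mapping I mean, here is a minimal sketch in
Python. It assumes the rule that a two-letter ISO-639-1 code is preferred
wherever one exists, and it simply falls back to the three-letter code
otherwise; that fallback is my assumption, not something an authorised
list has confirmed. The table is a hand-written sample of three entries,
and the names ISO_639_3_TO_639_1 and to_language_tag are made up for the
example.

```python
# Hypothetical, hand-written sample table; NOT an authorised mapping.
ISO_639_3_TO_639_1 = {
    "ara": "ar",  # Arabic
    "hau": "ha",  # Hausa
    "srp": "sr",  # Serbian
}


def to_language_tag(iso_639_3, script=None):
    """Build a language tag from an ISO-639-3 code and an optional script.

    The two-letter code is used when the sample table has one; otherwise
    the three-letter code is kept as-is (an assumption for this sketch).
    """
    code = iso_639_3.lower()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected a three-letter code, got %r" % iso_639_3)

    primary = ISO_639_3_TO_639_1.get(code, code)
    if script is None:
        return primary

    if len(script) != 4 or not (script.isascii() and script.isalpha()):
        raise ValueError("expected a four-letter script code, got %r" % script)

    # Script codes are conventionally written in title case, e.g. "Latn".
    return primary + "-" + script.title()


if __name__ == "__main__":
    print(to_language_tag("srp", "Latn"))
    print(to_language_tag("srp", "Cyrl"))
    print(to_language_tag("cmn"))
```

With this, Serbian in either script collapses to the short "sr" prefix,
while a code such as cmn (Mandarin), which has no two-letter equivalent,
passes through unchanged. Whether such a tag is acceptable to the IETF is
exactly the question an authorised list would have to answer.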

Thanks,
    Gerard

sources:
http://en.wikipedia.org/wiki/Montenegrin_language
http://wiktionaryz.org/Category:Language_portals