everson at evertype.com
Fri Nov 27 12:39:00 CET 2015
On 27 Nov 2015, at 06:06, Kent Karlsson <kent.karlsson14 at telia.com> wrote:
>> The Wikipedia is a big and important application, is it not?
> Yes, so en-levelB2 (if that corresponds to what they are targeting) would work just fine for Wikipedia and many others.
It is not within our scope to assign CEFR levels to identify different kinds of language variants.
Moreover, it is really unlikely that they will wish to replace http://simple.wikipedia.org with http://en-levelB2.wikipedia.org
> That Wikipedia has house rules specifying what the level is is fine, but that is nothing "we" should encode.
We encode subtags which extend the code for representation of names of languages. English, en. Scouse English, en-scouse.
> Likewise for VoA "Learning English" levels (which I do think can
> be found to correspond to CEFR levels). They have house rules (I'd assume,
> though they don't appear to have published them), but "we" should not
> attempt to embody the house rules in LSR.
wpsimple points to Wikipedia which has its own rules.
>> No, they aren't. Our subtags describe linguistic entities, not hierarchies of language-learning and speaker competence.
> Language-learning and speaker competence levels of a particular language are linguistic entities as well. "wpsimple" is just one instance.
Speaker competence is out of scope. Whether I can understand a sentence you write in Swedish is of no consequence. The subtag identifies it as Swedish.
>> en-scouse points directly at Scouse. en-cornu points directly at Cornu-English/Anglo-Cornish/Cornish English. en-basiceng would point directly at Basic English. CEFR hierarchies have nothing to do with this. Our subtags point at things. I don't think it is within our scope to pick a set of CEFR definitions and attempt to apply them (on the basis of no research) to one or more varieties of controlled vocabulary and syntax. The CEFR is ALL about learner competence with regard to standard language, and Basic English and Wikipedia Simple English are examples of controlled language (engineered language, not constructed language), not examples of standard language.
> Disregarding Ogden's Basic English (which must NOT get the subtag 'basiceng'), which is not "simplified English", but rather a (strangely) "constrained English".
It isn’t called “simplified English”. The Wikipedia’s is. Basic English is called Basic English; it isn’t called anything else. Your “must NOT” is your opinion. I don’t share it. “basiceng” points to “Basic English”. It doesn’t point to “Wikipedia Simple English” or VOA or anybody else’s thing. The right thing to do is to use a name which is iconic and
Your sentence was a fragment, by the way.
> No, I don't say that *WE* should attempt to apply CEFR levels to simplified form so-and-so from Wikipedia, VoA, or anyone else. That should be up to Wikipedia, VoA, and anyone else (respectively) [and these I do think can be reasonably mapped; by THEM, not us].
What, you want me to go to Amir and say “Map this to the CEFR hierarchy, so we can use a name which none of your users will understand”? No, Kent. That’s not a good idea. Nor is it workable, because Amir isn’t going to be able to do that mapping either. Nor has the CEFR specified levels of difficulty. A text is a text. The CEFR specifies levels of user competence. That is a different thing.
> But I do not find it appropriate to encode house rules for company/organisation so-and-so in LSR, regardless of how big they are. BUT we should cater for this use-case (simplified, or learner's, language) in LSR, but in a general manner, not house rules. The latter are of course needed, but a matter for each "house" (company/organisation), not the LSR.
We aren’t encoding Wikipedia’s house rules. They have already done that on references which are linked to in the application. We are providing a label which identifies their usage.
Michael Everson * http://www.evertype.com/