Spanglish

Sun Jan 8 22:35:53 CET 2017

On 08-01-17 17:59, Doug Ewell wrote:
> Luc Pardon wrote:
> 
>> On 05-01-17 21:42, Doug Ewell wrote:
>>
>>> My problem with this is that, for purposes of identifying a given
>>> Spanglish text, it
>>>
>>> (a) may not be feasible to determine, and
>>>
>>> (b) probably is not relevant
>>>
>>> which is the matrix language and which is embedded. Thus we require
>>> the tag consumer to care about a distinction that he would not
>>> otherwise care about.
>>
>> I didn't read it as a requirement, just a recommendation that he
>> selects the language prefix with care, not just randomly.
> 
> That's the tag creator, not the consumer. 

  OK, got it. Since we were talking about the registry, I assumed you
meant the creator.

> The user or entity searching
> for Spanglish content -- the consumer -- would essentially need to
> search for both "en-spanglis" and "es-spanglis". This is pointless and
> there is no type of fallback mechanism that would prioritize the variant
> subtag ahead of the primary language subtag.

  I agree that it would be pointless, but this is a use case I am not
familiar with, and I have no idea how a search for content in a specific
language would be implemented. Maybe with "language:*-spanglis", similar
to "author:*Lewis*"?
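
  To make the wildcard idea concrete, here is a minimal sketch of such a
match. It follows the extended filtering rules of RFC 4647 (section
3.3.2), where "*" stands for any subtag. The function name is made up for
illustration, and "spanglis" is only the variant proposed in this thread,
not a registered subtag. Whether any real search engine works this way
is an open question.

```python
def extended_filter_match(language_range: str, tag: str) -> bool:
    """Return True if `tag` matches `language_range` under RFC 4647
    extended filtering. The function name is hypothetical."""
    range_subtags = language_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # The first subtags must be equal, unless the range starts with "*".
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # wildcard: skip it in the range
        elif t >= len(tag_subtags):
            return False                # tag ran out of subtags
        elif range_subtags[r] == tag_subtags[t]:
            r += 1                      # subtags agree: advance both
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # never skip past a singleton
        else:
            t += 1                      # skip an unmatched tag subtag
    return True


# One range covers both prefixes, so the consumer searches only once.
for candidate in ("en-spanglis", "es-spanglis", "en"):
    print(candidate, extended_filter_match("*-spanglis", candidate))
```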

> 
>> As I understand it, that means we simply expect him to count the
>> number of Spanish words and that of English words in his text, and
>> then choose the prefix that matches the highest count. Just
>> arithmetics, no linguistics.
>>
>> He would definitely care about it if he is publishing in a context
>> that requires accessibility. If his text consists of 80% English
>> words, and he tags with "es-spanglis", he would have to mark up 80%
>> of the words as English. On the other hand, if he tags with
>> "en-spanglis", he can meet the requirement by tagging only 20%.
> 
> And what if it's 51% to 49%? 

   He would still save 2% on his tagging efforts. Not sure if that
outweighs the counting effort, but since some people seem to consider
fine-grained tagging as part of the Ninth Circle of Hell ...
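
   The arithmetic can be sketched as below. The function name is
hypothetical, the word counts are assumed to come from somewhere else
(deciding which word belongs to which language is the hard part and is
not shown), and breaking a tie in favour of "en" is an arbitrary choice.

```python
from typing import Tuple


def choose_prefix(english_words: int, spanish_words: int) -> Tuple[str, int]:
    """Pick the prefix with the higher word count.

    Returns the tag and the number of words left over that still need
    inline language markup. Ties go to "en", which is arbitrary.
    """
    if english_words >= spanish_words:
        return "en-spanglis", spanish_words
    return "es-spanglis", english_words


# The 51-to-49 case, per 100 words of text.
tag, words_to_mark = choose_prefix(51, 49)
saved = max(51, 49) - words_to_mark
print(tag, words_to_mark, saved)
```

   With 51 English and 49 Spanish words per hundred, the majority prefix
leaves 49 words to mark up instead of 51, which is the 2% saving.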

> Or if, as John pointed out, word count
> doesn't really tell the story?

   John was talking from a linguistic point of view, whereas I simply
want screen reader software to produce the same sounds as a human
Spanglish speaker reading the text aloud.

  So I wouldn't mind if some specific text, with English as the matrix
language and a Spanish word count of 80%, was tagged with "en-spanglis"
in the head, as long as all the Spanish words are tagged with "es".
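
  As a rough illustration of what that markup could look like, the
sketch below wraps every word that is not in the prefix language in a
span with its own lang attribute. The function name and the token format
are assumptions, the sample sentence is invented, and the overall tag is
put on a paragraph element here rather than in the document head to keep
the example short.

```python
from html import escape


def mark_up(tokens, document_lang="en-spanglis"):
    """Build an HTML paragraph from (word, language) pairs.

    Words whose language equals the primary subtag of `document_lang`
    are left bare; all others get an inline lang attribute.
    """
    base = document_lang.split("-")[0]
    parts = []
    for word, lang in tokens:
        text = escape(word)
        if lang == base:
            parts.append(text)
        else:
            parts.append(f'<span lang="{escape(lang)}">{text}</span>')
    return f'<p lang="{escape(document_lang)}">' + " ".join(parts) + "</p>"


# Invented sample: mostly Spanish words, but tagged with the "en" prefix.
sample = [("Vamos", "es"), ("a", "es"), ("la", "es"), ("party", "en")]
print(mark_up(sample))
```

  A screen reader that honours the inline attributes would then switch
pronunciation word by word, whichever prefix was chosen for the whole.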

  We may have to file a request with Dante to register a Tenth Circle
for this scenario, but if that is what it takes to make everybody happy ...

> 
> I can't imagine any tag creator going through this exercise, and it
> doesn't help the tag consumer at all. We have gone all this time without
> a way to tag Spanglish; surely we do not now need to distinguish these
> two edge cases.

   Just to bring some owls to Athens: what is currently on the table is
a variant subtag, and so we can't really do without the prefixes for both
"en" and "es". That means the tag creator will have to choose one or the
other anyway.

> 
> Can we not even have the discussion about registering a language subtag
> "spanglis" instead of a mechanism that forces users to pick between
> English-dominant and Spanish-dominant?

   Are you talking about a _primary_ language subtag? I didn't even dare
to bring that up, but from what I learned about Spanglish it wouldn't
seem a bad idea at all.

   Problem is we'd have to send Michael to ISO 639 first, and only when
he comes back empty-handed can the discussion be started.

   Luc