gender voice variants

Yury Tarasievich yury.tarasievich at gmail.com
Fri Dec 21 07:41:47 CET 2012


On 12/21/2012 02:02 AM, John Cowan wrote:
> Michael Everson scripsit:
>
>>> It has already been shown that [the sex of speakers and listeners]
>>> does affect [the choice of words, grammatical forms, etc.] in all
>>> languages, though the degree varies.  I believe it's therefore
>>> appropriate to encode it within, rather than just alongside, the
>>> language variety system.
>>
>> Why, exactly?
>
> Because I believe it to be similar in character to the things we already
> encode as language variants that do not affect intelligibility that
> much, but are important to distinguish in some cases.  Looking over
> the subtag registry, I find the distinctions represented there are:
> like the speaker's or writer's point of origin, the period of use,
> the writer's spelling conventions, and the use of unusual terminology.

To clarify things (for myself, too): I now 
understand that Peter and Karen actually want this 
subtag to serve as a hint to a "grammar engine", 
so that a sentence meaning, e.g.,

"Welcome, A, I am machine B"

might be tagged like, e.g.,

"<language=lang0(e.g.,ru_1956acad)>Welcome, 
<recipient=genus2(e.g.,masc)>A</>, 
<originator=genus1(e.g.,fem)>I am</> machine 
<originator=genus1(e.g.,fem)>B</>"

and then post-processed before being 
presented to the recipient.

In Russian, that'd mean using feminine-gender 
pronouns in the 3rd person singular, and changing 
dependent adjectives, and verbs in the singular 
past tense, to agree. Of course, that'd require 
pre-processing: tagging the parts of the sentence 
that would have to be changed in this way 
(as shown above).
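
A minimal sketch of such a post-processing step, in Python. 
Everything here is an assumption made for illustration: the 
{role:masc|fem} slot syntax, the render() helper and the sample 
Russian sentence are hypothetical, not part of any proposed 
subtag or of an existing library. It only shows that the 
gender-dependent words must be tagged in advance, per language.

```python
import re

# Hypothetical slot syntax: {role:masculine_form|feminine_form}
SLOT = re.compile(r"\{(originator|recipient):([^|{}]*)\|([^|{}]*)\}")

GENDERS = ("masc", "fem")


def render(template, originator, recipient):
    """Fill every tagged slot with the form matching the role's gender."""
    genders = {"originator": originator, "recipient": recipient}
    for role, gender in genders.items():
        if gender not in GENDERS:
            raise ValueError(f"unknown gender for {role}: {gender!r}")

    def pick(match):
        role, masc, fem = match.groups()
        return masc if genders[role] == "masc" else fem

    return SLOT.sub(pick, template)


# Pre-tagged Russian text: the adjective agrees with the recipient,
# the short-form adjective agrees with the originator.
TEMPLATE = (
    "Добро пожаловать, {recipient:уважаемый|уважаемая} A. "
    "Я {originator:рад|рада} вас видеть."
)

print(render(TEMPLATE, originator="fem", recipient="masc"))
```

The template itself is language-specific: a translator would 
still have to decide which words get slots in each language.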

Obviously, such a selection of grammatical forms 
makes sense only within a particular grammar 
codification. The changes would differ slightly 
if processed by the rules of pre-1918 Russian 
grammar. The same goes for the formal style of 
address (in modern Russian it essentially 
means switching to the plural).

From the discussion I gathered that "welcome" 
in the sentence above might also have to be 
changed in order to accommodate an 
<originator><recipient> combination (in 
Italian). It, too, has to be pre-tagged.

All this also presupposes that the translation 
will mostly keep the original sentence 
structure. And that structure is lost quite 
frequently even when translating a UI to, say, 
Russian (or you get weird Russian). The same 
might happen in _other_ phrases when translating 
to _other_ languages.

All because grammar is an expression of 
meaning (semantics), and you can't tag 
semantics, only its original expression in, 
well, English.

So, at the end of the day you either end up with 
4 (2 etc.) pre-modified copies of each 
changeable phrase for each translation language, 
or you get to write fairly complicated 
language/grammar engines (talkers), at least one 
for each family of languages.
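
To put rough numbers on the first option, here is a small 
Python sketch. The role names and the two-value gender set are 
assumptions for illustration only; real languages may need 
more values (or fewer), and the helper names are hypothetical.

```python
from itertools import product

# Assumed gender values; a real system might need more.
GENDERS = ("masc", "fem")


def variant_keys(roles):
    """All gender assignments for the roles a phrase depends on."""
    return [dict(zip(roles, combo))
            for combo in product(GENDERS, repeat=len(roles))]


def total_copies(n_phrases, n_languages, roles):
    """Pre-modified copies needed if every phrase depends on all roles."""
    return n_phrases * n_languages * len(variant_keys(roles))


# A phrase depending on both originator and recipient needs 4 copies;
# one depending on the originator alone needs 2.
print(len(variant_keys(("originator", "recipient"))))
print(len(variant_keys(("originator",))))
print(total_copies(10, 3, ("originator", "recipient")))
```

The count is an upper bound: in a given language many phrases 
will not depend on either role at all.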

What I want to say is that things will not work 
out as smoothly as Peter and Karen expect (?) 
them to. But the subtag itself, intended as a 
grammar hint, won't do any harm.

-Yury

More information about the Ietf-languages mailing list