Subtag for indicating "marked" text?

Martin J. Dürst duerst at it.aoyama.ac.jp
Wed Jul 6 13:07:08 CEST 2016


Hello Sascha,

Your idea looks good to me, because it covers an actual need. My
suggestion would be to prepare a registration template soon.

Regards,   Martin.

On 2016/07/06 19:22, Sascha Brawer wrote:
> What would you think of registering an IETF language variant subtag to
> denote text with marks for tones, gemination, vowel length, vowel quality,
> etc. in languages where such marks are not part of the regular spelling?
>
> For example, Arabic and Hebrew usually do not write short vowels. However,
> optional marks can be used to indicate the vowels. Without a variant
> subtag, we cannot give a BCP47 language code to corpora of text written in
> “Arabic with vowel markers”.
> https://en.wikipedia.org/wiki/Arabic_diacritics#Tashkil_.28marks_used_as_phonetic_guides.29
> https://en.wikipedia.org/wiki/Hebrew_diacritics
>
> Another example is Lingala, where optional marks are used to indicate
> tones. In the Unicode UDHR project, we have Lingala text once with and once
> without tones. However, currently we cannot express this distinction with
> BCP47 language tags:
> http://www.unicode.org/udhr/d/udhr_lin.html
> http://www.unicode.org/udhr/d/udhr_lin_tones.html
> (Apart from tones, the two texts should be identical. Currently they
> aren’t, but that’s an unrelated problem).
>
> Another example is Cherokee, where optional marks can be used to indicate
> tones.
>
> Another example is Amharic (and all other Ethiopic languages), where
> optional marks are used to indicate syllables with geminated (=long)
> consonants, and/or long vowels.
>
> In all these examples, the markers are usually not written in regular text.
> But in children’s books, teaching material for language learners, religious
> texts, etc., the markers would be written to indicate the otherwise
> ambiguous pronunciation. Also, there are specialized applications (e.g.
> corpora for speech applications) that explicitly collect texts with such
> markers attached. To identify marked text, it would be useful to have a
> variant subtag.
>
> An alternative to registering a general "marked" subtag might be different
> subtags for "vowelmarked", "geminationmarked", "tonemarked", etc. Seems a
> bit complicated, and those tags would have to be shortened to fit into the
> length requirements.
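
For reference, the length requirement mentioned above can be sketched as
a small check. This is only an illustration, assuming the variant syntax
given in RFC 5646 (5 to 8 alphanumeric characters, or 4 characters if the
first one is a digit). The name is_wellformed_variant is hypothetical and
not part of any library, and the check covers syntax only, not whether a
subtag is actually registered.

```python
import re

# Variant subtag syntax as given in RFC 5646 (assumed here):
# 5 to 8 alphanumerics, or 4 characters when the first is a digit.
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_wellformed_variant(subtag: str) -> bool:
    """Return True if subtag fits the variant syntax (hypothetical helper)."""
    return VARIANT_RE.fullmatch(subtag) is not None


# The candidate names floated in the proposal.
candidates = ["marked", "vowelmarked", "geminationmarked", "tonemarked"]
results = {name: is_wellformed_variant(name) for name in candidates}

for name, ok in results.items():
    print(f"{name}: {'fits' if ok else 'too long'}")
```

Of the four candidates, only "marked" fits as written; the longer names
would need to be abbreviated to at most 8 characters.
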
>
> What do you think?
>
> — Sascha
>
>
>
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>