Subtag for indicating "marked" text?
sascha at brawer.ch
Wed Jul 6 14:56:34 CEST 2016
LANGUAGE SUBTAG REGISTRATION FORM
1. Name of requester: Sascha Brawer
2. E-mail address of requester: sascha at brawer.ch
3. Record Requested:
Description: Used to designate text with markers for tones, gemination,
vowel quality, etc. in languages where such marks are not part of the
regular spelling.
4. Intended meaning of the subtag:
The presence of this subtag indicates that text has been marked with tones,
vowel length, vowel quality, etc. in languages where such marks are not part
of the regular spelling. Examples include: Arabic Tashkil and Hebrew Niqqud
diacritics to indicate short vowels; Hebrew cantillation marks; tone
marks in Cherokee and Lingala; or gemination marks in Ethiopic languages.
Such markers are not written in regular text, but can be seen in children’s
books, dictionaries, language learning material, or specialized language
where preserving the pronunciation is important.
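The following is a rough Python sketch, not part of the registration request, showing how a corpus tool might decide whether a text carries such optional marks, and how it might strip them to confirm that a marked and an unmarked version differ only in their marks. It assumes that every relevant mark is a Unicode nonspacing combining mark (general category Mn) after NFD decomposition. That holds for Arabic harakat, Hebrew niqqud and cantillation, the Ethiopic gemination mark, and accents on Latin letters. The heuristic only makes sense for languages whose regular spelling does not use such marks. The function names are hypothetical.

```python
import unicodedata

def has_optional_marks(text: str) -> bool:
    """Hypothetical helper (heuristic): True if the text contains any
    nonspacing combining mark (Unicode category Mn) after NFD decomposition."""
    return any(unicodedata.category(ch) == "Mn"
               for ch in unicodedata.normalize("NFD", text))

def strip_optional_marks(text: str) -> str:
    """Hypothetical helper: drop every nonspacing mark, then recompose (NFC).
    Useful for checking that a marked and an unmarked text differ only in marks."""
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed
                        if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", base_only)

# Hebrew "shalom" with niqqud (shin dot, qamats, holam) and without.
marked = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
plain = "\u05e9\u05dc\u05d5\u05dd"
print(has_optional_marks(marked), has_optional_marks(plain))  # True False
print(strip_optional_marks(marked) == plain)                  # True
```

A production tool would probably need per-script lists of the relevant marks rather than this blanket rule on category Mn.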
5. Reference to published description of the language (book or article):
6. Any other relevant information:
2016-07-06 13:07 GMT+02:00 Martin J. Dürst <duerst at it.aoyama.ac.jp>:
> Hello Sascha,
> Your idea looks good to me, because it indeed covers an actual need. My
> suggestion would be to prepare a registration template somewhat soonish.
> Regards, Martin.
> On 2016/07/06 19:22, Sascha Brawer wrote:
>> What would you think of registering an IETF language variant subtag to
>> denote text with marks for tones, gemination, vowel length, vowel quality,
>> etc. in languages where such marks are not part of the regular spelling?
>> For example, Arabic and Hebrew usually do not write short vowels. However,
>> optional marks can be used to indicate the vowels. Without a variant
>> subtag, we cannot give a BCP47 language code to corpora of text written in
>> “Arabic with vowel markers”.
>> Another example is Lingala, where optional marks are used to indicate
>> tones. In the Unicode UDHR project, we have the Lingala text both with and
>> without tones. However, we currently cannot express this distinction with
>> BCP47 language tags:
>> (Apart from tones, the two texts should be identical. Currently they
>> aren’t, but that’s an unrelated problem).
>> Another example is Cherokee, where optional marks can be used to indicate
>> tones.
>> Another example is Amharic (and all other Ethiopic languages), where
>> optional marks are used to indicate syllables with geminated (=long)
>> consonants, and/or long vowels.
>> In all these examples, the markers are usually not written in regular
>> text. But in children’s books, teaching material for language learners,
>> texts, etc., the markers are written to indicate the otherwise
>> ambiguous pronunciation. Also, there are specialized applications (e.g.
>> corpora for speech applications) that explicitly collect texts with such
>> markers attached. To identify marked text, it would be useful to have a
>> variant subtag.
>> An alternative to registering a general "marked" subtag might be separate
>> subtags for "vowelmarked", "geminationmarked", "tonemarked", etc. That seems
>> a bit complicated, and those names would have to be shortened to fit the
>> 5-to-8-character limit for variant subtags.
>> What do you think?
>> — Sascha
>> Ietf-languages mailing list
>> Ietf-languages at alvestrand.no
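The length problem raised in the thread can be checked mechanically. The Python sketch below tests the candidate names from the discussion against the variant-subtag syntax of RFC 5646 (BCP 47), which allows 5 to 8 alphanumeric characters, or exactly 4 characters starting with a digit. `is_valid_variant` is a hypothetical helper that checks syntax only; none of the candidate names is a registered subtag.

```python
import re

# RFC 5646 variant production: 5*8alphanum / (DIGIT 3alphanum)
VARIANT_RE = re.compile(r"[0-9A-Za-z]{5,8}|[0-9][0-9A-Za-z]{3}")

def is_valid_variant(subtag: str) -> bool:
    """Hypothetical helper: True if subtag matches the BCP 47 variant syntax.
    Says nothing about whether the subtag is actually registered."""
    return VARIANT_RE.fullmatch(subtag) is not None

for candidate in ["marked", "vowelmarked", "geminationmarked", "tonemarked"]:
    print(candidate, len(candidate), is_valid_variant(candidate))
```

Only the general name "marked" (6 characters) fits; the three specific names are 10 to 16 characters long and would need abbreviating.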