Label separators in Dzongkha (Re: Feedback of PAN L10n project)

Harald Tveit Alvestrand harald at alvestrand.no
Wed Mar 5 20:10:26 CET 2008


Thank you, Ken. I'm of course unable to verify any of this, so I'll look 
forward to the feedback from Sarmad Hussain after he's been in contact with 
the local experts.

                  Harald

--On Wednesday, March 05, 2008 09:34:15 -0800 Kenneth Whistler 
<kenw at sybase.com> wrote:

> Harald,
>
> This is the result of a confusion regarding the identity
> of particular characters in the Tibetan script (and a
> variant style of the script used in Bhutan for writing
> the Dzongkha language).
>
> U+0F7F TIBETAN SIGN RNAM BCAD is the Tibetan form of the visarga,
> used in Tibetan transliteration of Sanskrit words. It is
> a combining mark, with alphabetic properties, both in Tibetan,
> and for its correspondents in various Indic scripts:
>
> U+0903 DEVANAGARI SIGN VISARGA
> U+0983 BENGALI SIGN VISARGA
> etc., etc.
>
> The feedback here is based on a *visual* confusion between
> this visarga and a Tibetan delimiter punctuation,
> U+0F14 TIBETAN MARK GTER TSHEG, which *is* a comma-like
> text delimiter. The exact shape of the GTER TSHEG, as
> well as other delimiting punctuation (usually with "SHAD"
> or "TSHEG" in their names) may vary by style and font
> for Tibetan -- and it may well be the case that glyphs
> without the little horizontal bar appear in use for Dzongkha.
>
> In any case this is a different *character* from U+0F7F.
> The PVALID classification of U+0F7F is correct, and the
> use of U+0F14 (or U+0F0D) as a label separator
> for the Tibetan script is perfectly consistent with the
> existing table. Those are already category DISALLOWED.
>
> I think those category determinations are correct and should not
> be changed in the table.
>
> Note, however, that keeping U+0F7F PVALID in the table is distinct
> from a determination that U+0F7F should not be used in Dzongkha
> domain names. I think such a determination is perfectly consistent
> with other decisions by registries to disallow certain PVALID
> characters in their language tables.
>
> The real issue I see here (once the misidentification
> of U+0F7F as delimiter punctuation is cleared up) is
> that Dzongkha (and Tibetan in general) requires the use
> of the TSHEG characters (U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG,
> in particular). Those are mandatory marks that occur
> between syllables, but within words. As such, they are
> functionally similar to U+002D HYPHEN-MINUS, but far
> more ubiquitous in Tibetan script than "-" is in Latin text.
>
> The IDNFeedbackofPANL10nproject.pdf indicates that U+0F0B
> should be allowed in Dzongkha domain names (and the
> situation would be no different for Tibetan in general).
>
> Currently U+0F0B is DISALLOWED. Changing that would require
> an exception added to Section 2.2.2, Category F.
>
> --Ken
>
>> Thank you very much for this wide-ranging input.
>>
>> There are many questions one could ask, but I'll pick one...
>> you say that in Dzongkha, the character U+0F7F, which is TIBETAN SIGN
>> RNAM BCAD, should be regarded as a label separator.
>>
>> This character is of Unicode class Mc (Spacing_Mark), which class
>> includes such signs as the DEVANAGARI VOWEL SIGN AA. In
>> draft-faltstrom-idnabis-tables-05, this is marked as "PVALID", which is
>> of course incompatible with its use as a separator.
>>
>> Do you recommend that TIBETAN SIGN RNAM BCAD be added to the exception
>> list in section 2.2.2 of that draft, with category DISALLOWED?
>>
>> This is a very serious and non-reversible step to take - if we get code
>> into browsers that checks for U+0F7F as a disallowed character, it is
>> very hard to get back to using it as a character in labels if we change
>> our minds later.
>
>
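
The character identities discussed in this thread can be checked against
the Unicode Character Database. The following is a minimal sketch using
Python's standard unicodedata module. The classify() function and its
category set are assumptions made for illustration: they are a much
simplified stand-in for the letter/digit rule in
draft-faltstrom-idnabis-tables, not the draft's actual algorithm, and the
exceptions argument is a hypothetical way to model an entry in the
Section 2.2.2 exception list.

```python
import unicodedata

# Code points named in the thread.
CODE_POINTS = [0x0F7F, 0x0F14, 0x0F0D, 0x0F0B, 0x0903]

# Assumption: a simplified letter/digit rule. Lowercase and caseless
# letters, combining marks and decimal digits count as PVALID; every
# other general category counts as DISALLOWED.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def describe(cp):
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)


def classify(cp, exceptions=None):
    """Classify a code point; explicit exceptions take precedence.

    `exceptions` is a hypothetical dict mapping code points to
    "PVALID" or "DISALLOWED", modelling an exception-list entry.
    """
    if exceptions and cp in exceptions:
        return exceptions[cp]
    category = unicodedata.category(chr(cp))
    if category in LETTER_DIGIT_CATEGORIES:
        return "PVALID"
    return "DISALLOWED"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        name, category = describe(cp)
        print(f"U+{cp:04X} {name} gc={category} {classify(cp)}")

    # Modelling the change requested for the intersyllabic tsheg.
    print(classify(0x0F0B, exceptions={0x0F0B: "PVALID"}))
```

Under these assumptions, U+0F7F falls under the mark categories while the
TSHEG and SHAD delimiters do not, so U+0F0B only becomes usable in a label
when an explicit exception is supplied.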