Unifon script?

Sat Sep 28 21:12:22 CEST 2013

CE Whitehead <cewcathar at hotmail dot com> wrote:

> What I was asking was whether or not it would be wise, or even necessary,
> to also use prefixes such as [en-Latn], etc., and whether doing so would
> in any way affect matching (given that the default script for these
> languages is in all cases [Latn] anyway)?

There is no single answer for matching, because RFC 4647 specifies 
multiple matching algorithms, and implementations may choose to use one 
of them or something else.

It is easy to see, for example, that "en-unifon" would not match 
"en-Latn-unifon" under the Basic Filtering strategy (RFC 4647, 
Section 3.3.1), because neither is a prefix of the other that ends at a 
subtag boundary.
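A minimal Python sketch of Basic Filtering shows the failure in both directions. This is my own reading of RFC 4647 Section 3.3.1 for a single range/tag pair, not a reference implementation, and the function name is made up: a range matches only if it equals the tag, or equals a prefix of the tag that ends at a '-', compared case-insensitively.

```python
def basic_filter_match(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 Section 3.3.1 Basic Filtering for one range/tag pair."""
    r = language_range.lower()  # matching is case-insensitive
    t = tag.lower()
    if r == "*":
        # The wildcard range matches every tag.
        return True
    # Match if the range equals the tag, or is a prefix of the tag
    # that ends exactly at a subtag boundary ('-').
    return t == r or t.startswith(r + "-")


for rng, tag in [
    ("en-unifon", "en-Latn-unifon"),
    ("en-Latn-unifon", "en-unifon"),
    ("en", "en-Latn-unifon"),
]:
    print(f"{rng!r} matches {tag!r}: {basic_filter_match(rng, tag)}")
```

The first two pairs print False and the third prints True: a shorter range still matches, but a range and a tag that differ only by an inserted 'Latn' do not.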

Elsewhere, RFC 4647 (Section 4.1) states that language ranges 
should avoid unnecessary subtags, such as a script subtag that is the 
Suppress-Script for the given language subtag, exactly because they can 
cause matching problems. The section specifically calls out "en-Latn" as 
an example.
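As a sketch of what avoiding such subtags means in practice, the Python below drops a script subtag when it equals the language's Suppress-Script value. The SUPPRESS_SCRIPT table is a hypothetical two-entry excerpt, not data loaded from the IANA registry, and the function (also hypothetical) assumes the tag has no extended language subtags.

```python
# Hypothetical excerpt of Suppress-Script values, for illustration only;
# the real values live in the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"en": "Latn", "de": "Latn"}


def drop_suppressed_script(tag: str, suppress: dict = SUPPRESS_SCRIPT) -> str:
    """Remove a script subtag that merely repeats the language's Suppress-Script.

    Assumes no extended language subtags, so a script subtag (if any)
    sits immediately after the primary language subtag.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        language, candidate = subtags[0].lower(), subtags[1]
        is_script = len(candidate) == 4 and candidate.isalpha()
        if is_script and suppress.get(language, "").lower() == candidate.lower():
            del subtags[1]  # e.g. "en-Latn-unifon" -> "en-unifon"
    return "-".join(subtags)


for tag in ["en-Latn-unifon", "zh-Latn-pinyin", "en-Dsrt"]:
    print(tag, "->", drop_suppressed_script(tag))
```

Only the first tag changes; "zh" has no entry in the sketch table, and 'Dsrt' is not the suppressed script for "en", so those scripts carry real information and are kept.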

For some time, the Registry did not include variant subtags with Prefix 
fields that contained script subtags. RFC 5646 does say that other 
variant subtags could be included in the Prefix if it is important to 
maintain the hierarchy. For example, 'biske' has a Prefix of "sl-rozaj" 
but not "sl" because the submitter considered it important for the 
subtag to indicate explicitly that 'biske' is a sub-dialect of 'rozaj', 
which in turn is a dialect of 'sl'.

When variants were added for Pinyin and Wade-Giles in 2008, and for the 
Hepburn romanization of Japanese in 2009 and 2010, the Prefix fields did 
include the script subtag 'Latn'. But notice that for these languages, 
'Latn' is specifically not the default script for that language. It is 
there to point out that Pinyin, say, is not simply a variant of 
Chinese and Tibetan, but of romanized Chinese and romanized Tibetan.
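As I read RFC 5646 (Section 3.1.8), a tag using a variant SHOULD match one of the variant's Prefix fields under Extended Filtering, with the Prefix subtags appearing before the variant. A rough Python approximation of that check follows. VARIANT_PREFIXES is a hypothetical excerpt assembled from the examples in this message, the function name is made up, and extensions and private-use subtags are not handled.

```python
# Hypothetical registry excerpt built from the examples in this message;
# check the IANA Language Subtag Registry for the authoritative Prefix fields.
VARIANT_PREFIXES = {
    "rozaj": ["sl"],
    "biske": ["sl-rozaj"],
    "pinyin": ["zh-Latn", "bo-Latn"],
    "hepburn": ["ja-Latn"],
}


def prefix_satisfied(tag: str, variant: str, prefixes: dict = VARIANT_PREFIXES) -> bool:
    """Approximate the RFC 5646 Prefix check for one variant in one tag."""
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    # Only subtags that precede the variant may satisfy its Prefix.
    before = subtags[: subtags.index(variant)]
    for prefix in prefixes.get(variant, []):
        wanted = prefix.lower().split("-")
        # The primary language subtag must match exactly.
        if not before or before[0] != wanted[0]:
            continue
        # The remaining Prefix subtags must appear in order; other subtags
        # (a region, another variant) may sit between them.
        remaining = iter(before[1:])
        if all(sub in remaining for sub in wanted[1:]):
            return True
    return False


for tag, variant in [
    ("sl-rozaj-biske", "biske"),
    ("sl-biske", "biske"),
    ("zh-pinyin", "pinyin"),
    ("zh-Latn-CN-pinyin", "pinyin"),
]:
    print(f"{tag}: {prefix_satisfied(tag, variant)}")
```

Under this sketch "sl-biske" and "zh-pinyin" fail, because the Prefix deliberately requires 'rozaj' and 'Latn' respectively, while an intervening region such as "CN" does no harm.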

I don't believe we want to go down the path of including additional 
subtags in the Prefix field of variants, any more than absolutely 
necessary. I agree with John (and the wording of RFC 4646) that Prefix 
fields are meant to indicate what combinations are "recommended" or 
"make sense," not to impose tight constraints.

Peter Constable <petercon at microsoft dot com> wrote:

> This brings up an issue that seems at least as interesting as the
> language prefix issue: this variant implies (or should imply) Latn
> script, but we don't have a good mechanism to capture that. This isn't
> a new scenario, though — the same would apply to the fonipa variant
> subtag, for instance. A prefix field, per the current rules, must
> contain a valid tag; it can't have a language range like "*-Latn".
>
> Related, I don't recall if we ever discussed a potential interaction
> with suppress script: if the variant implies a particular script,
> then a script subtag adds no additional information; e.g.
> "en-Latn-unifon" would be no more informative than "en-unifon".
> That's the kind of situation for which suppress-script was added. But
> adding a suppress-script field to a variant entry is not currently
> allowed.

If a variant such as 'fonipa' or 'unifon' implies that the script is 
Latin, is it necessary for the language tag to call that out? Even in 
the absence of a region or script subtag, a language-only tag like "ku" 
(Kurdish, which is written in Arabic, Latin, and Cyrillic scripts) might 
be just fine for its purpose.

Suppress-Script was intended to ease compatibility with RFC 3066 
applications that think "language plus country equals tag." They won't 
understand new variants anyway; only a few will even understand the 
precomposed variants from the RFC 3066 registry. Adding Suppress-Script 
to variants seems to be a solution to an imagined problem.

Mark Davis ☺ <mark at macchiato dot com> wrote:

> We could use "und-Latn", since 'und' is what we use where the language
> subtag would be unspecified.

This wouldn't work in the general case, since as John said, 'und' 
doesn't mean "star." It might work for CLDR.
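Extended Filtering (RFC 4647, Section 3.3.2) is where the difference shows up: there, '*' in a range matches any subtag, while 'und' is an ordinary language subtag that must match literally. A sketch in Python, following my reading of the algorithm's steps (the function name is hypothetical):

```python
def extended_filter_match(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 Section 3.3.2 Extended Filtering for one range/tag pair."""
    rng = language_range.lower().split("-")
    tg = tag.lower().split("-")
    # Step 2: the first subtags must match, unless the range starts with '*'.
    if rng[0] != "*" and rng[0] != tg[0]:
        return False
    r, t = 1, 1
    # Step 3: walk the remaining range subtags.
    while r < len(rng):
        if rng[r] == "*":        # 3.A: a wildcard matches anything; skip it
            r += 1
        elif t >= len(tg):       # 3.B: the tag ran out first
            return False
        elif rng[r] == tg[t]:    # 3.C: subtags match; advance both lists
            r += 1
            t += 1
        elif len(tg[t]) == 1:    # 3.D: never skip past a singleton
            return False
        else:                    # 3.E: skip this tag subtag and keep looking
            t += 1
    return True                  # Step 4: range exhausted, so it matches


for rng, tag in [
    ("en-unifon", "en-Latn-unifon"),
    ("*-Latn", "sr-Latn-RS"),
    ("*-Latn", "en-unifon"),
    ("und-Latn", "en-Latn"),
]:
    print(f"{rng!r} vs {tag!r}: {extended_filter_match(rng, tag)}")
```

This prints True, True, False, False. "*-Latn" picks up any Latin-script tag that says so explicitly, but not "en-unifon", and "und-Latn" matches only tags that themselves begin with 'und'. Note also that, unlike Basic Filtering, this algorithm does match "en-unifon" against "en-Latn-unifon".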

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell