Unifon script?
Doug Ewell
doug at ewellic.org
Sat Sep 28 21:12:22 CEST 2013
CE Whitehead <cewcathar at hotmail dot com> wrote:
> What I was asking was whether or not it would be wise or even necessary to
> also use the prefixes [en-Latn] , etc., and whether doing so would in
> any way affect matching (given that the default script for these
> languages is in all cases [Latn] anyway)?
Matching is inexact, because RFC 4647 specifies multiple matching
algorithms, and implementations may choose to use one of them or
something else.
It can certainly be seen that, for example, "en-unifon" would not match
"en-Latn-unifon" according to the Basic Filtering strategy (RFC 4647,
Section 3.3.1) because neither is an exact left-to-right subset of the
other.
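
A minimal sketch of that comparison, assuming the rule in RFC 4647, Section 3.3.1 (a range matches a tag when the two are equal ignoring case, or when the range is a prefix of the tag and the next character in the tag is a hyphen). The function name basic_filter_match is hypothetical and is not taken from any library:

```python
def basic_filter_match(lang_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 Basic Filtering (hypothetical helper, not a library API)."""
    lang_range = lang_range.lower()
    tag = tag.lower()
    if lang_range == "*":
        # The wildcard range matches every tag.
        return True
    # Either an exact match, or the range is a prefix that ends on a subtag boundary.
    return tag == lang_range or tag.startswith(lang_range + "-")


print(basic_filter_match("en-unifon", "en-Latn-unifon"))  # False
print(basic_filter_match("en-Latn-unifon", "en-unifon"))  # False
print(basic_filter_match("en", "en-Latn-unifon"))         # True
```

Neither of the two longer tags is a prefix of the other, so they fail in both directions, while the bare range "en" matches either.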
Elsewhere in RFC 4647 (Section 4.1), it is stated that language ranges
should avoid unnecessary subtags, such as a script subtag that is the
Suppress-Script for the given language subtag, exactly because they can
cause matching problems. The section specifically calls out "en-Latn" as
an example.
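
One way an application might follow that advice is to drop a redundant script subtag before building a language range. The sketch below is only an illustration: the SUPPRESS_SCRIPT table is a one-entry stand-in for the real registry data, the function name is hypothetical, and the code assumes the script subtag directly follows the language subtag (it ignores extlang subtags):

```python
# Illustrative subset only; real values come from the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"en": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag if it is the Suppress-Script for the language."""
    subtags = tag.split("-")
    # A script subtag is four letters and, in this simplified sketch,
    # is assumed to sit directly after the primary language subtag.
    if len(subtags) >= 2 and len(subtags[1]) == 4 and subtags[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower(), "")
        if suppressed.lower() == subtags[1].lower():
            del subtags[1]
    return "-".join(subtags)


print(drop_suppressed_script("en-Latn-unifon"))  # en-unifon
print(drop_suppressed_script("zh-Latn-pinyin"))  # zh-Latn-pinyin
```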
For some time, the Registry did not include variant subtags with Prefix
fields that contained script subtags. RFC 5646 does say that other
variant subtags could be included in the Prefix if it is important to
maintain the hierarchy. For example, 'biske' has a Prefix of "sl-rozaj"
but not "sl" because the submitter considered it important for the
subtag to indicate explicitly that 'biske' is a sub-dialect of 'rozaj',
which in turn is a dialect of 'sl'.
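
To make the hierarchy concrete, here is a rough sketch of how a tool could check a tag against the Prefix values of a variant. The PREFIXES table holds only the two entries discussed above and is not a copy of the registry, and the function name is hypothetical. Since RFC 5646 treats Prefix as a recommendation, a False result here would be grounds for a warning, not for rejecting the tag:

```python
# Illustrative Prefix values from the 'rozaj'/'biske' example; not the full registry.
PREFIXES = {
    "rozaj": ["sl"],
    "biske": ["sl-rozaj"],
}


def follows_recommended_prefix(tag: str, variant: str) -> bool:
    """Return True if the subtags before `variant` begin with one of its Prefix values."""
    subtags = tag.lower().split("-")
    variant = variant.lower()
    if variant not in subtags:
        return False
    # Everything to the left of the variant subtag.
    head = "-".join(subtags[: subtags.index(variant)])
    for prefix in PREFIXES.get(variant, []):
        p = prefix.lower()
        if head == p or head.startswith(p + "-"):
            return True
    return False


print(follows_recommended_prefix("sl-rozaj-biske", "biske"))  # True
print(follows_recommended_prefix("sl-biske", "biske"))        # False
```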
When variants were added for Pinyin and Wade-Giles in 2008, and for the
Hepburn romanization of Japanese in 2009 and 2010, the Prefix fields did
include the script subtag 'Latn'. But notice that for these languages,
'Latn' is specifically not the default script. It is
there to point out that Pinyin, say, is not simply a variant of
Chinese and Tibetan, but of romanized Chinese and romanized Tibetan.
I don't believe we want to go down the path of including additional
subtags in the Prefix field of variants, any more than absolutely
necessary. I agree with John (and the wording of RFC 4646) that Prefix
fields are meant to indicate what combinations are "recommended" or
"make sense," not to impose tight constraints.
Peter Constable <petercon at microsoft dot com> wrote:
> This brings up an issue that seems at least as interesting as the
> language prefix issue: this variant implies (or should imply) Latn
> script, but we don't have a good mechanism to capture that. This isn't
> a new scenario, though — the same would apply to the fonipa variant
> subtag, for instance. A prefix field, per the current rules, must
> contain a valid tag; it can't have a language range like "*-Latn".
>
> Related, I don't recall if we ever discussed a potential interaction
> with suppress script: if the variant implies a particular script,
> then a script subtag adds no additional information; e.g.,
> "en-Latn-unifon" would be no more informative than "en-unifon".
> That's the kind of situation for which suppress-script was added. But
> adding a suppress-script field to a variant entry is not currently
> allowed.
If a variant such as 'fonipa' or 'unifon' implies that the script is
Latin, is it necessary for the language tag to call that out? Even in
the absence of a region or script subtag, a language-only tag like "ku"
might be just fine for its purpose.
Suppress-Script was intended to ease compatibility with RFC 3066
applications that think "language plus country equals tag." They won't
understand new variants anyway; only a few will even understand the
precomposed variants from the RFC 3066 registry. Adding Suppress-Script
to variants seems to be a solution to an imagined problem.
Mark Davis ☺ <mark at macchiato dot com> wrote:
> We could use "und-Latn", since 'und' is what we use where the language
> tag would be unspecified.
This wouldn't work in the general case, since as John said, 'und'
doesn't mean "star." It might work for CLDR.
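
The difference between 'und' and a wildcard can be seen with a sketch of Extended Filtering, assuming the steps given in RFC 4647, Section 3.3.2. The function name extended_filter_match is hypothetical; this is an illustration of the algorithm, not a reference implementation:

```python
def extended_filter_match(lang_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 Extended Filtering (hypothetical helper)."""
    r = lang_range.lower().split("-")
    t = tag.lower().split("-")
    # The first subtags must match unless the range starts with the wildcard.
    if r[0] != "*" and r[0] != t[0]:
        return False
    ri, ti = 1, 1
    while ri < len(r):
        if r[ri] == "*":
            ri += 1                 # a wildcard inside the range is skipped
        elif ti >= len(t):
            return False            # the tag ran out of subtags
        elif r[ri] == t[ti]:
            ri += 1
            ti += 1                 # subtags agree, advance both
        elif len(t[ti]) == 1:
            return False            # never skip past a singleton such as 'x'
        else:
            ti += 1                 # skip an unmatched subtag in the tag
    return True


print(extended_filter_match("*-Latn", "en-Latn-unifon"))     # True
print(extended_filter_match("und-Latn", "en-Latn-unifon"))   # False
print(extended_filter_match("en-unifon", "en-Latn-unifon"))  # True
```

The range "*-Latn" matches any tag containing the 'Latn' subtag, whereas "und-Latn" only matches tags whose primary language subtag is literally 'und'. Note also that "en-unifon" does match "en-Latn-unifon" under this scheme, unlike under Basic Filtering.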
--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell