draft-phillips-langtags-08 script subtags and matching

Tex Texin tex at xencraft.com
Sun Jan 2 03:30:49 CET 2005


Although the format of the tags is consistent, and the matching rules
unchanged, the behavior that users will see is indeed different.

As a user who specifies which language tags are acceptable, I do not know
how you tag your content, and in particular whether you tag using RFC 3066 or
3066bis. A page may be tagged as any of sr, sr-CS, sr-Latn, or sr-Latn-CS.

Under RFC 3066, if I specify sr-CS, an sr-CS page will be returned if it
exists.
Under RFC 3066bis, if I continue to specify sr-CS, not knowing that the server
has begun using sr-Latn-CS, I will not have that page returned.
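
To make the difference concrete, here is a minimal Python sketch of the
RFC 3066 matching rule as I read it: a language range matches a tag if it
equals the tag, or equals a prefix of the tag that is immediately followed
by "-". The function name range_matches is made up for illustration, and the
"*" wildcard range is left out.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """RFC 3066-style prefix match, case-insensitive (illustrative sketch)."""
    r = language_range.lower()
    t = tag.lower()
    # Exact match, or the range is a prefix that ends on a subtag boundary.
    return t == r or t.startswith(r + "-")


# A user asks for sr-CS; check it against each way the page might be tagged.
for tag in ["sr", "sr-CS", "sr-Latn", "sr-Latn-CS"]:
    print(tag, range_matches("sr-CS", tag))
```

Only the page tagged sr-CS matches; the page tagged sr-Latn-CS does not,
because "sr-cs-" is not a prefix of "sr-latn-cs".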

Because of this, taggers will be discouraged from using the 3066bis scripted
tags. Their users will not get correct pages until a) most browsers support the
new format and b) users specify scripts in tags. 

Given the length of time this will take to propagate through the industry, there
is a strong disincentive to using RFC 3066bis scripted tags.

If upward compatibility is a goal (and it should be), then script subtags should
come after the country subtag, i.e. sr-CS-Latn.

Existing users declaring they can accept sr-CS will continue to get the same
pages, even as the pages are upgraded to the new format. This is because sr-CS
matches the first two subtags of sr-CS-Latn.
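
As a rough sketch of that migration scenario, assuming a server that simply
returns every page whose tag is matched by the requested range under the
RFC 3066 prefix rule (the names range_matches and pages_for, and the page
lists, are hypothetical):

```python
def range_matches(language_range: str, tag: str) -> bool:
    """RFC 3066-style prefix match, case-insensitive (illustrative sketch)."""
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


def pages_for(language_range: str, available_tags: list[str]) -> list[str]:
    """Return the tags of all pages the requested range would match."""
    return [t for t in available_tags if range_matches(language_range, t)]


# The user's setting stays "sr-CS" while the site retags its page.
print(pages_for("sr-CS", ["sr-CS"]))       # page as tagged under RFC 3066
print(pages_for("sr-CS", ["sr-Latn-CS"]))  # retagged with the draft's subtag order
print(pages_for("sr-CS", ["sr-CS-Latn"]))  # retagged with script appended last
```

The first and third calls return the page; the second returns an empty list,
which is the case where the existing user silently loses the page.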

We would then have compatibility with the tags that can be generated under
3066bis, but not for the few tags already registered with script as a secondary
tag. Since all tags in the registry are already treated as a special case by
virtue of their being in the registry, I don't see that as a problem.

Given the prevalence of language-country format tags, it does not make sense to
insert script in the middle.
I do understand that script is more closely related to language and therefore
having country in the middle seems to be an incorrect prioritization, but given
the legacy, the script subtag should be appended and the esthetics abandoned.


"Addison Phillips [wM]" wrote:
> Bruce wrote:
> ---
> No, you seem to have missed the point; there exist RFC 3066
> implementations. Such implementations, using the RFC 3066 rules,
> could match something like "sr-CS-Latn" to "sr-CS", but could
> not match "sr-Latn-CS" to "sr-CS".  By changing the definition of
> the interpretation of the second subtag, the proposed draft fails
> to be compatible with existing deployed implementations (which is
> what is meant by "backwards compatibility", which is a prime
> consideration for Internet protocols).
> ---
> No, your argument is flawed and wrong.
> The draft does not change the "interpretation of the second subtag". The second subtag was never defined to be simply region subtags--although they sometimes are.
> I quote the definition from RFC 3066:
> ---
>    The following rules apply to the second subtag:
>    - All 2-letter subtags are interpreted as ISO 3166 alpha-2 country
>      codes from [ISO 3166], or subsequently assigned by the ISO 3166
>      maintenance agency or governing standardization bodies, denoting
>      the area to which this language variant relates.
>    - Tags with second subtags of 3 to 8 letters may be registered with
>      IANA, according to the rules in chapter 5 of this document.
>    - Tags with 1-letter second subtags may not be assigned except after
>      revision of this standard.
>    There are no rules apart from the syntactic ones for the third and
>    subsequent subtags.
> ---
> The second subtag *could* be anything, but tags created under the generative mechanism defined two letter subtags following the primary language subtag to be region subtags based on ISO 3166. This doesn't change with the draft: two-letter subtags are still region tags from ISO 3166. We merely define four letter subtags to be the script subtag also and prescribe an order that the subtags must follow. This doesn't break ANY existing implementations, because while it is the case that "sr-Latn-CS" is not matched to "sr-CS" in existing implementations, neither is it matched by those based on the draft.
> The draft does define some new sources and an order for subtags that existing implementations will not recognize, but this hardly breaks anything. Matching hasn't changed, so existing implementations won't be hurt by the insertion of script subtags between the two subtags (unless the matching was not compliant with RFC 3066 in the first place).
> Regards,
> Addison
> Addison P. Phillips
> Director, Globalization Architecture
> http://www.webMethods.com
> Chair, W3C Internationalization Working Group
> http://www.w3.org/International
> Internationalization is an architecture.
> It is not a feature.
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages

Tex Texin   cell: +1 781 789 1898   mailto:Tex at XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
