Proposal to remove Preferred-Value field for region YU in LTRU

Phillips, Addison addison at amazon.com
Sat Feb 28 17:54:48 CET 2009


Did you mean “the sensible choice is to remove the PV field”? That’s what your email implies, but not what your last sentence says.

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Peter Constable
Sent: Saturday, February 28, 2009 8:50 AM
To: Mark Davis; Tex Texin
Cc: ietf-languages at iana.org; Doug Ewell
Subject: RE: Proposal to remove Preferred-Value field for region YU in LTRU

There are two options involving the two records:

A)  YU -> CS
    CS

Or

B)  YU
    CS

Some observations:

(i) We all agree that both options result in paths that are dead ends.

(ii) We all agree that the regions have split into multiple regions, that PV cannot be used to indicate multiple regions, and that LTRU should not change the 4646bis draft to accommodate data indicating a multi-way split.

(iii) We all agree that users may wonder how something tagged with YU or CS would be tagged under today’s recommendations, and that (optional) comments might be useful additions to the records for this purpose. And given (ii), comments are the only way to accomplish this.

In light of those observations, option A is not any better than option B for users wondering how Balkan nations have changed and what the implications are for tagging: only comments or user research can answer that, and either can be applied to either option.

However, the change from A to B does have an impact on canonicalization that can change the behaviour of implementations using it. There is no benefit to that behaviour change; it is likely detrimental.

Hence, it seems the sensible choice is not to remove the PV field for YU, but to add comments (not in LTRU process) to the CS and YU records.
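A minimal Python sketch of the canonicalization difference described above, assuming a much-simplified model: the two dicts are hypothetical stand-ins for the YU and CS records under options A and B (region subtag mapped to its Preferred-Value, or None). They are not an excerpt of the real registry, and `canonical_region` / `canonical_tag` are illustrative helpers, not any library's API.

```python
from typing import Dict, Optional

# Hypothetical toy registries: region subtag -> Preferred-Value, or None.
OPTION_A: Dict[str, Optional[str]] = {"YU": "CS", "CS": None}  # YU -> CS; CS has none
OPTION_B: Dict[str, Optional[str]] = {"YU": None, "CS": None}  # neither has one


def canonical_region(region: str, registry: Dict[str, Optional[str]]) -> str:
    """Replace a region subtag with its Preferred-Value, if its record has one."""
    preferred = registry.get(region.upper())
    return preferred if preferred is not None else region.upper()


def canonical_tag(tag: str, registry: Dict[str, Optional[str]]) -> str:
    """Canonicalize the region of a simple language-region tag such as 'sr-YU'.

    Assumes exactly two subtags; scripts, variants and extensions are not handled.
    """
    language, region = tag.split("-")
    return f"{language.lower()}-{canonical_region(region, registry)}"


for name, registry in (("A", OPTION_A), ("B", OPTION_B)):
    same = canonical_tag("sr-YU", registry) == canonical_tag("sr-CS", registry)
    print(f"option {name}: sr-YU -> {canonical_tag('sr-YU', registry)}, "
          f"same form as sr-CS: {same}")
```

Under option A both tags canonicalize to sr-CS; under option B they keep distinct canonical forms, which is the behaviour change an implementation would observe.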


Peter

From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Mark Davis
Sent: Friday, February 27, 2009 1:12 PM
To: Tex Texin
Cc: ietf-languages at iana.org; Doug Ewell
Subject: Re: Proposal to remove Preferred-Value field for region YU in LTRU

Ah, I am now finally understanding what you are concerned about. The main problem is that the Preferred value really should be a set, in the case of regions. Then we would have

Before:
YU -> CS

After:
YU -> {RS ME}
CS -> {RS ME}

and the connection is maintained. But we -- unfortunately -- don't have that ability, and I'm not suggesting adding it at this late date (although perhaps for a future version; in CLDR we maintain that information because it is important for implementations)!

So removing CS as the Preferred-Value of YU breaks the equivalence-class relation between YU and CS.

I'm starting to change my mind about the wisdom of removing the Preferred-Value. After all, its purpose is canonicalization, and xx-YU and xx-CS should have the same canonical form. We lose that if we drop the value.
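The set-valued idea can be sketched as follows, with the caveat that this is not a registry feature: the registry's Preferred-Value holds a single subtag, so `SPLIT_INTO` is a hypothetical mapping that simply mirrors the "After" example above (which lists only RS and ME). It shows that two subtags mapping to the same set stay equivalent.

```python
from typing import Dict, FrozenSet

# Hypothetical set-valued "preferred values", mirroring the After example.
SPLIT_INTO: Dict[str, FrozenSet[str]] = {
    "YU": frozenset({"RS", "ME"}),
    "CS": frozenset({"RS", "ME"}),
}


def successors(region: str) -> FrozenSet[str]:
    """Return the set of regions a subtag maps to (just itself if unmapped)."""
    return SPLIT_INTO.get(region, frozenset({region}))


def equivalent(a: str, b: str) -> bool:
    """Treat two region subtags as equivalent if they map to the same set."""
    return successors(a) == successors(b)


print(equivalent("YU", "CS"))    # True
print(equivalent("YU", "RS"))    # False
print(sorted(successors("YU")))  # ['ME', 'RS']
```

With a single-valued Preferred-Value and YU's value removed, YU and CS would each canonicalize to themselves, and this equivalence would be lost.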

Mark
On Fri, Feb 27, 2009 at 12:55, Tex Texin <textexin at xencraft.com> wrote:
1) I used YU/CS as a shorthand for identifying a subtag that could be either.
2) I understand the inaccuracy between YU and CS. That was not offered as the reason for the change however, at least in the mails I saw. Perhaps it was an implicit motive.

3) I understand that there isn't a requirement to change tags. I'll make the case another way:
At some point in time a user attempts to find documents tagged for Yugoslavia.
The search engine, using the then-current registry data noting the preferred value relationship, matches both YU and CS.

Another user searches for documents for Serbia.
The search engine, using the then-current registry data noting the preferred value relationship, matches both YU and CS.

The results are in some sense accurate and complete given the history of the subtag.

After the change in the preferred value relationship, the search engine does not search for both, since the registry does not indicate a relationship. Only one or the other subtag is used for each query. However, the query results are now incomplete since documents for YU may have been tagged with the one-time preferred tag of CS.

4) Comments are a good thing for recording rationale and tangential history. However, implementers are not going to go thru and read the comments on any or all tags in order to make a correct implementation. They are going to implement based on the schema and operate with the data values.

5) I think the registry should stay as it is with respect to YU and CS.
As CS is now being used, deprecated or not, I don't see a compelling motivation to change the value back to YU. Doing so would just compound the confusion over the two subtags.

6) I don't expect users to be walking the registry in any event but to use a software package that recommends the optimal value. If that software executes a few extra machine cycles to get to CS, so be it. (And that is only if the results aren't put into a precompiled form.)

7) I would not argue that preferred value relationships should never change. But the motivation to make a change should be compelling enough to outweigh the impact of making ambiguous the existing tagged data.

8) Separate topic: the number of countries in the world seems to keep growing, which suggests to me that the subdivision of regions is not going to be a rare event. Perhaps there should be a mechanism to mark subtags that have later been split: instead of a single preferred value, a way to indicate that a tag has been deprecated in favor of two or more possible values.
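The search scenario in point 3 can be sketched as query expansion over Preferred-Value links: each link is treated as an undirected edge, and every subtag reachable from the queried one is matched. The registries, document list and function names here are hypothetical, and a real search engine may expand queries quite differently.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Registry = Dict[str, Optional[str]]  # region subtag -> Preferred-Value or None

BEFORE: Registry = {"YU": "CS", "CS": None}  # Preferred-Value relationship present
AFTER: Registry = {"YU": None, "CS": None}   # relationship removed

DOCUMENTS: List[Tuple[str, str]] = [("doc1", "YU"), ("doc2", "CS"), ("doc3", "RS")]


def related_regions(region: str, registry: Registry) -> Set[str]:
    """All subtags linked to `region` by Preferred-Value, in either direction."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for subtag, preferred in registry.items():
        if preferred is not None:
            graph[subtag].add(preferred)
            graph[preferred].add(subtag)
    seen = {region}
    queue = deque([region])
    while queue:  # breadth-first walk over the link graph
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def search(documents: Iterable[Tuple[str, str]], region: str,
           registry: Registry) -> List[str]:
    """Return ids of documents whose region subtag is related to the query."""
    wanted = related_regions(region, registry)
    return [doc_id for doc_id, tagged in documents if tagged in wanted]


print(search(DOCUMENTS, "YU", BEFORE))  # ['doc1', 'doc2']
print(search(DOCUMENTS, "YU", AFTER))   # ['doc1']
```

Once the link is removed, a query for YU no longer finds the document tagged with the one-time preferred value CS.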

tex


-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Friday, February 27, 2009 5:21 AM
To: ietf-languages at iana.org
Subject: Re: Proposal to remove Preferred-Value field for region YU in LTRU
Tex Texin <textexin at xencraft dot com> wrote:

> Historically, it was a concern that codes might change.
> If I use the registry to choose the preferred value for a region, and
> that preferred value can change, then isn't it tantamount to the code
> changing?

This would have been a good question for the LTRU group back when the
decision was made to allow Preferred-Value to change.  I'm guessing this
was about a year ago, but I would have to look it up.

> If I had data that would be represented by YU/CS and after the
> preferred value is removed it should instead be YU, that seems like a
> problem.

I guess I'm not sure what you mean by "YU/CS" in this context.  A
language tag contains at most one region subtag, of course.

> Especially since the relationship between CS and YU becomes lost.

Speaking to this particular case and not to the general principle of
allowing P-V to change...

It has been argued frequently on LTRU that the relationship between CS
and YU is not what it appears, because the country identified as YU
changed its nature dramatically between 1991 and 2003, in a way that was
pertinent to language identification, by shrinking from the original
"Yugoslavia" to just Serbia and Montenegro.  This viewpoint holds that
data tagged as "something-YU" is already ambiguous as to "which YU" is
intended.  This is really just a special case of the problem that
country codes as language modifiers are less than perfectly precise.

> Also, it may not be clear which CS records should be restored to YU.

There is never any presumption that someone will go through and retag
data.  Section 3.1 says, "In particular, the 'Preferred-Value' field
does not imply retagging content that uses the affected subtag."  To me
this implies that a change or deletion of P-V doesn't imply retagging
either.

> I don't see that the fact that the target preferred value of YU is
> also deprecated is a good reason to break the relationship at this
> point. We still end up with deprecated codes with no preferred value
> to go to, so why introduce an unnecessary change?

So that users will not have to follow a chain of arbitrary length to
determine the best subtag -- or in this case, to reach a dead end.
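Following a chain can be sketched like this: follow Preferred-Value links until a record has none, then check whether the last subtag is itself deprecated (a dead end). `Record` and the data are hypothetical simplifications modelled on option A plus one current subtag for contrast, not the registry's record format. The `precompile` helper illustrates the point made earlier in the thread that the extra lookups disappear once results are put into a precompiled form.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    """Simplified region record holding only the two fields discussed here."""
    deprecated: bool
    preferred_value: Optional[str] = None


# Hypothetical data: option A from this thread, plus RS for contrast.
REGISTRY: Dict[str, Record] = {
    "YU": Record(deprecated=True, preferred_value="CS"),
    "CS": Record(deprecated=True),
    "RS": Record(deprecated=False),
}


def follow_chain(region: str, registry: Dict[str, Record]) -> List[str]:
    """Follow Preferred-Value links from `region` until a record has none."""
    path = [region]
    while (nxt := registry[path[-1]].preferred_value) is not None:
        if nxt in path:  # guard against a malformed, cyclic registry
            raise ValueError("Preferred-Value cycle: " + " -> ".join(path + [nxt]))
        path.append(nxt)
    return path


def is_dead_end(path: List[str], registry: Dict[str, Record]) -> bool:
    """A chain is a dead end if it stops at a subtag that is itself deprecated."""
    return registry[path[-1]].deprecated


def precompile(registry: Dict[str, Record]) -> Dict[str, str]:
    """Flatten every chain once so that later lookups take a single step."""
    return {subtag: follow_chain(subtag, registry)[-1] for subtag in registry}


path = follow_chain("YU", REGISTRY)
print(path, is_dead_end(path, REGISTRY))  # ['YU', 'CS'] True
print(precompile(REGISTRY))               # {'YU': 'CS', 'CS': 'CS', 'RS': 'RS'}
```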

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

_______________________________________________
Ietf-languages mailing list
Ietf-languages at alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages
