Proposal to remove Preferred-Value field for region YU in LTRU

Phillips, Addison addison at amazon.com
Fri Feb 27 22:42:57 CET 2009


> At some point in time a user attempts to find documents tagged for
> Yugoslavia.
> The search engine, using the then current registry data noting the
> preferred value relationship, matches either YU and CS.

YU could also mean BA, SI, MK, RS, ME, HR in addition to CS. And probably we'll get a code for Kosovo at some point too.

Let's go further. Slovenian, Macedonian, and Albanian take care of themselves. But "Serbo-Croatian" (sh) encompasses 'hr', 'sr', and 'bs'. It might come to encompass a "Montenegrin" language at some point too. Throw in Cyrillic vs. Latin script and you have a lot of potential tags for the former "sh-YU". And there really *are* meaning differences in that tag cloud. Isn't that rather much to expect from Preferred-Value?
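To give a rough sense of the size of that tag cloud, here is a minimal sketch that mechanically combines the language, script, and region subtags mentioned above. The function name candidate_tags is made up for illustration, and the result is only an upper bound: many of the combinations are implausible in practice, and nothing here comes from the registry itself.

```python
from itertools import product

# Subtags named in this message; the lists are illustrative, not registry data.
LANGUAGES = ["hr", "sr", "bs"]            # languages that "sh" encompasses
SCRIPTS = ["Cyrl", "Latn"]                # Cyrillic vs. Latin
REGIONS = ["BA", "SI", "MK", "RS", "ME", "HR", "CS"]  # what YU could mean

def candidate_tags(languages, scripts, regions):
    """Build every language-script-region combination as a tag string."""
    return [f"{lang}-{script}-{region}"
            for lang, script, region in product(languages, scripts, regions)]

tags = candidate_tags(LANGUAGES, SCRIPTS, REGIONS)
print(len(tags))   # 3 languages * 2 scripts * 7 regions = 42
```

A single Preferred-Value field can name only one of these.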

> 5) I think the registry should stay as it is with respect to YU and
> CS.
> As CS is now being used, deprecated or not, I don't see a
> compelling motivation to change the value back to YU. Doing so
> would just compound the confusion over the two subtags.

I don't believe anyone is expecting a "switch back". YU is still deprecated. Do you *really* believe anyone is going to start tagging stuff with Yugoslavia? I don't actually believe there is any confusion about the subtags here. It's merely about how to record them all in the registry. Frankly, you should go read the LTRU archive from about a year ago, because a lot of these cases were discussed there.

> 7) I would not argue that preferred value relationships should
> never change. But the motivation to make a change should be
> compelling enough to outweigh the impact of making ambiguous the
> existing tagged data.

Why does removing a misleading P-V make the data "ambiguous"? 

Actually, I would argue that keeping 'CS' as the P-V for 'YU' is *more* misleading than fixing the situation. Adding all of the devolved Balkan countries to 'YU' really doesn't help matters. Ultimately, any automagic process that handles language tags needs a lot more information about language relationships than the registry or language tags themselves can really convey on their own.
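As a rough illustration of that last point, here is a sketch of the search-expansion scenario from the quoted message. It compares what a single Preferred-Value can give a search engine with what a one-to-many table would give. Both tables and both function names are hypothetical: PREFERRED_VALUE mirrors the YU-to-CS relationship discussed in this thread, and SUCCESSORS is a hand-maintained assumption that does not exist in the registry.

```python
# Hypothetical: the single Preferred-Value relationship discussed in this thread.
PREFERRED_VALUE = {"YU": "CS"}

# Hypothetical, hand-maintained table; nothing like it exists in the registry.
SUCCESSORS = {
    "YU": ["BA", "CS", "HR", "ME", "MK", "RS", "SI"],
    "CS": ["RS", "ME"],
}

def expand_by_preferred_value(region):
    """Return the region plus its single Preferred-Value, if it has one."""
    result = [region]
    if region in PREFERRED_VALUE:
        result.append(PREFERRED_VALUE[region])
    return result

def expand_by_successors(region):
    """Return the region plus every region reachable through SUCCESSORS."""
    seen = []
    pending = [region]
    while pending:
        current = pending.pop(0)
        if current in seen:
            continue  # already visited, e.g. RS reached via both YU and CS
        seen.append(current)
        pending.extend(SUCCESSORS.get(current, []))
    return seen

print(expand_by_preferred_value("YU"))  # ['YU', 'CS']
print(expand_by_successors("YU"))
```

Even the second version only says which regions to search. It says nothing about which language or script each tagged document actually uses.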

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Tex Texin
> Sent: Friday, February 27, 2009 12:55 PM
> To: 'Doug Ewell'; ietf-languages at iana.org
> Subject: RE: Proposal to remove Preferred-Value field for region YU
> in LTRU
> 
> 1) I used YU/CS as a shorthand for identifying a subtag that could
> be either.
> 2) I understand the inaccuracy between YU and CS. That was not
> offered as the reason for the change however, at least in the mails
> I saw. Perhaps it was an implicit motive.
> 
> 3) I understand that there isn't a requirement to change tags. I'll
> make the case another way-
> At some point in time a user attempts to find documents tagged for
> Yugoslavia.
> The search engine, using the then current registry data noting the
> preferred value relationship, matches either YU and CS.
> 
> Another user searches for documents for Serbia.
> The search engine, using the then current registry data noting the
> preferred value relationship, matches either YU and CS.
> 
> The results are in some sense accurate and complete given the
> history of the subtag.
> 
> After the change in the preferred value relationship, the search
> engine does not search for both, since the registry does not
> indicate a relationship. Only one or the other subtag is used for
> each query. However, the query results are now incomplete since
> documents for YU may have been tagged with the one-time preferred
> tag of CS.
> 
> 4) Comments are a good thing for recording rationale and tangential
> history. However, implementers are not going to go thru and read
> the comments on any or all tags in order to make a correct
> implementation. They are going to implement based on the schema and
> operate with the data values.
> 
> 5) I think the registry should stay as it is with respect to YU and
> CS.
> As CS is now being used, deprecated or not, I don't see a
> compelling motivation to change the value back to YU. Doing so
> would just compound the confusion over the two subtags.
> 
> 6) I don't expect users to be walking the registry in any event but
> to use a software package that recommends the optimal value. If
> that software executes a few extra machine cycles to get to CS, so
> be it. (And that is only if the results aren't put into a
> precompiled form.)
> 
> 7) I would not argue that preferred value relationships should
> never change. But the motivation to make a change should be
> compelling enough to outweigh the impact of making ambiguous the
> existing tagged data.
> 
> 8) Separate topic- The number of countries in the world seems to
> grow. This suggests to me that regions being subdivided is not
> going to be a rare event. Perhaps there should be a mechanism to
> indicate subtags that have later been split, so instead of one
> preferred value, there is a way to indicate that a tag has been
> deprecated in favor of two or more possible values.
> 
> tex
> 
> 
> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-
> bounces at alvestrand.no] On Behalf Of Doug Ewell
> Sent: Friday, February 27, 2009 5:21 AM
> To: ietf-languages at iana.org
> Subject: Re: Proposal to remove Preferred-Value field for region YU
> in LTRU
> 
> Tex Texin <textexin at xencraft dot com> wrote:
> 
> > Historically, it was a concern that codes might change.
> > If I use the registry to choose the preferred value for a region, and
> > that preferred value can change, then isn't it tantamount to the code
> > changing?
> 
> This would have been a good question for the LTRU group back when the
> decision was made to allow Preferred-Value to change.  I'm guessing this
> was about a year ago, but I would have to look it up.
> 
> > If I had data that would be represented by YU/CS and after the
> > preferred value is removed it should instead be YU, that seems like a
> > problem.
> 
> I guess I'm not sure what you mean by "YU/CS" in this context.  A
> language tag contains at most one region subtag, of course.
> 
> > Especially since the relationship between CS and YU becomes lost.
> 
> Speaking to this particular case and not to the general principle of
> allowing P-V to change...
> 
> It has been argued frequently on LTRU that the relationship between CS
> and YU is not what it appears, because the country identified as YU
> changed its nature dramatically between 1991 and 2003, in a way that was
> pertinent to language identification, by shrinking from the original
> "Yugoslavia" to just Serbia and Montenegro.  This viewpoint holds that
> data tagged as "something-YU" is already ambiguous as to "which YU" is
> intended.  This is really just a special case of the problem that
> country codes as language modifiers are less than perfectly precise.
> 
> > Also, it may not be clear which CS records should be restored to YU.
> 
> There is never any presumption that someone will go through and retag
> data.  Section 3.1 says, "In particular, the 'Preferred-Value' field
> does not imply retagging content that uses the affected subtag."  To me
> this implies that a change or deletion of P-V doesn't imply retagging
> either.
> 
> > I don't see that the fact that the target preferred value of YU is
> > also deprecated is a good reason to break the relationship at this
> > point. We still end up with deprecated codes with no preferred value
> > to go to, so why introduce an unnecessary change?
> 
> So that users will not have to follow a chain of arbitrary length to
> determine the best subtag -- or in this case, to reach a dead end.
> 
> --
> Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
> http://www.ewellic.org
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
> 
> _______________________________________________
> Ietf-languages mailing list
> Ietf-languages at alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/ietf-languages

