Proposal to remove Preferred-Value field for region YU in LTRU

Tex Texin textexin at xencraft.com
Sat Feb 28 06:39:53 CET 2009


Hi Doug,

I think we are becoming repetitive, so I will be brief.

I do understand the logic that the CS tag is no better a representation than YU and so why drag the user through an extra step.
My point is that removing the extra step is a minor optimization, whereas losing the fact that the two were once related by preferred values is a significant semantic change. That relationship should now be preserved.

The fact that other tags are also being optimized is perhaps bothersome too, but less so: if they collapse to the same value, then presumably the relationship between them all can still be found. That is not the same as severing a relationship.

I don't see a need for optimizing the registry itself in this way. It is easy enough to leave the registry as-is and let a layer above the registry provide optimizations for callers/users if that is desirable.

I don't understand the comments in a couple of mails, like the last paragraph of #5. A year ago it was decided that preferred values could be changed. Fine. On 2/21/2009 the question was raised on this list about YU's preferred value and the proposed change. I am responding to that question. If it isn't open to debate, then I don't understand what was being asked.

tex

-----Original Message-----
From: ietf-languages-bounces at alvestrand.no [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of Doug Ewell
Sent: Friday, February 27, 2009 8:07 PM
To: ietf-languages at iana.org
Subject: Re: Proposal to remove Preferred-Value field for region YU in LTRU

Tex Texin <textexin at xencraft dot com> wrote:

> 1) I used YU/CS as a shorthand for identifying a subtag that could be 
> either.

OK, I understand the notation now.

> 2) I understand the inaccuracy between YU and CS. That was not offered 
> as the reason for the change however, at least in the mails I saw. 
> Perhaps it was an implicit motive.

The whole idea of having a deprecated subtag with no Preferred-Value, 
such as CS is now, is that there is no one value that serves as a 
suitable replacement.  CS has no Preferred-Value because neither RS nor 
ME is a suitable replacement by itself; you don't know that "sr-CS" 
should necessarily be matched with "sr-RS".  Perhaps the content in 
question originated in Montenegro, and perhaps the distinction matters.

The reason I offered for removing YU's Preferred-Value of CS is that it 
buys the user nothing from the standpoint of finding a suitable 
replacement.  Right now the user sees that YU is deprecated, but hey, at 
least there's a Preferred-Value of CS.  Then she looks up CS and sees 
that it too is deprecated, but with no Preferred-Value.  The user is no 
closer to finding a non-deprecated match for YU than if she had not been 
led to CS at all.
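
The lookup described above can be sketched as follows. This is only an illustration: the REGISTRY dict is a toy stand-in holding the four region subtags discussed in this thread, and its layout and the helper name follow_preferred are assumptions, not the Registry's file format or any existing tool.

```python
# Toy stand-in for the four region records discussed in this thread.
# The dict layout is an assumption, not the real Registry format.
REGISTRY = {
    "YU": {"deprecated": True, "preferred": "CS"},
    "CS": {"deprecated": True, "preferred": None},
    "RS": {"deprecated": False, "preferred": None},
    "ME": {"deprecated": False, "preferred": None},
}


def follow_preferred(subtag, registry):
    """Walk Preferred-Value links from subtag.

    Returns (path, replacement), where replacement is the first
    non-deprecated subtag reached, or None if the walk dead-ends.
    """
    path = [subtag]
    seen = {subtag}
    current = subtag
    while registry[current]["deprecated"]:
        nxt = registry[current]["preferred"]
        if nxt is None or nxt in seen:
            # Deprecated with nowhere further to go (or a loop).
            return path, None
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path, current


path, replacement = follow_preferred("YU", REGISTRY)
print(path, replacement)  # ['YU', 'CS'] None
```

The walk visits CS and still ends with no replacement, which is the same answer the user would get if YU had no Preferred-Value at all.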

> At some point in time a user attempts to find documents tagged for 
> Yugoslavia.
> The search engine, using the then current registry data noting the 
> preferred value relationship, matches either YU or CS.

OK, so far so good.

> Another user searches for documents for Serbia.
> The search engine, using the then current registry data noting the 
> preferred value relationship, matches either YU or CS.

I assume you mean Serbia and Montenegro (CS), because Serbia (RS) has no 
P-V and is not the P-V for anything else.

I would be surprised if any search engines will be built that search 
*backwards* along the Preferred-Value path, matching subtag X with all 
deprecated subtags that have X as their Preferred-Value.

> The results are in some sense accurate and complete given the history 
> of the subtag.

OK, we have matched one deprecated subtag with another deprecated 
subtag.  I can see that there might be some value in that.  I don't know 
if that would be a mainstream use of language tags.

> After the change in the preferred value relationship, the search 
> engine does not search for both, since the registry does not indicate 
> a relationship. Only one or the other subtag is used for each query. 
> However, the query results are now incomplete since documents for YU 
> may have been tagged with the one-time preferred tag of CS.

Agreed, assuming once again that search engines really would have 
searched the P-V chain in both directions.
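
To make that assumption concrete, here is a rough sketch of a matcher that follows Preferred-Value links in both directions. The function name expand_both_ways and the plain dict of links are hypothetical; I am not aware of any search engine that works this way.

```python
def expand_both_ways(subtag, preferred):
    """Return subtag plus every subtag connected to it by
    Preferred-Value links, followed forwards and backwards."""
    # Treat each Preferred-Value link as an undirected edge.
    neighbours = {}
    for old, new in preferred.items():
        neighbours.setdefault(old, set()).add(new)
        neighbours.setdefault(new, set()).add(old)

    matched = {subtag}
    todo = [subtag]
    while todo:
        for other in neighbours.get(todo.pop(), ()):
            if other not in matched:
                matched.add(other)
                todo.append(other)
    return matched


before = {"YU": "CS"}  # today: YU has Preferred-Value CS
after = {}             # proposal: YU has no Preferred-Value

print(sorted(expand_both_ways("CS", before)))  # ['CS', 'YU']
print(sorted(expand_both_ways("CS", after)))   # ['CS']
```

Under this (assumed) matching strategy, a query for either subtag finds both before the change and only itself afterwards.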

> 4) Comments are a good thing for recording rationale and tangential 
> history. However, implementers are not going to go thru and read the 
> comments on any or all tags in order to make a correct implementation. 
> They are going to implement based on the schema and operate with the 
> data values.

Basically correct.  Comments are a good thing for giving human readers 
of the Registry a snippet of background to help explain the subtag. 
They are not by any means intended to be machine-readable.

> 5) I think the registry should stay as it is with respect to YU and 
> CS.
> As CS is now being used, deprecated or not, I don't see a compelling 
> motivation to change the value back to YU. Doing so would just 
> compound the confusion over the two subtags.

This almost sounds like my argument on the Unicode list a few weeks ago, 
that the Unicode Consortium should not continue to recommend that people 
tag Hebrew content as 'iw', Yiddish content as 'ji', etc., using code 
elements withdrawn from ISO 639 twenty years ago.

In any case, nobody is changing any values back to YU.  The proposal is 
to do with YU in the RFC 4646bis era essentially what is being done with 
all other Preferred-Value chains, namely, to give each deprecated subtag 
or tag along the chain the same Preferred-Value -- the last one -- and 
not a "preferred" value that is itself deprecated.

Right now "i-hak" points to "zh-hakka".  When RFC 4646bis goes live, 
instead of having "i-hak" point to "zh-hakka" and "zh-hakka" point to 
"hak", "i-hak" will point directly to "hak".  In the case of YU, rather 
than keep the deprecated subtag CS as the "preferred" value, the 
proposal is to tell the truth, which is that no value is really 
preferred.
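
As an illustration only, the chain handling can be sketched like this. The function name collapse and the two input structures are assumptions made for the example; they are not taken from the draft or from the Registry tooling.

```python
def collapse(preferred, deprecated):
    """Give every entry the last value on its Preferred-Value chain.

    If the chain ends on a subtag that is itself deprecated, the entry
    gets None, meaning no Preferred-Value field at all.
    """
    collapsed = {}
    for tag in preferred:
        current = tag
        seen = {tag}
        # Follow links until the chain ends (or would loop).
        while current in preferred and preferred[current] not in seen:
            current = preferred[current]
            seen.add(current)
        collapsed[tag] = None if current in deprecated else current
    return collapsed


# Links as they stand before the change, per the examples above.
preferred = {"i-hak": "zh-hakka", "zh-hakka": "hak", "YU": "CS"}
deprecated = {"i-hak", "zh-hakka", "YU", "CS"}

print(collapse(preferred, deprecated))
# {'i-hak': 'hak', 'zh-hakka': 'hak', 'YU': None}
```

Both "i-hak" and "zh-hakka" end up pointing at "hak", while YU ends up with nothing because the only value on its chain, CS, is deprecated.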

We really did talk about this in LTRU, which is where policy decisions 
like this are made.  This list helps the Reviewer to implement what the 
RFC says.  It's not an open issue for debate; we debated it a year ago, 
and have spent about 3 years discussing draft-4646bis and producing a 
score of drafts.

> 6) I don't expect users to be walking the registry in any event but to 
> use a software package that recommends the optimal value. If that 
> software executes a few extra machine cycles to get to CS, so be it. 
> (And that is only if the results aren't put into a precompiled form.)

I agree about the few extra cycles.  But it does seem that a lot of 
people are interested in using their eyes to read the Registry, more 
than I would have expected.

> 7) I would not argue that preferred value relationships should never 
> change. But the motivation to make a change should be compelling 
> enough to outweigh the impact of making ambiguous the existing tagged 
> data.

I don't see what is ambiguous, or what will become ambiguous, or unclear 
in any way.  YU will continue to appear in the Registry with the name 
"Yugoslavia," whatever that means.  CS will continue to appear in the 
Registry with the name "Serbia and Montenegro."  Equating Yugoslavia 
with Serbia and Montenegro is what some people have considered 
ambiguous.

> 8) Separate topic- The number of countries in the world seems to grow. 
> This suggests to me that regions being subdivided is not going to be a 
> rare event. Perhaps there should be a mechanism to indicate subtags 
> that have later been split, so instead of one preferred value, there 
> is a way to indicate that a tag has been deprecated in favor of two or 
> more possible values.

I simply don't understand how that benefits someone who is trying to 
match or correlate language tags.  I understand how it sheds light on 
geopolitical history, but as fascinated as I am with geopolitical 
history, it's not what region subtags are all about.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages



