draft-phillips-langtags-08, process, specifications, "stability", and extensions

Mark Davis mark.davis at jtcsv.com
Fri Jan 7 00:47:49 CET 2005


First, I apologize about the statement "there has been a lot of noise on
this issue". By that, I didn't really mean your message in particular. I
was commenting more on the general status of quite a number of statements
that have been made on the overall topic. And by "noise", I really mean
high-level statements without explicit examples or scenarios, where it is
very hard for people not familiar with the details to be able to judge the
correctness of the statements.

And I will assume that it was that perceived insult that caused you to be
dismissive, with your statement below about "Fine, whatever." I assume that
otherwise you would not so readily conclude that it didn't matter whether
RFC 3066 said "if X then Y" vs. "if Y then X". Those are, after all, very
different statements, and a confusion between them would cause incorrect
conclusions to be drawn.

> > (c) Every single tag that could be generated under RFC 3066bis is a tag
> > that could have been registered under RFC 3066.
>
> True but irrelevant.

Not at all irrelevant. Suppose someone is using an RFC 3066 parser, and is
faced with either:

(a) a registered tag from a future version of the RFC 3066 registry, or
(b) a 3066bis tag (that uses generative features not in RFC 3066).

Their parser will work *exactly* the same way; they would parse both as
being equally well-formed, and they will be unable to determine any of the
structure of either tag, and just treat each as a blob. So they are no
better off, but *no worse off either*. (Had we not followed (c), this would
not have been true.) Of course, if they try parsing a tag that is generated
according to RFC 3066 (e.g., one not in the registry), then they would be
able to parse out the language code and/or country.
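
To make that scenario concrete, here is a minimal Python sketch of how an
RFC 3066-era parser that is not table driven might behave. The name
parse_rfc3066, and the rule that only the shapes "language" and
"language-country" count as generated, are illustrative assumptions and not
text from RFC 3066.

```python
import re

# Well-formedness per the RFC 3066 ABNF:
# 1*8ALPHA, followed by any number of "-" 1*8(ALPHA / DIGIT) subtags.
WELL_FORMED = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

def parse_rfc3066(tag):
    """Return (language, country) for a generated-looking tag.

    Return None when the tag is well-formed but has to be treated as an
    opaque blob (assumption: anything other than "ll[l]" or "ll[l]-CC").
    Raise ValueError when the tag is not well-formed at all.
    """
    if not WELL_FORMED.fullmatch(tag):
        raise ValueError("not a well-formed RFC 3066 tag: %r" % tag)
    subtags = tag.split("-")
    primary = subtags[0]
    if len(primary) not in (2, 3):
        return None  # e.g. "i-..." or "x-...": nothing to extract
    if len(subtags) == 1:
        return (primary.lower(), None)
    if len(subtags) == 2 and len(subtags[1]) == 2 and subtags[1].isalpha():
        return (primary.lower(), subtags[1].upper())
    return None  # registered-style or 3066bis-style tag: opaque
```

Under these assumptions "en-US" parses to a language and a country, while
"en-Latn-US" is accepted as well-formed and handled as a blob, which is the
same treatment a newly registered RFC 3066 tag would get.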

If they update to a 3066bis parser, then they can reliably extract much more
information from the tag. And because 3066bis was written to be backwards
compatible, any language tag generated under RFC 3066 parses out exactly the
same as it would with an RFC 3066 parser.
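
As a rough illustration of the extra information, the following Python
sketch handles only a simplified subset of the draft: a 2- or 3-letter
language, an optional 4-letter script, and an optional region (2 letters or
3 digits). It ignores extended language subtags, variants, extensions,
private use, and grandfathered tags, and the names SIMPLE_BIS and
parse_3066bis_simple are hypothetical.

```python
import re

# Simplified subset of the 3066bis shape: language[-script][-region][-rest].
SIMPLE_BIS = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<rest>(?:-[A-Za-z0-9]{1,8})*)"
)

def parse_3066bis_simple(tag):
    """Return a dict of language, script, and region, or None if no match."""
    m = SIMPLE_BIS.fullmatch(tag)
    if m is None:
        return None
    script = m.group("script")
    region = m.group("region")
    return {
        "language": m.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }
```

With this sketch "en-Latn-US" yields language "en", script "Latn", and
region "US", while "en-US" still yields language "en" and region "US", the
same language and country an RFC 3066 parser would extract.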

Now you yourself may not care much about the extra information in the
3066bis language tag. But IBM, and many other companies and organizations
do. This is not some theoretical problem; it is a real current issue that
many are faced with. For example, without reliable script information many
languages are severely underspecified. One simply cannot mix content with
different scripts and have happy customers.

And if you don't care about the extra information, you are no worse off than
if you were trying to parse a registered RFC 3066 tag. For matching
purposes, the commonly used truncation mechanism will work just as well with
all 3066bis tags as it does with RFC 3066 tags, for all tags you will
encounter.
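
For reference, here is one common reading of that truncation mechanism as a
short Python sketch: drop trailing subtags one at a time until something in
the supported set matches. The function names truncations and lookup are
hypothetical, and the comparison is case-insensitive by assumption.

```python
def truncations(tag):
    """Yield the tag, then the tag with trailing subtags removed one by one."""
    subtags = tag.split("-")
    for end in range(len(subtags), 0, -1):
        yield "-".join(subtags[:end])

def lookup(requested, available):
    """Return the first available tag reached by truncating `requested`."""
    by_lower = {a.lower(): a for a in available}
    for candidate in truncations(requested):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return None  # nothing matched, even at the bare language
```

The code never looks inside a subtag, so it treats "en-Latn-US" and "en-US"
alike: each falls back toward "en" without the matcher needing to know what
a script subtag is.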

Mark

----- Original Message ----- 
From: <ned.freed at mrochek.com>
To: "Mark Davis" <mark.davis at jtcsv.com>
Cc: <ned.freed at mrochek.com>; <ietf-languages at alvestrand.no>; <ietf at ietf.org>
Sent: Thursday, January 06, 2005 06:44
Subject: Re: draft-phillips-langtags-08, process, specifications, "stability", and extensions


> > > Rather, the rule is simply that a country code, if present,
> > > always appears as a two letter second subtag. The new draft changes
> > > this rule, so applications that pay attention to country codes in
> > > language tags have to change and the new algorithm for finding the
> > > country code is trickier.
>
> > Your text above says (a) "if there is a country code in the tag, it is
> > the second subtag". That is not what the text of RFC 3066 actually says,
> > which is:
>
> > > The following rules apply to the second subtag:
> > > All 2-letter subtags are interpreted as ISO 3166 alpha-2 country...
>
> > That is, it says (b) "if a second subtag has 2 letters, then it is an
> > ISO 3166 code", which is not the same as (a). (It is almost, but not
> > quite, the converse.)
>
> Fine, whatever.
>
> > The current RFC certainly does not forbid the use of country
> > codes in other positions in language tags. One could absolutely register
> > en-Latin-US, for example, meaning English as spoken in the US written in
> > Latin script.
>
> Sure, but my point was, is, and always has been that any 3066-compliant
> implementation won't see this as a country code (unless it is table
> driven, which brings up its own set of issues).
>
> > There has been a lot of noise on this issue, and too few concrete
> > examples.
>
> No, what there has been is a lot of discussion of a real problem with no
> apparent recognition of it as such by the draft authors. Your pejorative
> characterization of this as "noise" does not make it so.
>
> > In the so-called 3066bis draft, we have striven very hard to ensure
> > that:
>
> > (c) Every single tag that could be generated under RFC 3066bis is a tag
> > that could have been registered under RFC 3066.
>
> True but irrelevant.
>
> > Thus if someone wrote a parser that is future-compatible -- that could
> > parse all RFC 3066 language tags including those registered after the
> > parser was deployed -- then that parser can handle all 3066bis language
> > tags. This is a significant advance over RFC 3066, whose registered (not
> > generated) language tags are atomic, and cannot be effectively parsed at
> > all. 3066bis adds more structure so as to allow effective parsing of
> > tags.
>
> > If you *can* come up with tags that would show that (c) is invalid, that
> > would be a concrete case that we would have to make adjustments in the
> > draft for.
>
> (c) is frankly not an issue I care one whit about. (Perhaps I should, but
> I don't.) I don't register tags. I write code that processes, and more to
> the point matches, tags. That's why I have issues with this draft.
>
> > Moreover, all the talk about this being *too* complex is far overblown.
>
> Again, your pejorative dismissal of other people's concerns does not
> mean your position is valid.
>
> > All 3066bis language tags can be parsed, including all the grandfathered
> > codes, with a very short piece of code, or even with a regular expression
> > (such as in Perl).
>
> Of course you can write a short piece of code to parse this stuff. It's
> what you do with it after you parse it that's a problem.
>
> > This is not rocket science.
>
> Parsing almost never is. But simply parsing these tags is not, and never
> has been, the issue.
>
> Ned
>


