Request to register private-use variant subtags

Gordon P. Hemsley gphemsley at gmail.com
Sat Apr 7 23:23:37 CEST 2012


On Sat, Apr 7, 2012 at 4:49 PM, Doug Ewell <doug at ewellic.org> wrote:
> Gordon P. Hemsley wrote:
>
>> An important part of what I'm doing involves a step once-removed
>> between the Registry and the display of the names.
>
> OK, so this is a separate layer that you are adding. This will prove
> important to the discussion.
>
>> As you well know, many entries in the Registry contain multiple values
>> for the "Description" field. This may be because things have different
>> names or whatever.
>
> It is exactly for that reason. For example, "Spanish" and "Castilian" are
> the same language according to ISO 639-3.
>
>> So the first step of what I'm doing involves
>> deciding how to translate Descriptions into Names. (They are not the
>> same thing—the registry has no concept of Name.)
>
> To the extent that "names" are a different concept from "descriptions," the
> Registry doesn't encode "names" for such subtags because it does not attempt
> to encode or register their semantics. 'es' represents the language (Type
> field) that people generally refer to in English as "Spanish" or "Castilian"
> (Description fields). What that actually means, say, in terms of "how does
> this language differ from others," is up to the user of BCP 47, that is, the
> producer or consumer of a language tag that includes 'es'.
>

I'm aware of all this. I was already assuming it as background to the
discussion.

>> In this process, Private Use subtags are handled specially. Since, as
>> far as I can tell, such subtags have special semantics—in particular,
>> that the Registry has no knowledge of their meaning—they do not
>> receive a Name.
>
> The Registry has no knowledge of the "meaning" of any subtag. The
> denotations—not just Description fields—of language, script, and region
> subtags are defined by ISO 639, 15924, and 3166 respectively, to the extent
> that those standards attempt to encode concepts instead of names. (Not all
> do.) The distinction between private-use and other subtags is one that you
> are creating. There is no difference in the Registry between the entries:
>
> Type: region
> Subtag: AA
> Description: Private use
> Added: 2005-10-16
>
> and:
>
> Type: region
> Subtag: AC
> Description: Ascension Island
> Added: 2009-07-29
>
> except that the former includes a Description field that contains the word
> "Private" while the latter does not. In fact, once you have started
> isolating subtags with "Private" from the others, there is really nothing to
> stop you from declaring script subtags 'Zxxx' and 'Zyyy' and 'Zzzz'
> exceptional as well.

And I have, in fact, done just that, along with 'Zinh', as well.
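
For illustration, the kind of isolation described above might look roughly like the following. This is only a sketch: the function names are hypothetical, the sample contains just the two records quoted above, and the parser ignores the folded continuation lines that the real Registry file uses.

```python
# Two records copied from the message above, separated by "%%" as in the Registry.
REGISTRY_SAMPLE = """\
Type: region
Subtag: AA
Description: Private use
Added: 2005-10-16
%%
Type: region
Subtag: AC
Description: Ascension Island
Added: 2009-07-29
"""

# Script subtags treated as exceptional in addition to "Private use" entries.
SPECIAL_SCRIPTS = {"Zinh", "Zxxx", "Zyyy", "Zzzz"}


def parse_records(text):
    """Split record-jar text into dicts mapping field name -> list of values."""
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            field, sep, value = line.partition(":")
            if sep:  # skip anything that is not a "Field: value" line
                record.setdefault(field.strip(), []).append(value.strip())
        if record:
            records.append(record)
    return records


def is_excluded(record):
    """True if this (hypothetical) name database would leave the record out."""
    if "Private use" in record.get("Description", []):
        return True
    subtag = record.get("Subtag", [None])[0]
    return record.get("Type") == ["script"] and subtag in SPECIAL_SCRIPTS


records = parse_records(REGISTRY_SAMPLE)
excluded = [r["Subtag"][0] for r in records if is_excluded(r)]
```

With the sample above, only 'AA' ends up in `excluded`; 'AC' keeps its Description and would go on to receive a Name.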

>> In fact, they are excluded from my database as any
>> "unregistered" subtag would be. As such, both a private-use subtag and
>> an unregistered would be displayed literally in my system. Only
>> subtags with corresponding Names in my database are processed before
>> being displayed.
>
> That is a distinction you have created, as seen above. And it causes other
> problems, as seen below.
>
>> The problem comes when I want to test that the code is doing what I
>> described above. In order to ensure that I get a clear and permanent
>> separation between a subtag that gets a display name and a subtag that
>> gets output literally, I use private use subtags—they are permanently
>> reserved, so I don't run the risk of them accidentally getting an
>> associated Name in the future.
>
> You could perform this test against all five types of registered subtag
> (language, extlang, script, region, variant) by randomly generating a subtag
> value and checking the Registry, or your database, to ensure that the value
> isn't already registered, and iterating as necessary. (That's how random,
> guaranteed-nonexistent filenames are generated.) There's no need to have a
> predefined value that expressly means "nothing." That isn't what private-use
> subtags are for anyway.

The tests are run asynchronously from the generation of the lists of
names (as discussed further below), so checking the Registry for
validity is not an option. In addition, the tests are being run to
ensure that the correct name is associated with the given language
tag. Checking the list for validity would add circular logic to
the test and render it moot.
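
For reference, the suggestion quoted above (generate a random candidate, check it, iterate) amounts to something like the sketch below. It assumes the set of registered subtags is available when the test runs, which is exactly the assumption disputed here; the function name and parameters are hypothetical.

```python
import random
import string


def random_unregistered_subtag(registered, rng=None, length=8):
    """Return a random lowercase subtag that is not in `registered`.

    `registered` is assumed to be a set of already-registered subtag values.
    Eight letters fits the variant syntax (5 to 8 alphanumerics).
    """
    if rng is None:
        rng = random.Random()
    while True:
        candidate = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        if candidate not in registered:
            return candidate


# Usage: the set would normally come from the Registry or a derived database.
candidate = random_unregistered_subtag({"rozaj", "nedis"}, random.Random(0))
```

The result is only guaranteed to be unregistered at the moment of the check, so it has to be regenerated (or rechecked) whenever the Registry changes.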

As for what a private-use subtag means, you say below that it means
"there is assumed to be a private agreement under which this subtag
has a certain meaning", so I am perfectly entitled to make a private
agreement that says it means "nothing". As such, I can use private-use
subtags for precisely the purpose I'm using them.

>> And this system works fine for language, region, and script subtags.
>> It falls apart when it comes to variant subtags. There is no clear
>> separation between a subtag that should have a display name and which
>> should get output literally—they all potentially fall into the former
>> category.
>
> But claiming that private-use subtags should not have a "display name" while
> others should is your concept, not a BCP 47 concept.

I'm aware of that. This discussion is not supposed to be about how I
decide what gets a "display name" and what doesn't. This is a
discussion about whether there should be a variant subtag that serves the
same purpose as a private-use language, region, or script
subtag. Anything about what that private use actually is is outside
the scope of this discussion, I think.

>> A subtag that "probably won't be registered" is not a
>> concrete enough definition. Without formally and permanently reserving
>> some variant subtags for private use, there will always be the
>> possibility of any given subtag being registered as a real variant.
>> (The word of those currently involved that they'd never approve such a
>> subtag is not good enough. Things change.)
>
> Just check dynamically to make sure the test subtag is still not registered.

It's not that simple. As I mentioned before, the list that is used is
not tied directly to the Registry. Aside from being self-defeating,
checking the Registry for validity would require a much more
complicated architecture than is currently in place.

>> The only way to guarantee that a variant "will almost certainly never
>> be registered" is to permanently and officially reserve it as such.
>
> Nothing in the Registry, private-use subtags or anything else, is reserved
> or registered to mean nothing.

As we've previously established, there is a step in between the
Registry and what I am displaying. In that intermediate step, I
translate "private use" to "has no meaning"—something I am completely
entitled to do.

> I'm a developer who has written BCP 47 applications, and am currently working
> on a new one (though at present I have essentially zero time to devote to
> it). In the past I pre-processed the Registry, reformatting some things and
> adding my own assumptions, and it turned out that not only did that add a
> noticeable burden for me every time the Registry was updated, but it didn't
> even work because not everyone shares my assumptions. It turned out to be
> better to accept the Registry verbatim, and draw a clear and careful
> distinction between what it says and any additional knowledge or assumptions
> I might add.

Well, perhaps your BCP 47 applications did not have a widely
international and multicultural audience, but accepting the contents
of the Registry verbatim raises a number of issues, not the least of
which is political. The prime example in my mind (though there are
several) is the description of the region subtag 'TW' being listed as
"Taiwan, Province of China". It is listed this way in the Registry, I
believe, because that is the value direct from ISO, which is affected
by governmental politics. However, Mozilla (and most other Internet
uses, I'm sure) has to answer directly to the people—the developers
and users—first, and those in Taiwan object to the Description listed
in the Registry.

(Just to be clear: I do not in any way represent or speak for Mozilla.
This is just the situation as I've come to know it as a member of the
Mozilla community.)

There are other problems with accepting the Registry verbatim, as
well. For example, the value of the Description field is often much more
verbose than it needs to be. In particular, it would be undesirable to
accept a name like "Islamic Republic of Iran" when "Iran" would do
just fine. Space is at a premium in a user interface.

All of this processing has to be done when converting from the Registry
to the software, and it would not make much sense to do it on the fly
every single time the information was needed.
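
A minimal sketch of that one-time conversion, under stated assumptions: the Description values are the ones mentioned in this thread, while the override table, the choice of "Taiwan" as the replacement name, and the function names are hypothetical and do not describe any actual Mozilla code.

```python
# Descriptions as they might be read from the Registry (sample entries only).
REGISTRY_DESCRIPTIONS = {
    ("language", "es"): ["Spanish", "Castilian"],
    ("region", "TW"): ["Taiwan, Province of China"],
    ("region", "IR"): ["Islamic Republic of Iran"],
    ("region", "AA"): ["Private use"],
}

# Hand-maintained corrections applied on top of the Registry (assumed values).
NAME_OVERRIDES = {
    ("region", "TW"): "Taiwan",
    ("region", "IR"): "Iran",
}


def build_names(descriptions, overrides):
    """Run once per Registry update: choose one Name per subtag."""
    names = {}
    for key, descs in descriptions.items():
        if "Private use" in descs:
            continue  # private-use subtags deliberately get no Name
        names[key] = overrides.get(key, descs[0])
    return names


def display(subtag_type, subtag, names):
    """Show the Name if there is one; otherwise output the subtag literally."""
    return names.get((subtag_type, subtag), subtag)


NAMES = build_names(REGISTRY_DESCRIPTIONS, NAME_OVERRIDES)
```

Here `display("region", "TW", NAMES)` gives "Taiwan", while `display("region", "AA", NAMES)` falls through to the literal "AA", the same as any subtag missing from the table.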

> I believe that's the boat you're in now. Private-use does not mean "this
> subtag has no meaning"; it means "there is assumed to be a private agreement
> under which this subtag has a certain meaning." Thinking of private-use as
> "no meaning" is an assumption you are adding.

It's not an assumption. It's a private agreement.

> Additionally, by treating private-use subtags the same as unregistered ones,
> your assumption fails validity testing. A validating processor should accept
> the tag "qaa-Qaaa-QZ" as valid, and should reject the tag "eaa-Eaaa-EZ" as
> invalid because none of its subtags is registered. Your process would treat
> both tags as invalid, which is contrary to BCP 47.

I'm not directly checking validity, so I don't see how this is relevant.
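
For context, the validity distinction drawn in the quoted paragraph could be sketched as follows. The private-use ranges are the ones recorded in the Registry; the tiny `REGISTERED` sample and the function names are assumptions for illustration, and only plain language-script-region tags are handled.

```python
# A tiny, hypothetical sample of individually registered subtags.
REGISTERED = {
    "language": {"es", "en"},
    "script": {"Latn"},
    "region": {"AA", "AC", "ZZ"},
}

# Private-use ranges, which the Registry records as range entries.
PRIVATE_USE_RANGES = {
    "language": [("qaa", "qtz")],
    "script": [("Qaaa", "Qabx")],
    "region": [("QM", "QZ"), ("XA", "XZ")],
}


def is_registered(subtag_type, subtag):
    """True if the subtag is listed directly or falls inside a registered range."""
    if subtag in REGISTERED[subtag_type]:
        return True
    return any(
        subtag.isalpha() and len(subtag) == len(low) and low <= subtag <= high
        for low, high in PRIVATE_USE_RANGES[subtag_type]
    )


def is_valid_lsr(tag):
    """Validate a language-script-region tag against the sample data above."""
    parts = tag.split("-")
    if len(parts) != 3:
        raise ValueError("this sketch only handles language-script-region tags")
    # Normalize to the conventional casing before comparing.
    language, script, region = parts[0].lower(), parts[1].title(), parts[2].upper()
    return (
        is_registered("language", language)
        and is_registered("script", script)
        and is_registered("region", region)
    )
```

Under these assumptions `is_valid_lsr("qaa-Qaaa-QZ")` is true and `is_valid_lsr("eaa-Eaaa-EZ")` is false. A display layer that never checks validity would output both tags literally and would not tell them apart.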

> For the reasons I've stated, and speaking partly as a programmer and partly
> as a BCP 47 Designated Expert, I think it would be a mistake to register a
> subtag of any type for the purpose "this subtag has no meaning" simply to
> solve a programming problem.

It would be a subtag for the purpose of "Private use", of which there
are several already. My particular private agreement need not be
formalized in the Registry.

Gordon

-- 
Gordon P. Hemsley
me at gphemsley.org
http://gphemsley.org/
http://gphemsley.org/blog/

