Request to register private-use variant subtags
doug at ewellic.org
Sat Apr 7 22:49:09 CEST 2012
Gordon P. Hemsley wrote:
> An important part of what I'm doing involves a step once-removed
> between the Registry and the display of the names.
OK, so this is a separate layer that you are adding. This will prove
important to the discussion.
> As you well know, many entries in the Registry contain multiple values
> for the "Description" field. This may be because things have different
> names or whatever.
It is exactly for that reason. For example, "Spanish" and "Castilian"
are the same language according to ISO 639-3.
> So the first step of what I'm doing involves
> deciding how to translate Descriptions into Names. (They are not the
> same thing—the registry has no concept of Name.)
To the extent that "names" are a different concept from "descriptions,"
the Registry doesn't encode "names" for such subtags because it does not
attempt to encode or register their semantics. 'es' represents the
language (Type field) that people generally refer to in English as
"Spanish" or "Castilian" (Description fields). What that actually means,
say, in terms of "how does this language differ from others," is up to
the user of BCP 47, that is, the producer or consumer of a language tag
that includes 'es'.
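
To make the Type/Description distinction concrete, here is a minimal Python sketch that reads one record and keeps every Description value instead of collapsing them into a single "name". The `parse_record` helper is hypothetical, the sample record is only illustrative of the Registry's field layout, and folded continuation lines are not handled.

```python
from collections import defaultdict

# Illustrative record in the Registry's "Field: value" layout.
# Treat the values as a sample, not an authoritative copy.
SAMPLE_RECORD = """\
Type: language
Subtag: es
Description: Spanish
Description: Castilian
Added: 2005-10-16
Suppress-Script: Latn
"""


def parse_record(text):
    """Parse one record into {field name: [values]} (hypothetical helper).

    Repeated fields such as Description are kept as a list, in file order.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition(":")
        fields[name.strip()].append(value.strip())
    return dict(fields)


record = parse_record(SAMPLE_RECORD)
print(record["Description"])  # ['Spanish', 'Castilian']
```

Any choice of one Description as "the" name happens in the application, after this step, and is not something the record itself expresses.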
> In this process, Private Use subtags are handled specially. Since, as
> far as I can tell, such subtags have special semantics—in particular,
> that the Registry has no knowledge of their meaning—they do not
> receive a Name.
The Registry has no knowledge of the "meaning" of any subtag. The
denotations—not just Description fields—of language, script, and region
subtags are defined by ISO 639, 15924, and 3166 respectively, to the
extent that those standards attempt to encode concepts instead of names.
(Not all do.) The distinction between private-use and other subtags is
one that you are creating. There is no difference in the Registry
between the entries:
Description: Private use
Description: Ascension Island
except that the former includes a Description field that contains the
word "Private" while the latter does not. In fact, once you have started
isolating subtags with "Private" from the others, there is really
nothing to stop you from declaring script subtags 'Zxxx' and 'Zyyy' and
'Zzzz' exceptional as well.
> In fact, they are excluded from my database as any
> "unregistered" subtag would be. As such, both a private-use subtag and
> an unregistered one would be displayed literally in my system. Only
> subtags with corresponding Names in my database are processed before
> being displayed.
That is a distinction you have created, as seen above. And it causes
other problems, as seen below.
> The problem comes when I want to test that the code is doing what I
> described above. In order to ensure that I get a clear and permanent
> separation between a subtag that gets a display name and a subtag that
> gets output literally, I use private use subtags—they are permanently
> reserved, so I don't run the risk of them accidentally getting an
> associated Name in the future.
You could perform this test against all five types of registered subtag
(language, extlang, script, region, variant) by randomly generating a
subtag value and checking the Registry, or your database, to ensure that
the value isn't already registered, and iterating as necessary. (That's
how random, guaranteed-nonexistent filenames are generated.) There's no
need to have a predefined value that expressly means "nothing." That
isn't what private-use subtags are for anyway.
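
The generate-and-check approach might look like the following Python sketch. It assumes the registered variants have already been loaded into a set; `random_unregistered_variant` and the tiny `registered_variants` subset are illustrative names, not part of any existing tool.

```python
import random
import string


def random_unregistered_variant(registered, rng=random, max_tries=1000):
    """Return a well-formed variant subtag that is not in `registered`.

    `registered` is assumed to be a set of lowercase variant subtags taken
    from the Registry (or from a local database built from it).
    """
    alphabet = string.ascii_lowercase + string.digits
    for _ in range(max_tries):
        # BCP 47 variants are 5 to 8 alphanumerics (or 4 when starting with
        # a digit). This sketch only generates the 5-to-8 form, starting
        # with a letter.
        length = rng.randint(5, 8)
        candidate = rng.choice(string.ascii_lowercase) + "".join(
            rng.choice(alphabet) for _ in range(length - 1)
        )
        if candidate not in registered:
            return candidate
    raise RuntimeError("no unregistered candidate found")


# Tiny illustrative subset, not the full list of registered variants.
registered_variants = {"valencia", "nedis", "rozaj", "fonipa"}
test_subtag = random_unregistered_variant(registered_variants)
print(test_subtag)
```

Because the check runs each time the test runs, a later registration of the same value simply causes another candidate to be drawn.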
> And this system works fine for language, region, and script subtags.
> It falls apart when it comes to variant subtags. There is no clear
> separation between a subtag that should have a display name and one that
> should get output literally: they all potentially fall into the former
> category.
But claiming that private-use subtags should not have a "display name"
while others should is your concept, not a BCP 47 concept.
> A subtag that "probably won't be registered" is not a
> concrete enough definition. Without formally and permanently reserving
> some variant subtags for private use, there will always be the
> possibility of any given subtag being registered as a real variant.
> (The word of those currently involved that they'd never approve such a
> subtag is not good enough. Things change.)
Just check dynamically to make sure the test subtag is still not
registered.
> The only way to guarantee that a variant "will almost certainly never
> be registered" is to permanently and officially reserve it as such.
Nothing in the Registry, private-use subtags or anything else, is
reserved or registered to mean nothing.
I'm a developer who has written BCP 47 applications and am currently
working on a new one (though at present I have essentially zero time to
devote to it). In the past I pre-processed the Registry, reformatting
some things and adding my own assumptions, and it turned out that not
only did that add a noticeable burden for me every time the Registry was
updated, but it didn't even work because not everyone shares my
assumptions. It turned out to be better to accept the Registry verbatim,
and draw a clear and careful distinction between what it says and any
additional knowledge or assumptions I might add.
I believe that's the boat you're in now. Private-use does not mean "this
subtag has no meaning"; it means "there is assumed to be a private
agreement under which this subtag has a certain meaning." Thinking of
private-use as "no meaning" is an assumption you are adding.
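
One way to keep Registry content and added assumptions apart is to store them in separate layers. The Python sketch below assumes a tiny in-memory subset of the Registry; `REGISTRY`, `DISPLAY_OVERRIDES`, and `display_name` are hypothetical names, and falling back to the first Description is itself an application-level assumption.

```python
# Registry data, kept exactly as loaded (tiny illustrative subset).
REGISTRY = {
    ("language", "es"): ["Spanish", "Castilian"],
    ("region", "AA"): ["Private use"],
    ("region", "AC"): ["Ascension Island"],
}

# Application-specific assumptions live in their own layer (hypothetical),
# so a Registry update never has to be merged with them by hand.
DISPLAY_OVERRIDES = {
    ("language", "es"): "Spanish",
}


def display_name(subtag_type, subtag):
    """Return a display string without altering the Registry layer."""
    key = (subtag_type, subtag)
    if key in DISPLAY_OVERRIDES:
        return DISPLAY_OVERRIDES[key]
    descriptions = REGISTRY.get(key)
    if descriptions:
        # Assumption of this sketch: use the first Description as a fallback.
        return descriptions[0]
    # Not registered at all: show the subtag literally.
    return subtag


print(display_name("language", "es"))  # Spanish
print(display_name("region", "AA"))    # Private use
print(display_name("region", "EZ"))    # EZ
```

In this arrangement a private-use subtag is still a registered entry with its own Description; only a subtag absent from the Registry falls through to literal output.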
Additionally, by treating private-use subtags the same as unregistered
ones, your assumption fails validity testing. A validating processor
should accept the tag "qaa-Qaaa-QZ" as valid, and should reject the tag
"eaa-Eaaa-EZ" as invalid because none of its subtags is registered. Your
process would treat both tags as invalid, which is contrary to BCP 47.
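
The difference can be sketched in Python as follows. This is only an illustration for simple language-script-region tags, not a full BCP 47 validator; the `REGISTERED` subset and the helper names are assumptions, and the private-use ranges are written out by hand the way the Registry lists them (for example "qaa..qtz").

```python
# Illustrative subset of registered subtags (not the full Registry).
REGISTERED = {
    "language": {"es", "en"},
    "script": {"Latn", "Zxxx", "Zyyy", "Zzzz"},
    "region": {"AC", "US", "AA", "ZZ"},
}

# Private-use ranges as listed in the Registry, e.g. "qaa..qtz".
RANGES = {
    "language": [("qaa", "qtz")],
    "script": [("Qaaa", "Qabx")],
    "region": [("QM", "QZ"), ("XA", "XZ")],
}


def is_registered(kind, subtag):
    """True if the subtag is listed individually or falls inside a range."""
    if subtag in REGISTERED[kind]:
        return True
    return any(
        len(subtag) == len(low) and subtag.isalpha() and low <= subtag <= high
        for low, high in RANGES[kind]
    )


def is_valid(tag):
    """Check a language-script-region tag against the subset above."""
    parts = tag.split("-")
    if len(parts) != 3:
        return False  # this sketch handles only the three-subtag shape
    language, script, region = parts
    return (
        is_registered("language", language.lower())
        and is_registered("script", script.title())
        and is_registered("region", region.upper())
    )


print(is_valid("qaa-Qaaa-QZ"))  # True
print(is_valid("eaa-Eaaa-EZ"))  # False
```

A process that drops private-use entries from its database has no way to tell these two tags apart.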
For the reasons I've stated, and speaking partly as a programmer and
partly as a BCP 47 Designated Expert, I think it would be a mistake to
register a subtag of any type for the purpose "this subtag has no
meaning" simply to solve a programming problem.
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell