A proposed solution for descriptions
Doug Ewell
dewell at adelphia.net
Sun Jun 18 22:37:13 CEST 2006
Debbie Garside <debbie at ictmarketing dot co dot uk> wrote:
>> Debbie Garside suggested the inclusion of "all 'known
>> names'/alternative names" as descriptions, but only in the sense that
>> draft standards such as ISO 639-3 and 639-6 define a "known name"
>> which serves as an ISO 11179 Unique Identifier, plus zero or more
>> "alternative names." This is not a call for free registration of any
>> imaginable name or nickname, such as "Down Under" for Australia.
>
> What I actually said or indeed meant is all names as represented
> within the underlying standards should be included in the registry in
> the EXACT format that they take within the standard. This means that
> if a name is presented as Foo (Bar) in the underlying standard then it
> remains as Foo (Bar). Three reasons for this, one: consistency, two:
> very often the bracketed information acts as an additional qualifier,
> three: the name can be used by other systems as a Unique Identifier
> (ISO 11179). From what I can see additional names (within ISO
> 639-1/2) are delimited by ";" and these should be added as further
> descriptions.
Taking the second reason first: If the bracketed information acts as an
additional qualifier, then I agree 100% that it should be included as
part of the description. This is true in the case of "Slave
(Athapascan)" and I made a mistake by splitting those out (and have
withdrawn that suggestion). It does not seem true in the case of
"Deseret (Mormon)" or "Falkland Islands (Malvinas)".
Debbie is right that ISO 639 is consistent about using parentheses to
indicate qualifiers, and semicolons to indicate alternative names. But
it is not always consistent about the order of the names, so it
cannot always be determined which is the "original" name and which are
"additional." ISO 3166 uses parentheses for alternative names (there
being no "qualifiers" per se) and ISO 15924 uses parentheses for both,
so some human judgement must still be applied there.
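As a rough illustration of the ISO 639 convention only (semicolons
separate alternative names, parentheses stay attached as qualifiers),
here is a minimal Python sketch. The function name is hypothetical, and
it deliberately does not try to handle ISO 3166 or ISO 15924, where the
parentheses need human judgement.

```python
def split_iso639_names(name_field):
    """Split an ISO 639-style name field into separate descriptions.

    Assumption: ";" delimits alternative names, and any parenthesized
    text is a qualifier that must stay with the name it follows.
    """
    parts = [part.strip() for part in name_field.split(";")]
    # Drop empty pieces that a trailing or doubled semicolon would leave.
    return [part for part in parts if part]


if __name__ == "__main__":
    print(split_iso639_names("Catalan; Valencian"))
    print(split_iso639_names("Slave (Athapascan)"))
```

The order of the resulting list simply follows the source string, so it
says nothing about which name is the "original" one.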
Consistency with the ISO standards was my reason for sticking with the
exact apostrophe style used in those standards. Thus we ended up with
"Gwich´in" from ISO 639 (acute accent used as apostrophe), "N’Ko" from
ISO 15924 (curly apostrophe), and "Côte d'Ivoire" from ISO 3166
(straight apostrophe but non-ASCII "o with circumflex").
I still need to spend some additional time studying ISO 11179. My
knee-jerk reaction, with regard to using one of the Description fields
as a Unique Identifier, is that I would hate to be in the situation that
Unicode and ISO 10646 have found themselves with character names. Those
names are normative and guaranteed to be stable and immutable, and
because of that, several wrong or misleading names remain in the
standard, which causes much misunderstanding and flamage.
It would probably help if the Description field that is intended to be
the Unique Identifier could be distinguished from alternative
descriptions that are included as ASCII fallbacks, typographical
improvements, historic names, or commonly accepted aliases (like "North
Korea"). This is not provided for in the approved draft (all
Description fields are equal regardless of position) and would have to
wait until the document is revised.
> Where a known name includes a diacritic mark or other character that
> cannot be represented in ASCII, there should be an ADDITIONAL
> description field giving the code point in whatever format is agreed.
> However, there must always be an ASCII equivalent for human
> readability.
I agree with this, except that we cannot currently distinguish
"additional" descriptions from the "main" description, as mentioned
above.
> Please remember that we are not all working for multi-nationals, we
> are not all programmers/software developers and the whole purpose of
> standardisation is to make it accessible to all in order that it may
> stand a chance of being adopted by all; thus creating a standard.
As I stated last week, Mark Crispin's and Richard Ishida's observations
about text searching were what caused me to change my mind and support
ASCII fallback descriptions.
> I object to this format:
> ...
> Description: N’Ko
> -----
>
> I approve of this format:
> ...
> Description: N'Ko
> Description: N’Ko
+1
>> Kent Karlsson expressed a preference for "Book Norwegian" instead of
>> the ASCII-folded "Norwegian Bokmal", and "New Norwegian" as an
>> additional alias for "Norwegian Nynorsk". Several people disagreed
>> that the ASCII version of the Description should be an English
>> translation, especially in a case like this where no English speaker
>> would use the translated name to refer to the entity.
>
> I think we need to get back to the ISO standards as mentioned
> previously.
+1
>> Kent also suggested "Ivory Coast" as the ASCII fallback for "Côte d’Ivoire".
>> While this is, once again, not an ASCII fallback but an English
>> translation, Peter Constable pointed out that "Ivory Coast" has both
>> currency and historical usage. (For example, The Times of London and
>> the New York Times both use "Ivory Coast", which I did not know.)
>> This might make a reasonable alternative description, and it can be
>> proposed as such at any time (even now), but we should try to avoid
>> confusing this with the issue of ASCII fallbacks.
>
> This would open the "flood gates" in my opinion. I am sure there is
> both "currency and historical usage" for translation of most of the
> names in many a number of languages. Bad move to add it just because
> it is an English translation.
I am becoming quite worried about the floodgates. We are taking the
Description field(s) to be much more prescriptive than Section 3.1
indicates.
> In order to introduce a consistent methodology for dealing with these
> issues, I would suggest just dropping the diacritic mark. This may
> not work for some non-ASCII characters but certainly where diacritics
> are involved it is the best solution for a standard approach IMHO.
> Otherwise we are looking at reviewing all instances as opposed to
> applying a simple set "diacritic rule".
> ...
> Set a rule for diacritics and follow it... Then there is no need to
> proffer alternatives and spend days discussing them.
+1
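For what it is worth, a "diacritic rule" of this kind could be sketched
in Python roughly as follows. This is only one possible reading of the
rule, not agreed text: the function name and the apostrophe table are
my own assumptions, based on the "N'Ko" and "Gwich´in" cases above.

```python
import unicodedata

# Assumption: characters this thread treats as apostrophe stand-ins
# (U+2019 curly apostrophe, U+00B4 acute accent) fold to "'".
APOSTROPHE_LIKE = {"\u2019": "'", "\u00b4": "'"}


def ascii_fallback(description):
    """Return an ASCII form of description, or None if the rule fails."""
    # Decompose so that base letters and combining marks are separate.
    decomposed = unicodedata.normalize("NFD", description)
    # "Dropping the diacritic" means discarding the combining marks.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    folded = "".join(APOSTROPHE_LIKE.get(ch, ch) for ch in without_marks)
    # Some non-ASCII characters have no decomposition; report that
    # instead of guessing at a replacement.
    return folded if folded.isascii() else None


if __name__ == "__main__":
    for name in ["Côte d'Ivoire", "Norwegian Bokmål", "N\u2019Ko", "Aruá"]:
        print(name, "->", ascii_fallback(name))
```

A description that is already ASCII comes back unchanged, so a caller
would add a second Description only when the result differs from the
original.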
>> It's important to keep in mind that when we start talking about ISO
>> 639-3, there are some pairs of language names that differ only in
>> diacritical marks. For example, Arua and Aruá are two different
>> languages. In a case like this, we will not want to provide an ASCII
>> fallback of any sort for Aruá, because that would give us two
>> languages with the same name.
>
> WRONG. There will be one description for the first instance and two
> for the second. This is perfectly understood as a human or when being
> parsed so long as a written methodology is included within the
> standard.
So we would have the following?
Type: language
Subtag: aru
Description: Arua
Added: 200x-xx-xx
...
Type: language
Subtag: arx
Description: Aruá
Description: Arua
Added: 200x-xx-xx
That worries me.
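To make the concern concrete, here is a small Python sketch that looks
for Description values shared by more than one subtag. The record
layout (a dict with "subtag" and "descriptions") is purely an
assumption for illustration, and the two records are the hypothetical
ones shown above.

```python
from collections import defaultdict


def find_shared_descriptions(records):
    """Map each description used by several subtags to those subtags."""
    owners = defaultdict(set)
    for record in records:
        for description in record["descriptions"]:
            owners[description].add(record["subtag"])
    # Keep only descriptions that more than one subtag claims.
    return {
        description: sorted(subtags)
        for description, subtags in owners.items()
        if len(subtags) > 1
    }


if __name__ == "__main__":
    records = [
        {"subtag": "aru", "descriptions": ["Arua"]},
        {"subtag": "arx", "descriptions": ["Aruá", "Arua"]},
    ]
    print(find_shared_descriptions(records))
```

With the two records above, "Arua" is reported for both subtags, which
is exactly the ambiguity a text search on the Description field would
run into.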
> Remember what the alpha2/3 code is for. I would suggest coming to
> some sort of tentative agreement on the records proposed by Doug and
> then tackling this with written rules in RFC3066ter.
Absolutely agree. We don't have to worry about it now, but we will
definitely have to worry about it before adding the 639-3-based subtags.
> I think I am right in saying that within 639-3 (and certainly within
> 639-6) information contained within parentheses is ALWAYS used as
> qualifiers NOT alternative names. www.sil.org/iso639-3/codes.asp I am
> sure Peter will correct me if I am wrong.
I think you are right too.
>> Michael Everson asked not to split "Falkland Islands (Malvinas)" and
>> "Holy See (Vatican City State)" since they have not been shown to
>> cause confusion as is. While I would prefer to treat this
>> multiple-name situation consistently, regardless of the type of
>> subtag, I don't plan to fight hard over it; other issues are more
>> important to me.
>
> As these names are presented within the underlying ISO standards as
> above, I am with Michael on this. However, I would not strenuously
> object to adding two additional names, splitting the descriptions,
> provided the original stays intact. Thus a record such as Holy See
> (Vatican City State) would have 3 descriptions.
Let's put it this way: Is there anyone who strongly *supports* adding
the names individually? We could just let this one go. I'm not
attached to it.
>> Nobody seemed to have any objection to the other splits (e.g.
>> Han/Hanzi/Kanji/Hanja).
>
> I object if the name as represented in the underlying ISO standard is
> not retained. I have no real objection to additional descriptions but
> I think if you are going to do this there needs to be a written rule
> as to when additional descriptions can/should be added - flood gates
> and all that!
The name as stated in ISO 15924 is "Han (Hanzi, Kanji, Hanja)". I would
not have suggested any additional names such as "Chinese writing" that
didn't appear in the standard.
> That's my response for what its worth :-)
It is certainly worth a bundle -- much more than an opinion that goes
unspoken.
--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/