A proposed solution for descriptions

Sun Jun 18 23:12:31 CEST 2006

Doug wrote:

> > What I actually said or indeed meant is all names as represented
> > within the underlying standards should be included in the registry in
> > the EXACT format that they take within the standard.  This means that
> > if a name is presented as Foo (Bar) in the underlying standard then it
> > remains as Foo (Bar).  Three reasons for this, one: consistency, two:
> > very often the bracketed information acts as an additional qualifier,
> > three: the name can be used by other systems as a Unique Identifier
> > (ISO 11179).  From what I can see additional names (within ISO
> > 639-1/2) are delimited by ";" and these should be added as further
> > descriptions.
> 
> Taking the second reason first: If the bracketed information 
> acts as an additional qualifier, then I agree 100% that it 
> should be included as part of the description.  This is true 
> in the case of "Slave (Athapascan)" and I made a mistake by 
> splitting those out (and have withdrawn that suggestion).  It 
> does not seem true in the case of "Deseret (Mormon)" or 
> "Falkland Islands (Malvinas)".

I know, this is inconsistent behaviour on ISO's part.  I think we must
decide on the rules in general for dealing with these things wrt ISO 639
parts 1 and 2 and where there are inconsistencies in ISO naming conventions
deal with them individually.  I still think that one description within the
registry should be rendered EXACTLY as it is within the official published
standard.

> Debbie is right that ISO 639 is consistent about using 
> parentheses to indicate qualifiers, and semicolons to 
> indicate alternative names.  But they are not always 
> consistent about the order of the names, so it cannot always 
> be determined which is the "original" name and which are 
> "additional." 

Hmmm... I am confused here!  There is only one published version of the
standard, so how can it not be consistent?  Take the first name and the rest
become "other" descriptions.  I still think a representation as per the
standard is required.  No room for misunderstanding then.
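
A minimal sketch of the rule described above, assuming the ISO 639-1/2
name arrives as a single string with alternatives delimited by ";".  The
function name split_iso639_name is hypothetical and "Catalan; Valencian"
is only an illustrative input.  The published form is kept verbatim as the
first description and each alternative follows as a further description.

```python
def split_iso639_name(name):
    """Return a list of descriptions for one ISO 639-1/2 name.

    The first entry is always the name EXACTLY as published.  When the
    name contains ";"-delimited alternatives, each one is appended as a
    further description.  Parentheses are left untouched because they
    may carry a qualifier, as in "Slave (Athapascan)".
    """
    parts = [part.strip() for part in name.split(";") if part.strip()]
    descriptions = [name]
    # A single part would only duplicate the exact form, so skip it.
    if len(parts) > 1:
        descriptions.extend(parts)
    return descriptions


print(split_iso639_name("Catalan; Valencian"))
# ['Catalan; Valencian', 'Catalan', 'Valencian']
print(split_iso639_name("Slave (Athapascan)"))
# ['Slave (Athapascan)']
```
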

> ISO 3166 uses parentheses for alternative 
> names (there being no "qualifiers" per se)

In which case a rule can be written so long as ISO remains consistent - we
should be able to manage this now.

> and ISO 15924 uses 
> parentheses for both, so some human judgement must still be 
> applied there.

I am sure the ISO 15924 MA is listening so perhaps new rules will be
introduced in the future.  In the meantime, a rule can be made, it just
means interpreting the bracketed items (alternative names/qualifiers).

> Consistency with the ISO standards was my reason for sticking 
> with the exact apostrophe style used in those standards.  
> Thus we ended up with "Gwich´in" from ISO 639 (acute accent 
> used as apostrophe), "N’Ko" from ISO 15924 (curly 
> apostrophe), and "Côte d'Ivoire" from ISO 3166 (straight 
> apostrophe but non-ASCII "o with circumflex").

I'm not getting into the whole apostrophe thing... Tis almost beyond me.
However, I think that if there is a "correct" Unicode code point that should
be used, this can be reflected in the secondary description.

> I still need to spend some additional time studying ISO 
> 11179.  My knee-jerk reaction, with regard to using one of 
> the Description fields as a Unique Identifier, 

It is not about using one of the "descriptions"; it is about using THE ISO
description.  There is only one description, whether it has several names
within the field or not (I think).  

> is that I 
> would hate to be in the situation that Unicode and ISO 10646 
> have found themselves with character names.  They are 
> normative and guaranteed to be stable and immutable, and 
> because of that there are several wrong or misleading names 
> in the standard, which causes much misunderstanding and flamage.

Surely everything is stable provided you record when an entity becomes
retired or deprecated.  I don't see a problem. But then I am not a Unicode
expert.

> It would probably help if the Description field that is 
> intended to be the Unique Identifier could be distinguished 
> from alternative descriptions that are included as ASCII 
> fallbacks, typographical improvements, historic names, or 
> commonly accepted aliases (like "North Korea").  This is not 
> provided for in the approved draft (all Description fields 
> are equal regardless of position) and would have to wait 
> until the document is revised.

Agreed and I think we should discuss ISO 11179 a wee bit when it comes to
the LTRU Charter.  I'm not ready to take that discussion yet - so please
(everyone) let's not muddy the waters by having it now.

> > Where a known name includes a diacritic mark or other character that
> > cannot be represented in ASCII, there should be an ADDITIONAL
> > description field giving the code point in whatever format is agreed.
> > However, there must always be an ASCII equivalent for human
> > readability.
> 
> I agree with this, except that we cannot currently 
> distinguish "additional" descriptions from the "main" 
> description, as mentioned above.

See response on this above.
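
To make the two kinds of description concrete, here is a rough sketch of
how an escaped form and an ASCII fallback might be derived from one name.
It assumes the hex escape style seen in "Aru&#xE1;" and a small,
hand-picked table of apostrophe-like characters; the function names and
that table are hypothetical, and anything the table does not cover is
deliberately left to human judgement.

```python
import unicodedata

# Assumed mapping for the apostrophe variants mentioned in this thread:
# U+2019 (curly apostrophe) and U+00B4 (acute accent used as apostrophe).
APOSTROPHE_LIKE = {"\u2019": "'", "\u00b4": "'"}


def hex_escape(text):
    """Replace each non-ASCII character with a &#xNN; style escape."""
    return "".join(ch if ord(ch) < 128 else "&#x%X;" % ord(ch) for ch in text)


def ascii_fallback(text):
    """Return an ASCII rendering, or raise if a human decision is needed."""
    replaced = "".join(APOSTROPHE_LIKE.get(ch, ch) for ch in text)
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", replaced)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not stripped.isascii():
        raise ValueError("no automatic ASCII fallback for %r" % text)
    return stripped


name = "C\u00f4te d'Ivoire"
print(hex_escape(name))       # C&#xF4;te d'Ivoire
print(ascii_fallback(name))   # Cote d'Ivoire
```
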
> 
> > Please remember that we are not all working for multi-nationals, we
> > are not all programmers/software developers and the whole purpose of
> > standardisation is to make it accessible to all in order that it may
> > stand a chance of being adopted by all; thus creating a standard.
> 
> As I stated last week, Mark Crispin's and Richard Ishida's 
> observations about text searching were what caused me to 
> change my mind and support ASCII fallback descriptions.

Good... Flip-floppers have many redeeming features - especially when they
agree with me :-)

> I am becoming quite worried about the floodgates.  We are 
> taking the Description field(s) to be much more prescriptive 
> than Section 3.1 indicates.

Whatever we agree upon now needs to be recorded for inclusion in RFC3066ter.
I think it is quite OK to do this as it is not documented that it cannot be
done.  I do think that a tightening of rules is needed wrt this sort of
thing in the next version.

> >> It's important to keep in mind that when we start talking about ISO
> >> 639-3, there are some pairs of language names that differ only in
> >> diacritical marks.  For example, Arua and Aruá are two different
> >> languages.  In a case like this, we will not want to provide an ASCII
> >> fallback of any sort for Aruá, because that would give us two
> >> languages with the same name.
> >
> > WRONG.  There will be one description for the first instance and two
> > for the second. This is perfectly understood as a human or when being
> > parsed so long as a written methodology is included within the
> > standard.
> 
> So we would have the following?
> 
> Type: language
> Subtag: aru
> Description: Arua
> Added: 200x-xx-xx
> ...
> Type: language
> Subtag: arx
> Description: Aru&#xE1;
> Description: Arua
> Added: 200x-xx-xx
> 
> That worries me.

Yes, and I can see why.  I have to admit I have been thinking about this,
and perhaps annotating both records (with a "see also") may get around
it.  
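
As a rough illustration of the worry, the following sketch flags any
description that ends up shared by more than one subtag, so that those
records could be reviewed or annotated.  The record layout (a dict with
"subtag" and "descriptions" keys) and the function name are assumptions
made for this example; the two records mirror the aru/arx example quoted
above.

```python
from collections import defaultdict


def find_description_collisions(records):
    """Map each description used by several subtags to those subtags."""
    owners = defaultdict(set)
    for record in records:
        for description in record["descriptions"]:
            owners[description].add(record["subtag"])
    # Keep only the descriptions claimed by more than one subtag.
    return {
        description: sorted(subtags)
        for description, subtags in owners.items()
        if len(subtags) > 1
    }


records = [
    {"subtag": "aru", "descriptions": ["Arua"]},
    {"subtag": "arx", "descriptions": ["Aru&#xE1;", "Arua"]},
]
print(find_description_collisions(records))
# {'Arua': ['aru', 'arx']}
```
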

> >> Nobody seemed to have any objection to the other splits (e.g. 
> >> Han/Hanzi/Kanji/Hanja).
> >
> > I object if the name as represented in the underlying ISO standard is
> > not retained.  I have no real objection to additional descriptions but
> > I think if you are going to do this there needs to be a written rule
> > as to when additional descriptions can/should be added - flood gates
> > and all that!
> 
> The name as stated in ISO 15924 is "Han (Hanzi, Kanji, 
> Hanja)".  I would not have suggested any additional names 
> such as "Chinese writing"  that didn't appear in the standard.

I think the main name should still be "Han (Hanzi, Kanji, Hanja)".  You can
still have individual descriptions which will tell people that the
information contained within the parentheses is not a qualifier.
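
One possible shape for such a rule, sketched under the assumption that a
person has already decided whether the parentheses hold alternative names
or a qualifier (the parens_are_alternatives flag, like the function name,
is hypothetical).  The exact published form always stays first.

```python
import re

# Matches "Head (inner)" with a single, trailing pair of parentheses.
_PAREN = re.compile(r"^(?P<head>[^()]+?)\s*\((?P<inner>[^()]*)\)\s*$")


def expand_parenthesised(name, parens_are_alternatives):
    """Return descriptions for a name such as "Han (Hanzi, Kanji, Hanja)".

    The exact published form is always the first description.  The
    bracketed items are split out only when a human has judged them to
    be alternative names rather than a qualifier.
    """
    descriptions = [name]
    match = _PAREN.match(name)
    if match and parens_are_alternatives:
        descriptions.append(match.group("head").strip())
        inner = match.group("inner").split(",")
        descriptions.extend(item.strip() for item in inner if item.strip())
    return descriptions


print(expand_parenthesised("Han (Hanzi, Kanji, Hanja)", True))
# ['Han (Hanzi, Kanji, Hanja)', 'Han', 'Hanzi', 'Kanji', 'Hanja']
print(expand_parenthesised("Slave (Athapascan)", False))
# ['Slave (Athapascan)']
```
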

> > That's my response for what it's worth :-)
> 
> It is certainly worth a bundle -- much more than an opinion 
> that goes unspoken.

Thank you

Debbie Garside
> 
> --
> Doug Ewell
> Fullerton, California, USA
> http://users.adelphia.net/~dewell/