A proposed solution for descriptions

L.Gillam at surrey.ac.uk
Mon Jun 19 14:38:54 CEST 2006


The current discussion appears to conflate a number of issues.

Some of these appear to be partly separable, and the discussion could perhaps be divided along these lines:

1. Compatibility with underlying standards: taking (all?) names as-is to ensure consistency with the "original" sources.
2. Enforcing consistency across registry entries: where the ISO naming conventions themselves appear inconsistent, providing "corrections" for problems discovered under 1.
3. "Translation": providing names for elements of the underlying standards that do not appear in the source standards (an extension would be to provide all *easily available* names in all languages, whatever that may mean).
4. Searchability: enabling all sorts of searches, including "character" variations of names (e.g. ASCIIfied forms), that lead to the "correct" item. How far should this go? Regular expression syntax? This would be useful if search engines such as Google referred to the registry directly, but less so if everybody is already *fixing* the problem in their own way.

So, if "Falkland Islands (Malvinas)" is covered by 1, then adding items to the registry that separate "Falkland Islands" and "Malvinas" would seem to be a way of catering for 2 or maybe 4; "Ivory Coast" provides for 3 and/or 4; "Gwich´in" vs "Gwich'in" for 2 and/or 4. In each case, which problem is being solved depends on which rules are being followed.
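To make point 4 a little more concrete, here is a minimal sketch of ASCII folding in Python. It assumes a hypothetical miniature registry (a dict from code element to a list of names, mixing source names, "corrected" names and translations) and a hand-picked, assumed set of apostrophe look-alikes. Each name is reduced to a lowercase ASCII key, so that variants such as "Gwich´in" and "Gwich'in", or "Côte d'Ivoire" and "cote d'ivoire", lead to the same item. It illustrates the technique only; it is not a proposal for how any registry actually indexes names.

```python
import unicodedata

# Hypothetical miniature registry: code element -> names from categories 1-4.
REGISTRY = {
    "CI": ["Côte d'Ivoire", "Ivory Coast"],
    "FK": ["Falkland Islands (Malvinas)", "Falkland Islands", "Malvinas"],
    "gwi": ["Gwich´in"],
}

# Assumed set of apostrophe look-alikes, all folded to ASCII "'".
# U+00B4 must be mapped before NFKD, which would otherwise turn it
# into a space followed by a combining acute accent.
APOSTROPHES = {ord(c): "'" for c in "\u00b4\u2018\u2019\u02bc\u0060"}


def ascii_fold(name: str) -> str:
    """Reduce a name to a lowercase ASCII search key."""
    s = name.translate(APOSTROPHES)
    # NFKD splits accented letters into a base letter plus combining marks.
    s = unicodedata.normalize("NFKD", s)
    # Dropping non-ASCII code points removes those combining marks.
    s = s.encode("ascii", "ignore").decode("ascii")
    return s.casefold()


def build_index(registry: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each folded name to the code elements it identifies."""
    index: dict[str, set[str]] = {}
    for code, names in registry.items():
        for name in names:
            index.setdefault(ascii_fold(name), set()).add(code)
    return index


def lookup(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the code elements whose names fold to the same key as query."""
    return sorted(index.get(ascii_fold(query), set()))


index = build_index(REGISTRY)
print(lookup(index, "Gwich'in"))       # ['gwi']
print(lookup(index, "cote d'ivoire"))  # ['CI']
```

Even this small example has to decide which characters count as "the same", which is exactly the "how far to go?" question: exact folding rules like these are far more conservative than regular-expression search.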

Difficult as the decision is, a line needs to be drawn in the sand, with some item chosen as a fixed reference. Not everybody is going to be happy with that choice. For example, why is "Malvinas" acceptable as part of the English name when it is not the English name? Removing "(Malvinas)" could be seen as a fix under 2. Conversely, "Ivory Coast" is somehow not acceptable, even though it is what English speakers call the country and have long called it, so "Ivory Coast" could be considered under 2 and/or 3. It is a question of where the line is drawn: oiling the wheels that squeak most is not particularly systematic; all the squeaky wheels need oiling.

These four points, though some could be lumped together or split further, are questions of what functionality a (any) registry should support. Ideally, it would support all of them. But realistically? There are good arguments both for and against each of 1-4. How much energy does everybody have, and how exhausted do people want to be?

John Cowan wrote:

> I have not read 11179, but is there some reason why the code element
> itself cannot be the Unique Identifier?

But which code element (for uniqueness)? There is no specific reason why not, provided you can control uniqueness over time. It would seem that only 2 above provides rules (and people?) over which control can be exercised directly (via RFC) and consistency maintained. Ideally, all code elements for an entry are equivalently associated with that entry anyway, anomalies like "CS" aside.
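One reading of the "CS" anomaly illustrates why uniqueness over time matters: the ISO 3166 code element CS was used first for Czechoslovakia and later for Serbia and Montenegro. The Python sketch below is a hypothetical toy registry (the `Entry` and `Registry` names and the integer identifiers are assumptions, not anything defined by 11179 or an RFC). It assigns its own never-reused identifier to each entry, so a code element can still be used for lookup without itself having to be the unique identifier.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Entry:
    ident: int   # hypothetical registry-assigned identifier, never reused
    code: str    # code element as it appears in the source standard
    name: str    # name as given by the source standard


class Registry:
    """Toy registry that keeps identification separate from code elements."""

    def __init__(self) -> None:
        self._entries: dict[int, Entry] = {}
        # Monotonically increasing counter, so identifiers stay unique over time.
        self._ids = count(1)

    def add(self, code: str, name: str) -> int:
        entry = Entry(next(self._ids), code, name)
        self._entries[entry.ident] = entry
        return entry.ident

    def by_code(self, code: str) -> list[Entry]:
        """All entries, current or historical, that have used this code element."""
        return [e for e in self._entries.values() if e.code == code]


reg = Registry()
reg.add("CS", "Czechoslovakia")
reg.add("CS", "Serbia and Montenegro")
for e in reg.by_code("CS"):
    print(e.ident, e.name)
```

If code elements were never reassigned, `by_code` would always return at most one entry and the extra identifier would be redundant; the separate identifier only earns its keep when reuse like this is possible.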

Without fully opening the 11179 can of worms, it might be interesting to consider how Dublin Core documents its elements (http://dublincore.org/documents/1999/07/02/dces/): "Each Dublin Core element is defined using a set of ten attributes from the ISO/IEC 11179 [ISO11179] standard for the description of data elements." Separating naming from identification may help clarify 1-4: what we cannot live without, what might be useful, and all points in between. It appears that consensus is needed on what general functionality people are prepared to accept or reject, provided that the functionality can be supported in all possible cases.


More information about the Ietf-languages mailing list