A proposed solution for descriptions

Addison Phillips addison at yahoo-inc.com
Sun Jun 18 23:05:45 CEST 2006


I agree with most of Mark Crispin and Debbie Garside's notes.

I have some minor observations:

1. The form <U+1234> is just as unnecessarily obscure. There exist perfectly
good escape formats (\u1234 and \U123456) that would probably serve better
(since several programming languages will interpret these formats). The
choice of an NCR format stolen from XML was (not actually, but may as well
have been) arbitrary and is no better or worse than Mark's suggestion (well,
maybe slightly worse). However, any array of "gunk" in the file requires
additional processing or observation on the part of the user.

2. The IETF does allow UTF-8 registries. Apparently one already exists (I
don't remember which). The problem here was the decision/need to publish the
initial registry as an I-D. I would not support an ASCII-only registry
otherwise. I don't believe Mark Davis would either: we are both committed
supporters of Unicode. And personally I find an ASCII-only registry to be
stupid.

3. The ONLY way to change the format of the registry is to update RFC
3066bis. There will be an opportunity to change the format when we update
that document to support ISO 639-3. When that happens, I hope that we will
convert the registry to UTF-8 and that this foolishness will be consigned to
the dustbin of history.

4. Well... I do note that the sequence <U+201B> is a perfectly good
"latin-script" string. This list *could* register such strings and ignore
the guidance in RFC 3066bis, but I think that this would be extremely
confusing.

> However, I suspect that there will be a long-term requirement for an 
> ASCII-only representation.

5. The question being: ASCII-only representation of *what*? As long as one
description field is ASCII in each record, I don't see there being a valid
objection to having non-ASCII description fields where they are warranted.
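
To make the "additional processing" in item 1 concrete, here is a minimal
sketch (Python) that turns the escape notations discussed above back into
characters. The helper name unescape is hypothetical and is not part of any
registry tooling; the \U form is assumed to take eight hex digits, as it does
in Python and C, and out-of-range code points are not handled.

```python
import re

# Hypothetical helper for illustration only. Each alternative captures the
# hex digits of one escape notation.
_ESCAPES = re.compile(
    r"<U\+([0-9A-Fa-f]{4,6})>"   # <U+2019>
    r"|\\u([0-9A-Fa-f]{4})"      # \u2019
    r"|\\U([0-9A-Fa-f]{8})"      # \U00002019 (assumed eight digits)
    r"|&#x([0-9A-Fa-f]{1,6});"   # &#x2019; (XML-style hex NCR)
)


def unescape(text: str) -> str:
    """Replace every recognized escape in text with the character it names."""
    def repl(match: re.Match) -> str:
        # Exactly one group is non-None: the one for the notation that matched.
        hex_digits = next(g for g in match.groups() if g is not None)
        return chr(int(hex_digits, 16))
    return _ESCAPES.sub(repl, text)


if __name__ == "__main__":
    for sample in ("N<U+2019>Ko", r"N\u2019Ko", "N&#x2019;Ko"):
        print(sample, "->", unescape(sample))
```

Whichever notation is chosen, every consumer of the file needs a step like
this before it can show the description to a person.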

Finally: I would stipulate that the purpose of the Description field is to
identify to human users of the registry (i.e. implementers) what the subtag
values "mean". This is not at all the same thing as asserting the actual
description or name of the subtag in any particular language. Such
applications are important and users should refer to external references,
such as the ISO and UN standards themselves or to projects such as CLDR to
obtain display names in any particular language. I agree 100% with Debbie
that the registry should pick up *exactly* what ISO 639, 3166, 15924, or UN
M.49 emits. I think we should add ASCII-only descriptions via separate
registration or via a consensus amendment to the original registration.

> > I approve of this format:
> >
> > Type: language
> > Subtag: nqo
> > Description: N'Ko
> > Description: N&#x2019;Ko
> > Suppress-Script: Nkoo
> > Added: 2006-xx-xx

+1
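
For what it's worth, a record in that shape is easy to consume. The following
is a rough sketch (Python), not real registry code: parse_record and
has_ascii_description are hypothetical names, the parsing ignores line folding
and anything else the real registry format may require, and the ASCII check is
just my reading of item 5 above (at least one Description that is still ASCII
after its character references are decoded).

```python
import html

RECORD = """\
Type: language
Subtag: nqo
Description: N'Ko
Description: N&#x2019;Ko
Suppress-Script: Nkoo
Added: 2006-xx-xx
"""


def parse_record(text: str) -> dict:
    """Collect 'Name: value' lines into a dict of lists (fields may repeat)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        fields.setdefault(name.strip(), []).append(value.strip())
    return fields


def has_ascii_description(fields: dict) -> bool:
    """True if at least one Description is pure ASCII once NCRs are decoded."""
    return any(
        html.unescape(desc).isascii()
        for desc in fields.get("Description", [])
    )


if __name__ == "__main__":
    record = parse_record(RECORD)
    print(record["Description"])
    print(has_ascii_description(record))
```

With the record above, the first Description satisfies the check and the
second carries the proper U+2019 spelling.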

Addison

Addison Phillips
Internationalization Architect - Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.  

> -----Original Message-----
> From: ietf-languages-bounces at alvestrand.no 
> [mailto:ietf-languages-bounces at alvestrand.no] On Behalf Of 
> Mark Crispin
> Sent: June 18, 2006 12:27
> To: Debbie Garside
> Cc: ietf-languages at iana.org; 'Doug Ewell'
> Subject: RE: A proposed solution for descriptions
> 


