A proposed solution for descriptions

Mark Crispin mrc at CAC.Washington.EDU
Sun Jun 18 21:27:07 CEST 2006


On Sun, 18 Jun 2006, Debbie Garside wrote:
> I object to this format:
>
> -----
>
> Type: language
> Subtag: nqo
> Description: N&#x2019;Ko
> Suppress-Script: Nkoo
> Added: 2006-xx-xx
>
> -----
>
> I approve of this format:
>
> Type: language
> Subtag: nqo
> Description: N'Ko
> Description: N&#x2019;Ko
> Suppress-Script: Nkoo
> Added: 2006-xx-xx
>
> -----

I agree completely with this.

>> Mark Crispin said that the entire premise of using hex NCRs
>> in the Registry was wrongly conceived, and if IANA (or IETF)
>> limits us to ASCII then we should have stayed with ASCII, and
>> not attempted to represent non-ASCII characters using the
>> hex-NCR kludge or any other kludge.
>> Several, most notably Michael Everson, disagree and feel it
>> is important to include non-ASCII, even in the case of
>> apostrophes where little or no confusion will result (John
>> Cowan disagreed about the apostrophes).
> I agree with Michael. But I think it is important to have both.  Hex 
> NCRs may be ill-conceived but I think it is necessary information given 
> the limitations of ASCII - I don't know enough about alternative formats 
> for displaying this information to comment further.

My position is somewhat misrepresented here.

Yes, I object to
 	Description: N&#x2019;Ko

However, that objection is ameliorated by:
 	Description: N'Ko
 	Description: N&#x2019;Ko
and completely satisfied by:
 	Description: N'Ko
 	Description: N<U+2019>Ko

I prefer the latter, since <U+2019> is a metaword convention, already in 
use, that states "Unicode codepoint 2019 goes here".  &#x2019; is SGML 
blather that should never be inflicted upon innocent human eyes.
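
To illustrate that the two notations carry the same information, here is a 
rough Python sketch that rewrites a hex NCR in the <U+XXXX> form and then 
resolves that form to the actual character.  The helper names are 
hypothetical and are not part of any registry tooling.

```python
import re

# Hypothetical helpers for illustration only; not part of any registry tooling.
NCR = re.compile(r"&#x([0-9A-Fa-f]+);")
UPLUS = re.compile(r"<U\+([0-9A-Fa-f]+)>")

def ncr_to_uplus(text):
    """Rewrite hex NCRs such as &#x2019; in the <U+2019> notation."""
    return NCR.sub(lambda m: "<U+%s>" % m.group(1).upper().zfill(4), text)

def uplus_to_char(text):
    """Replace each <U+XXXX> marker with the character it names."""
    return UPLUS.sub(lambda m: chr(int(m.group(1), 16)), text)

line = "Description: N&#x2019;Ko"
print(ncr_to_uplus(line))                  # ASCII-only, human-readable form
print(uplus_to_char(ncr_to_uplus(line)))   # form with the real character
```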

This isn't an SGML document, advertised as needing to be run through an 
SGML rendering engine.  It's a plaintext document, and ASCII as well.

There was a HUGE fight in the IETF when the first attempt was made to 
produce an RFC that only existed in Postscript form.  The fight escalated 
when an ASCII text form was offered that did not render tables and 
diagrams into usable ASCII art.

This particular registry is unlikely to produce such a harsh reaction, but 
there's no need to go down the path at all.  "<U+2019>" consumes exactly 
the same number of bytes as "&#x2019;".  No one's floppy disk will fill up 
because of the use of a "wasteful" human-friendly representation.
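
The byte counts are easy to check.  A minimal Python sketch (the variable 
names are purely illustrative) compares the two ASCII notations, and for 
reference the UTF-8 encoding of the character itself:

```python
# Illustrative comparison of encoded sizes; variable names are arbitrary.
uplus = "<U+2019>"   # metaword notation
ncr = "&#x2019;"     # hex NCR notation
char = "\u2019"      # the actual character

uplus_bytes = len(uplus.encode("ascii"))
ncr_bytes = len(ncr.encode("ascii"))
char_bytes = len(char.encode("utf-8"))

print(uplus_bytes, ncr_bytes, char_bytes)  # 8 8 3
```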

In the medium term, I expect that the IETF will allow plaintext documents 
in UTF-8.  Thus, we can use fixed-width font Unicode art instead of ASCII 
art, and it'll be possible to use the U+2019 codepoint instead of an ASCII 
representation.

However, I suspect that there will be a long-term requirement for an 
ASCII-only representation.

I doubt that we will see the IETF accepting non-plaintext documents as a 
sole form in the foreseeable future.  The current preferred rendering 
language tends to follow the fads of the time; and documents have to 
outlive the fad (and perhaps also the platform).  At least, it will not 
happen until the dying out of the generation that remembers the extinct 
rendering languages of the past (and has bitter memories of having to 
recreate documents in those languages into modern forms).

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

