[OT] Re: support of metadata

Thu Sep 17 02:46:04 CEST 2009

--On Wednesday, September 16, 2009 17:10 +0900 "Martin J. Dürst"
<duerst at it.aoyama.ac.jp> wrote:

> On 2009/09/15 0:40, John C Klensin wrote:
> 
>> There may be an important message in this discussion, but it
>> doesn't have anything to do with domain names.  When we
>> actually label and identify bodies of text with language or
>> script information for reading and processing --in email
>> messages, for web pages, etc.-- our current model of
>> identifying a "charset" and language (via LTRU) may not be
>> sufficient.  While we do have a comparator registry, it isn't
>> tightly linked to the "charset/LTRU tag" labels.   Perhaps
>> language coding needs to be expanded to distinguish texts
>> that should be treated as if majuscule/ capital distinctions
>> are important from texts in which they are not.  But, again,
>> that has nothing to do with domain identifiers or with IDNA
>> in its role of making it possible to accommodate non-ASCII
>> identifiers in the DNS.

> I very much agree that there is some need here, but I strongly
> believe that markup is much better suited for this kind of job
> than per-document labeling, and that the IETF may not be the
> place with the most expertise for such an effort.

I think what is "best suited" depends tremendously on the particular
applications and needs, and that sometimes the answer will come
out per-document, sometimes embedded markup, and sometimes both.
I also agree about the IETF, although more for some cases than
others.

But the point on which I think we agree is that this is _not_ a
DNS (or IDNA) problem, if only because requiring that the user
know the specific language in order to look at something printed
and enter it into a computer so it can be looked up would be a
disaster.  

The reality is that Jean-Michel's idea of appropriating a few
DISALLOWED or contextually-prohibited Unicode characters and
using them as indicators is a hack that won't interoperate or
scale very well.  It is a substitute for something we have known
how to do properly since the original IDNA work started, have
reviewed several times, and have always rejected because of bad
interactions with both users and the DNS (with that "guess the
language" issue heading the list).  That solution would be to use
   Prefix-LanguageIdentifier-CodedString
instead of 
   Prefix-CodedString

It wouldn't be hard to define, apart from three things: a
slightly shorter maximum effective label length; the need to
remember that having different LanguageIdentifiers would still
not permit different matching rules; and the need to reach some
sort of global consensus about how specific the language
identifiers would need to be (because the DNS can't be taught
about LTRU variable-precision matching rules).  But, unless
someone comes up with a way to look at, e.g.,
   p33.a22.example.net
and know, definitively, whether the first two labels are
"English", "French", "German", etc., it just isn't workable in
practice.
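
To make the two label forms concrete, here is a rough Python
sketch.  It is only an illustration under stated assumptions: the
tagged form is the rejected idea, not anything IDNA defines; the
"zq--" prefix and the function names are invented for the
example; and the IDNA mapping and validation steps are left out,
so only the Punycode step and the 63-octet label limit are shown.

```python
# Sketch only: the tagged form was considered and rejected; it is not part of IDNA.
MAX_LABEL_OCTETS = 63    # DNS limit on the length of a single label
ACE_PREFIX = "xn--"      # prefix IDNA uses for Prefix-CodedString
TAGGED_PREFIX = "zq--"   # hypothetical prefix, invented for this illustration


def _coded(text: str) -> str:
    """Punycode-encode text (IDNA mapping and validation are omitted)."""
    return text.encode("punycode").decode("ascii")


def _checked(label: str) -> str:
    """Return the label unchanged, or raise if it exceeds the DNS label limit."""
    if len(label) > MAX_LABEL_OCTETS:
        raise ValueError(
            f"label is {len(label)} octets; the DNS limit is {MAX_LABEL_OCTETS}"
        )
    return label


def plain_label(text: str) -> str:
    """Build a label of the form Prefix-CodedString."""
    return _checked(ACE_PREFIX + _coded(text))


def tagged_label(text: str, lang: str) -> str:
    """Build a label of the hypothetical form Prefix-LanguageIdentifier-CodedString."""
    return _checked(f"{TAGGED_PREFIX}{lang}-{_coded(text)}")
```

With these definitions, plain_label("bücher") gives
"xn--bcher-kva" whatever language the reader has in mind, while
tagged_label("bücher", "de") and tagged_label("bücher", "de-ch")
give two different labels that the DNS will never match against
each other.  The tag and its hyphen also come out of the same 63
octets.  Someone copying the name from a printed page has to
guess which tag was registered, and nothing in the resulting
label tells them.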

best,
     john