IDNs and Language definitions and labeling (was: RE: New version, draft-faltstrom-idnabis-tables-02.txt, available)

Debbie Garside debbie at ictmarketing.co.uk
Fri Jun 22 12:12:17 CEST 2007


Hi John

Thank you for taking the time to formulate such an informative and useful
in-depth response - I cannot tell you how helpful this is!  I have read one or
two of the RFCs but it is good to have the "full set" identified.  I am
sure others monitoring this forum will think so too.
 
Wrt RFC4646bis and ISO 639-6, I have been an active participant in
IETF-languages and subsequently LTRU for the past 4 years.  See
http://www1.ietf.org/mail-archive/web/ltru/current/msg06477.html and follow
subsequent postings if you are interested in the latest discussion wrt
639-6.

However, I can see that my ideas for allocating Unicode code points to the
writing systems within ISO 639-6 do not meet the objectives of this
particular forum; sorry for muddying the waters but it is a bit of a passion
of mine which means it is quite hard to stop my fingers moving on the
keyboard sometimes :-)

I will take the time to read all the references in the hope that I can
actively participate in this forum in an informed manner - I hope you (and
others) will bear with me as I get up to speed with all this.

Thanks again.

Best wishes

Debbie

> One of the larger difficulties in many of the recent 
> discussions of IDNs -- much more so around ICANN than here -- 
> is that people try to make both policy and technical 
> decisions without a thorough understanding of the technology 
> itself.  I'd recommend, to you and others, a decent tutorial 
> on what the DNS is about in terms of design, operations, and 
> function [1].   One then needs to understand that IDNs are 
> simply a set of conventions and overlay on the DNS itself 
> and, at least in general, how that overlay works [2].  And, 
> to understand this effort, one should probably start with the 
> summaries of issues that have been found (or perceived) with 
> the 2003 version of IDNA [3].
>
> Part of that understanding (but not a quick summary or 
> substitute for the above) is that, while many of us are 
> intensely interested in identifier and referencing mechanisms 
> that are sensitive to language, orthography, and culture at a 
> level as fine-grained as the user or applications designer 
> thinks appropriate to his or her needs, the DNS is not a good 
> vehicle for that sort of work.
> 
> Because an application encountering a "DNS name" [4] has no 
> way to obtain information about the language the registrant 
> had in mind when registering the mnemonic string, the 
> applicability of any language-based information is quite 
> limited.  We can use information informed by knowledge of a 
> language to inform choices of scripts and characters to be 
> included, but that use does not require either language 
> tagging or a language taxonomy.
> Some registries can, and do, use language information to 
> restrict the characters that they permit to occur together in 
> a given label.  Using language (or script) information that 
> way has become a recommended practice, but it is optional, 
> different registries can and do handle it differently, and 
> the only use for language tagging in that context involves 
> communication between registrant and registrar and between 
> registrar and registry.  There has been no demonstrated need 
> for a single international standard in that area and, if 
> there were such a need, it would be out of the scope of this 
> effort.
> 
> However, all of those uses occur at registration time; at the 
> time of name resolution, or of presentation of information to 
> the user, there is no language information available at all 
> except by heuristic on the strings themselves.  Because those 
> strings are typically very short (or at least as short as 
> registrants who recognize user distaste for typing long 
> strings and the opportunities for bad behavior if there are 
> typing errors can make them), heuristics that work very well 
> with moderate-sized blocks of text will often not work well.  
> And, interestingly, one of the heuristics that many people 
> believe they can make into a firm and useful rule won't work 
> at all in the general DNS case (see discussion in reference [1]).
> 
> One final observation before I encourage you to stop reading 
> this and start reading the references: A suggestion to base 
> any of this work on ISO 639-6 runs into an extra problem that 
> you will need to address.  The IETF has adopted a system for 
> language tagging that is based on ISO 639-1, 639-2, and 15924 
> [5].  As you can probably appreciate, we smile at the old saw 
> that the nice thing about standards is that there are so many 
> of them, but generally try to avoid standardizing or relying 
> on redundant, duplicative, or alternate approaches to work 
> that is considered finished unless there are strong 
> justifications for doing so.  I suggest --with the 
> understanding that this is just my personal opinion-- that, 
> if you want to see 639-6 used in IETF-based protocols 
> (presumably including but not limited to IDNA), your first 
> step is to write up a set of discussion notes, in 
> Internet-Draft form [6], that reviews the differences between 
> an approach based on 639-6 and one based on a profile of RFC
> 4646 or its successor and that discusses the circumstances in 
> which one would be more usefully applicable than the other.
> 
> best wishes and happy reading,
>     john
> 
> 
> -----------
> 
> [1] A well-vetted and reasonably balanced tutorial, oriented 
> toward policy makers rather than deep understanding of the 
> technology, is a US National Research Council Report, 
> _Signposts in Cyberspace: The Domain Name System and Internet 
> Navigation_, 
> http://books.nap.edu/catalog.php?record_id=11258.  For a 
> deeper understanding, the core DNS specifications themselves are RFC
> 1034 and 1035.  (RFCs can be obtained from a number of 
> locations.  The official location permits retrieving them by 
> substituting the RFC number for NNNN in
> ftp://ftp.rfc-editor.org/in-notes/rfcNNNN.txt)
> 
> [2] RFC 3490, 3491, 3492, and 3454.  RFCs can be obtained 
> from a number of locations.  The official location permits 
> retrieving them by substituting the RFC number for NNNN in 
> ftp://ftp.rfc-editor.org/in-notes/rfcNNNN.txt
> There are also several tutorials floating around, but they 
> tend to be addressed to a user-level understanding rather 
> than the understanding needed to discuss the protocol issues
> intelligently.   Slideware for one of them (now somewhat dated)
> is at http://ws.edu.isoc.org/workshops/2004/ICANN-KL/
> 
> [3] RFC 4690 and
> http://www.ietf.org/internet-drafts/draft-klensin-idnabis-issues-01.txt.
> These two documents are complementary; neither can be 
> adequately understood without the other.  The second one is 
> likely to be replaced in the next week or so with an updated 
> version, which will have the same URL but with "-02" 
> substituted for "-01". 
> 
> [4] As you might have noticed in my exchange with Gervase, 
> I've concluded that the use of terms like "name" or "word" 
> are just introducing more confusion.  Many, perhaps most, DNS 
> "names" are not "words" in the sense of obeying the 
> orthographic or phonetic rules of any language; perhaps we 
> can reduce the confusion we are causing ourselves by shifting 
> to "mnemonic", which more closely describes the actual situation.
> 
> [5] RFC 4646 and
> http://www.ietf.org/internet-drafts/draft-ietf-ltru-4646bis-06.txt.
> For many purposes, these documents are incomplete without 
> "matching rules", discussed in RFC 4647.
> 
> [6] See the discussion at http://www.ietf.org/ID and the 
> links to information about format and tools leading from that page.
> 