character set for Nepali IDN

Basanta shrestha basanta.shrestha at gmail.com
Wed Feb 20 07:48:33 CET 2008


Dear Dr. Vint Cerf, Dr. Sarmad Hussain, Mr. John, and all,

Sorry that I did not make myself clear. No such algorithm was used to
generate that list, so I guess it is character-by-character based.
Please let me know if things cannot move forward without using the
property method.
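
To illustrate the difference from a property-based method, a
character-by-character list can be applied as a plain inclusion table.
This is a minimal Python sketch. The five code points below (the
letters of नेपाल) are a hypothetical placeholder, not the actual Nepali
list discussed in this thread, and the function names are illustrative.

```python
# Hypothetical, illustrative inclusion table: NOT the actual Nepali list
# referred to in this thread. A real table would be much longer.
NEPALI_TABLE = frozenset(
    chr(cp) for cp in (0x0928, 0x0947, 0x092A, 0x093E, 0x0932)  # न े प ा ल
)


def allowed_by_table(label: str, table: frozenset = NEPALI_TABLE) -> bool:
    """Accept a label only if every code point is explicitly listed."""
    return all(ch in table for ch in label)


def rejected_chars(label: str, table: frozenset = NEPALI_TABLE) -> list:
    """Report the code points that the table does not list."""
    return [f"U+{ord(ch):04X}" for ch in label if ch not in table]


print(allowed_by_table("\u0928\u0947\u092a\u093e\u0932"))  # नेपाल
print(rejected_chars("\u0928\u0915"))  # KA is not in this toy table
```

No Unicode properties are consulted: a code point is accepted only
because someone put it on the list.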

Dear John, it is understandable that the DNS will not support
distinctions among the various languages that share the same script.
Rest assured that none of the DISALLOWED characters in the Draft is
required for Nepali. It is just that there are many characters in the
Devanagari block that are never used for Nepali. Like you, I agree
that this list would be more appropriate for registries to use when
accepting IDNs in Nepali.

Thank you, Dr. Sarmad, for the clarification.

There are so many complex issues in IDN. I am constantly learning. I
will be very happy to answer any specific questions.

Regards,
Basanta Shrestha
MPP


On Feb 19, 2008 11:47 PM, John C Klensin <klensin at jck.com> wrote:
>
> --On Tuesday, 19 February, 2008 21:01 +0500 Sarmad Hussain
> <sarmad.hussain at nu.edu.pk> wrote:
>
> >
> > Dear John and all,
> >
> > Agreed that some of the restrictions (PVALID --> DISALLOWED)
> > can go to language tables, at the registry level.
> >
> > However, it is important that none of the PVALID
> > characters in a language (as determined by its community) are
> > labelled as DISALLOWED by the IDNAbis revision process,
> > because it would not be possible to override DISALLOWED status
> > through the language tables.
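
A minimal sketch of the layering Sarmad describes, assuming a
hypothetical stub (PROTOCOL_STATUS) in place of the real IDNA200X
derivation: the registry's language table is consulted only after the
protocol check, so it can narrow the PVALID set but can never re-enable
a DISALLOWED character.

```python
from enum import Enum


class Status(Enum):
    PVALID = "PVALID"
    DISALLOWED = "DISALLOWED"


# Hypothetical stand-in for the protocol-level tables (not the real ones).
PROTOCOL_STATUS = {
    "a": Status.PVALID,
    "b": Status.PVALID,
    "-": Status.PVALID,
    "!": Status.DISALLOWED,
}


def protocol_status(ch: str) -> Status:
    # Anything the stub does not know about is treated as DISALLOWED.
    return PROTOCOL_STATUS.get(ch, Status.DISALLOWED)


def registrable(label: str, registry_table: set) -> bool:
    """Protocol check first; the registry table can only narrow the result."""
    if any(protocol_status(ch) is not Status.PVALID for ch in label):
        return False
    return all(ch in registry_table for ch in label)


print(registrable("ab", {"a", "b"}))  # both layers accept
print(registrable("a!", {"a", "!"}))  # table lists "!", protocol still refuses
```

Listing "!" in the registry table changes nothing, which is exactly
why a DISALLOWED verdict at the protocol level cannot be repaired by
language tables.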
>
> In principle, I certainly agree with this.  In practice, we need
> to be very careful that we do not escalate that principle to a
> firm rule, at least in that form.
>
> I hope there are no cases of the situation I'm concerned about
> in actual practice because the discussions would inevitably be
> very painful and hard to resolve.   But, if a character turns up
> that is an important element of the writing system for some
> language that would be seriously problematic, either for the
> Internet as a whole or for some other language that (mostly)
> shares the same script, then treating that character as an
> ordinary Protocol-Valid one is just not going to work, even if
> it means that some words of the language simply cannot be used
> as IDN labels.
>
> It is probably worth pointing out that this is nothing new:
> contrary to a popular assumption, the subset of ASCII that we
> call "LDH" is not sufficient to write all words of English.
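
As a quick illustration of the LDH point, here is a Python sketch of an
LDH label check (ASCII letters, digits, and hyphen; no leading or
trailing hyphen; at most 63 characters). Ordinary English words such
as "don't" or "naïve" fail it.

```python
import string

# ASCII letters, digits, and hyphen: the "LDH" repertoire.
LDH = frozenset(string.ascii_letters + string.digits + "-")


def is_ldh_label(label: str) -> bool:
    """Letters, digits, hyphen; 1-63 characters; no leading or trailing hyphen."""
    return (
        0 < len(label) <= 63
        and all(ch in LDH for ch in label)
        and not label.startswith("-")
        and not label.endswith("-")
    )


for word in ("co-operate", "don't", "na\u00efve"):
    print(word, is_ldh_label(word))
```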
>
> ZWJ and ZWNJ are partial examples of the problem because they
> are important in the right contexts but completely invisible
> (and hence potentially disastrous) when used in other contexts
> and with other scripts.   With IDNA2003, those two characters
> were simply banned because of that problem: a word that required
> either one simply could not be expressed as a valid label.   One
> of the big innovations in the IDNA200X proposals that has not,
> IMO, been discussed nearly enough is the idea of permitting some
> globally-problematic characters by restricting the contexts in
> which they can be used.  So ZWJ and ZWNJ are permitted, but
> _only_ for scripts in which they have a significant presentation
> effect and the need for them is unambiguous when transcribing a
> domain name from paper into a computer.
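
The context restriction for ZWJ and ZWNJ can be sketched as follows.
This is a simplified illustration that assumes the rule "a joiner must
immediately follow a virama" (canonical combining class 9). The rules
later published in RFC 5892 are more detailed (ZWNJ, for example, also
has a joining-type context for Arabic-script labels), and that part is
not modeled here.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # Canonical_Combining_Class value shared by viramas


def joiners_in_context(label: str) -> bool:
    """Simplified rule (assumption): every ZWJ/ZWNJ must directly follow a virama."""
    for i, ch in enumerate(label):
        if ch in (ZWNJ, ZWJ):
            if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
                return False
    return True


print(joiners_in_context("\u0915\u094d\u200d\u0937"))  # KA, VIRAMA, ZWJ, SSA
print(joiners_in_context("ab\u200dcd"))                # ZWJ between Latin letters
```

The joiners are accepted only where they affect Devanagari conjunct
formation, and rejected where they would be invisible noise.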
>
> This is not a complaint about anything anyone has done or not
> done, but the most useful data we could get right now (at least
> from my perspective) isn't "we need this character" but rather
> "that character is problematic for some languages, but we need
> it and it seems to us safe to use it if the following
> restrictions are applied...".   If the character is something
> that would normally be considered a letter or digit, the current
> rules are almost certain to pick it up and make it
> Protocol-Valid (but your verifying that against Patrik's rules
> and tables is very important).  But suppose the character is,
> for some reason, an edge case that wouldn't fall naturally, in
> terms of Unicode properties, into "letter".  We need to be
> told about those, but it is even more important for us to know
> what restrictions might be necessary or appropriate to prevent
> problems.
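
To make the letter/digit point concrete, here is a rough Python
approximation of a property-based first pass: a code point is treated
as a Protocol-Valid candidate if its general category is a letter,
mark, or decimal digit and it is stable under NFKC plus case folding.
This is an assumption-laden sketch, not Patrik's actual rules or
tables; it ignores exception lists, backward-compatibility entries,
and special cases (the hyphen-minus, for instance, is handled by a
separate rule in the published derivation).

```python
import unicodedata

# General categories treated as "letters and digits" in this sketch (assumption).
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Lm", "Mn", "Mc", "Nd"}


def is_stable(ch: str) -> bool:
    """True if NFKC, case folding, then NFKC again leave the character unchanged."""
    folded = unicodedata.normalize("NFKC", unicodedata.normalize("NFKC", ch).casefold())
    return folded == ch


def candidate_pvalid(ch: str) -> bool:
    return unicodedata.category(ch) in LETTER_DIGIT_CATEGORIES and is_stable(ch)


for ch in ("a", "A", "\u0915", "\u094d", "\u0966", "!", "\u200d"):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), candidate_pvalid(ch))
```

Ordinary Devanagari letters, vowel signs, and digits pass; ZWJ
(category Cf) does not, which is why such edge cases need explicit
contextual rules rather than property-based treatment.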
>
> Put more broadly, if we are going to have an IDN system that
> works well globally, we must take care that global
> interoperation is our primary criterion for success rather than
> the ability to write the literature of any given language in the
> DNS.  The ability to use any valid word as a label
> should be an important target, but, if we are to succeed, we
> must not let it become the primary goal.
>
> > That is the case because
> > applications will not allow users to type in DISALLOWED
> > characters in the IDNs (as is the current practice).
>
> Applications will do what they do.  As others have heard me say
> too often on other lists, we need to avoid too much belief that
> the IETF's making a standard will, in and of itself, dictate
> behavior that everyone will follow.  Under the current practice,
> registries can register characters or sequences that IDNA200X
> prohibits as actual DNS entries.  As applications have evolved
> and their authors become convinced that they have the obligation
> to protect their users, such registrations may not be looked up
> and are likely to be displayed in ACE form... certainly not what
> anyone wants.  Perhaps worse, different applications interpret
> the rules and their obligations to users in different ways,
> leading to somewhat unpredictable behavior as far as the user is
> concerned.
>
> Viewed from that perspective, the IDNA200X proposal attempts to
> regularize the situation by giving applications clear guidance
> about the labels that they should or should not look up and the
> user or page designer the information that explicit use of
> U-labels is much less likely to cause problems than dependency
> on either local or Nameprep-like mapping.   But, under either
> the IDNA2003 protocol or the IDNA200X proposal, if a registry
> starts registering labels containing prohibited characters, it
> must do so with the understanding that such labels may not be
> looked up and handled in the way that the user might expect.
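
On the application side, the behavior described above (show the
U-label when its characters pass the application's checks, otherwise
fall back to the ACE form) might look roughly like this. The predicate
char_ok is a hypothetical stand-in for whatever validity test an
application applies; the ACE form is built with Python's built-in
"punycode" codec plus the "xn--" prefix, and Nameprep or any other
mapping step is deliberately omitted.

```python
from typing import Callable


def display_form(label: str, char_ok: Callable[[str], bool]) -> str:
    """Return the U-label if every character passes char_ok, else its ACE form."""
    # Pure-ASCII labels are never ACE-encoded; LDH checking is out of scope here.
    if label.isascii() or all(char_ok(ch) for ch in label):
        return label
    # Fall back to the ACE ("xn--") form, as some applications do.
    return "xn--" + label.encode("punycode").decode("ascii")


# Hypothetical policy: accept alphanumerics and hyphen only.
ok = lambda ch: ch.isalnum() or ch == "-"
print(display_form("b\u00fccher", ok))  # shown as a U-label
print(display_form("i\u2603", ok))      # SNOWMAN fails the policy: ACE form
```

Two applications with different char_ok policies would display the
same registered label differently, which is the unpredictability the
message warns about.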
>
>      john
>


More information about the Idna-update mailing list