<html>

<body>

Mark,<br>

Thank you for your comments. I will not comment much on your input, as it is mostly related to Unicode. I will focus only on the Internet engineering issues, and on the differences they may represent.<br><br>

At 04:48 14/12/2007, Mark Davis wrote:<br>

<blockquote type=cite class=cite cite="">Issues-5.<br><br>

<pre>&nbsp;&nbsp; IDNA uses the Unicode character repertoire, which avoids the
&nbsp;&nbsp; significant delays that would be inherent in waiting for a different
&nbsp;&nbsp; and specific character set be defined for IDN purposes, presumably by
&nbsp;&nbsp; some other standards developing organization.
</pre><font face="Courier New, Courier"></font>This seems odd. There are no other contenders in the wings. If this has to be said at all, it would be better just to cite other IETF documents that describe the reasons for using Unicode. </blockquote><br>

There is a need for a universal visual sign code to support semiotics and security needs. Had such work started in 2002, when that decision was taken, we would be benefiting from it now. NB: I do not think that there is an architectural conflict between the two options. When ready (hopefully in less than 20 years), such a code will simply provide an additional operational choice.<br><br>

<blockquote type=cite class=cite cite="">Issues-6.<br><br>

<pre>&nbsp;&nbsp; To improve clarity, this document introduces three new terms.&nbsp; A
&nbsp;&nbsp; string is &quot;IDNA-valid&quot; if it meets all of the requirements of this
&nbsp;&nbsp; specification for an IDNA label.&nbsp; It may be either an &quot;A-label&quot; or a
&nbsp;&nbsp; &quot;U-label&quot;, and it is expected that specific reference will be made to
&nbsp;&nbsp; the form appropriate to any context in which the distinction is
&nbsp;&nbsp; important.
...
&nbsp;&nbsp; A &quot;U-label&quot; is an IDNA-valid string of
&nbsp;&nbsp; Unicode-coded characters that is a valid output of performing
&nbsp;&nbsp; ToUnicode on an A-label, again regardless of how the label is
&nbsp;&nbsp; actually produced.
</pre><font face="Courier New, Courier"></font>These definitions appear circular, so they need to be teased out a bit. </blockquote><br>

It would be interesting if you could help with this. I also opposed that kind of definition in the WG-IDNA, but I was convinced that there was no other way to explain clearly that &quot;internationalization&quot; produces the opposite of what most non-English speakers would call &quot;international&quot;.<br><br>
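The circularity noted above might be untangled by stating each definition as a round-trip test instead of defining each term through the other. The following is only a minimal Python sketch. It assumes Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490) and not the IDNAbis draft under review. The helper names to_ascii, to_unicode, is_a_label and is_u_label are hypothetical.

```python
# Sketch: non-circular, round-trip definitions of A-labels and U-labels.
# ASSUMPTION: Python's built-in "idna" codec implements IDNA2003 (RFC 3490),
# not the IDNAbis draft, so results can differ for some labels.

ACE_PREFIX = "xn--"


def to_ascii(label: str) -> str:
    """ToASCII via the stdlib codec (nameprep, Punycode, ACE prefix)."""
    return label.encode("idna").decode("ascii")


def to_unicode(label: str) -> str:
    """ToUnicode via the stdlib codec (drops the prefix, decodes Punycode)."""
    return label.encode("ascii").decode("idna")


def is_a_label(s: str) -> bool:
    """Lowercase ASCII with the ACE prefix that survives ToUnicode, then ToASCII."""
    if not (s.isascii() and s == s.lower() and s.startswith(ACE_PREFIX)):
        return False
    try:
        return to_ascii(to_unicode(s)) == s
    except UnicodeError:
        return False


def is_u_label(s: str) -> bool:
    """Non-ASCII string that survives ToASCII, then ToUnicode, unchanged."""
    if s.isascii():
        return False
    try:
        return to_unicode(to_ascii(s)) == s
    except UnicodeError:
        return False


print(to_ascii("bücher"))           # xn--bcher-kva
print(is_u_label("bücher"))         # True
print(is_u_label("Bücher"))         # False: nameprep lowercases it
print(is_a_label("xn--bcher-kva"))  # True
```

Under these assumptions, "Bücher" is not a U-label because it can never be an output of ToUnicode. That is the kind of distinction the definitions are trying to capture.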

<blockquote type=cite class=cite cite="">Issues-7.<br><br>

<pre>&nbsp;&nbsp; Depending on the system involved, the major difficulty may not lie in
&nbsp;&nbsp; the mapping but in accurately identifying the incoming character set
&nbsp;&nbsp; and then applying the correct conversion routine.&nbsp; It may be
&nbsp;&nbsp; especially difficult when the character coding system in local use is
&nbsp;&nbsp; based on conceptually different assumptions than those used by
&nbsp;&nbsp; Unicode about, e.g., how different presentation or combining forms
&nbsp;&nbsp; are handled.&nbsp; Those differences may not easily yield unambiguous
&nbsp;&nbsp; conversions or interpretations even if each coding system is
&nbsp;&nbsp; internally consistent and adequate to represent the local language
&nbsp;&nbsp; and script.
</pre><font face="Courier New, Courier"></font>I suggest the following rewrite:<br><br>

The main difficulty typically is that of accurately identifying the incoming character set so as to apply the correct conversion routine.

Theoretically, conversion could be difficult if the non-Unicode character

encoding system were based on conceptually different assumptions than

those used by Unicode about, e.g., how different presentation or

combining forms are handled. Some examples are the so-called

&quot;font-encodings&quot; used on some Indian websites. However, in

modern software, such character sets are rarely used except for

specialized display. </blockquote><br>

I have no objection. However, the proposed wording seems to allow general interoperability with non-Unicode systems. This could make it possible to support different universal coding systems, starting with the various Unicode versions.<br><br>

For example (we are dealing with real-time operations), nothing prevents trying one system, and then another if the first attempt fails.<br><br>
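The "try one system, then another" approach can be sketched as an ordered list of candidate encodings. This is a minimal Python sketch under stated assumptions. The function name decode_with_fallback and the candidate list are hypothetical. A successful decode only shows that the bytes are valid in that encoding, not that the guess is right.

```python
# Sketch: try candidate encodings in order until one decodes without error.
# ASSUMPTION: the candidate list and its order are illustrative only.
# latin-1 accepts every byte sequence, so it acts as a last-resort fallback.

from typing import Iterable, Tuple


def decode_with_fallback(
    data: bytes,
    candidates: Iterable[str] = ("utf-8", "cp1252", "latin-1"),
) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this candidate rejects the bytes; try the next one
    raise ValueError("no candidate encoding could decode the input")


print(decode_with_fallback(b"caf\xc3\xa9"))  # ('café', 'utf-8')
print(decode_with_fallback(b"caf\xe9"))      # ('café', 'cp1252')
```

The order matters. Placing a permissive encoding such as latin-1 first would make every later candidate unreachable.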

<blockquote type=cite class=cite cite="">Issues-8.<br><br>

<pre>&nbsp;&nbsp; That, in turn, indicates that the script community
&nbsp;&nbsp; relevant to that character, reflecting appropriate authorities for
&nbsp;&nbsp; all of the known languages that use that script, has agreed that the
&nbsp;&nbsp; script and its components are sufficiently well understood.&nbsp; This
&nbsp;&nbsp; subsection discusses characters, rather than scripts, because it is
&nbsp;&nbsp; explicitly understood that a script community may decide to include
&nbsp;&nbsp; some characters of the script and not others.

&nbsp;&nbsp; Because of this condition, which requires evaluation by individual
&nbsp;&nbsp; script communities of the characters suitable for use in IDNs (not
&nbsp;&nbsp; just, e.g., the general stability of the scripts in which those
&nbsp;&nbsp; characters are embedded) it is not feasible to define the boundary
&nbsp;&nbsp; point between this category and the next one by general properties of
&nbsp;&nbsp; the characters, such as the Unicode property lists.</pre><font face="Courier New, Courier"></font>There is no justification given for this process. Moreover, it is doomed to failure. Merely identifying the &quot;script communities&quot; is an impossible task. Who speaks for the Arabic script world? Saudi Arabia (Arabic)? Iran (Persian,...)? Pakistan (Urdu,...)? China (Uighur,...)?</blockquote><br>

You are reasoning as the Unicode script community does. The script communities discussed here are the Internet communities of the TLDs concerned (cf. RFC 1591). When Poland introduced an Arabic table, it was strongly criticized by some significant members of the Arabic community. Poland apologised (although, IMHO, it was perfectly right, as it had simply documented the Polish Arabic script: a table that Unicode does not know, but which exists at IANA).<br><br>

One solution I favor is to consider the whole IDNA business as a Unicode proposition. In that case, there would be no conflict. I favor multilingualisation, as I think it is more complex for the network to support (and therefore more technically rewarding, because it calls for openings in the network architecture). I have no objection to Unicode supporting internationalization as an application on the Internet. What I strongly oppose is confusing the two (that is, putting internationalization at the architectural level, because that leads to architectural constraints).
<br><br>

<blockquote type=cite class=cite cite="">Issues-14.<br><br>

<pre>&nbsp;&nbsp; Applications MAY
&nbsp;&nbsp; allow the display and user input of A-labels, but are not encouraged
&nbsp;&nbsp; to do so except as an interface for special purposes, possibly for
&nbsp;&nbsp; debugging, or to cope with display limitations.&nbsp; A-labels are opaque
&nbsp;&nbsp; and ugly, and, where possible, should thus only be exposed to users
&nbsp;&nbsp; who absolutely need them.&nbsp; Because IDN labels can be rendered either
&nbsp;&nbsp; as the A-labels or U-labels, the application may reasonably have an
&nbsp;&nbsp; option for the user to select the preferred method of display; if it
&nbsp;&nbsp; does, rendering the U-label should normally be the default.
</pre><font face="Courier New, Courier"></font>Add: <br><br>

It is, however, now common practice to display a suspect U-label (such as a mixture of Latin and Cyrillic) as an A-label.</blockquote><br>

Many registration and domain name management operations, as well as babel names, will be carried as A-labels.<br><br>
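The display practice mentioned above, showing a suspect U-label as its A-label, could look roughly like the following Python sketch. It rests on several assumptions. A letter's script is approximated by the first word of its Unicode character name. Only the Latin/Cyrillic mixture is checked. Python's IDNA2003 "idna" codec stands in for the draft's ToASCII. The names letter_scripts and display_form are hypothetical. Real applications use considerably richer rules.

```python
# Sketch: display the U-label unless it mixes Latin and Cyrillic letters,
# in which case display the A-label instead.
# ASSUMPTIONS: script ~ first word of the Unicode character name; the
# stdlib "idna" codec (IDNA2003) stands in for the draft's ToASCII.

import unicodedata

WATCHED_SCRIPTS = {"LATIN", "CYRILLIC"}


def letter_scripts(label: str) -> set:
    """Approximate the scripts of the letters in a label (non-letters ignored)."""
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def display_form(label: str) -> str:
    """Return the U-label, or its A-label when Latin and Cyrillic are mixed."""
    if WATCHED_SCRIPTS <= letter_scripts(label):
        return label.encode("idna").decode("ascii")
    return label


print(display_form("bücher"))       # bücher (Latin only)
print(display_form("p\u0430ypal"))  # prints an A-label starting with "xn--"
```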

<br><br>

<blockquote type=cite class=cite cite="">Issues-15.<br>

<pre>&nbsp;&nbsp; 6.3.&nbsp; The Ligature and Digraph Problem
&nbsp;&nbsp; There are a number of languages written with alphabetic scripts in
&nbsp;&nbsp; which single phonemes are written using two characters, termed a
&nbsp;&nbsp; &quot;digraph&quot;, for example, the &quot;ph&quot; in &quot;pharmacy&quot; and &quot;telephone&quot;.
</pre><font face="Courier New, Courier"></font>The text has been improved considerably from earlier versions, but the whole issue is just a special case of the fact that words are spelled in different ways in different languages or language variants. It really has nothing to do with ligatures and digraphs. The same issue arises between <a href="http://theatre.com">theatre.com</a> and <a href="http://theater.com">theater.com</a> as between a Norwegian URL with ae and a Swedish one with a-umlaut.<br><br>

So if you retain this section, it should be recast as something like:
<br><br>

<br>

6.3 Linguistic Expectations<br><br>

Users often have certain expectations based on their language. A Norwegian user might expect a label with the ae-ligature to be treated as the same label using the Swedish spelling with a-umlaut. A German user might expect a label with a u-umlaut and the same label with &quot;ue&quot; to resolve the same. For that matter, an English user might expect &quot;<a href="http://theater.com">theater.com</a>&quot; and &quot;<a href="http://theatre.com">theatre.com</a>&quot; to resolve the same. [more in that vein].</blockquote><br>

IMHO, this issue is out of scope for an engineering document. It would be as if the current DNS RFCs stated that &quot;hundred&quot; can also be written &quot;100&quot;.<br><br>
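A quick check supports the view that these expectations are linguistic and not character-level. Unicode compatibility normalization (NFKC, which IDNA2003's nameprep applies after case folding) folds a typographic ligature such as U+FB00 into "ff". It leaves the ae-ligature, the umlauts, and theatre/theater distinct. The Python sketch below only approximates nameprep, using NFKC followed by casefold(). The helper nfkc_equal is hypothetical.

```python
# Sketch: which "equivalences" does Unicode normalization itself provide?
# ASSUMPTION: NFKC + casefold() is a rough stand-in for IDNA2003 nameprep.

import unicodedata


def nfkc_equal(a: str, b: str) -> bool:
    """True if both strings match after NFKC normalization and case folding."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


print(nfkc_equal("e\ufb00ect", "effect"))  # True: LATIN SMALL LIGATURE FF -> "ff"
print(nfkc_equal("\u00e6", "\u00e4"))      # False: ae-ligature vs a-umlaut
print(nfkc_equal("\u00fc", "ue"))          # False: u-umlaut vs German "ue"
print(nfkc_equal("theatre", "theater"))    # False: spelling variants
```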

<br>

<blockquote type=cite class=cite cite="">Issues-16.<br><br>

there is no evidence that<br>

they are important enough to Internet operations or<br>

internationalization to justify large numbers of special cases<br>

and character-specific handling (additional discussion and<br><br>

I suggest the following wording instead:<br><br>

there is no evidence that<br>

they are important enough to Internet operations or<br>

internationalization to justify inclusion (additional discussion

and<br><br>

It doesn't actually involve &quot;large numbers of special cases&quot;; there is a rather small percentage of demonstrable problems in the symbol/punctuation area. What we could say is that there is general consensus that removing all but letters, digits, numbers, and marks (with some exceptions) causes little damage in terms of backwards compatibility, and does remove some problematic characters, such as the fraction slash.</blockquote><br>

Correct, but this depends on the sign coding system. This should be

noted.<br><br>
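The rule of thumb quoted above (keep letters, digits, and marks) can be approximated by Unicode General Category. This is a rough Python sketch, not the draft's rule set, which adds many exceptions. It makes three assumptions. Only Nd counts as a digit, so other number categories are rejected. The ASCII hyphen is kept for LDH compatibility. The helper names are hypothetical.

```python
# Sketch: approximate "letters, digits and marks only" by General Category.
# ASSUMPTION: rough approximation; NOT the draft's actual rules.

import unicodedata


def allowed_by_category(ch: str) -> bool:
    if ch == "-":
        return True  # LDH hyphen kept for backwards compatibility
    category = unicodedata.category(ch)
    return category[0] in ("L", "M") or category == "Nd"


def rejected_characters(label: str) -> list:
    """Return (character, code point, category) for each disallowed character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.category(ch))
            for ch in label if not allowed_by_category(ch)]


print(rejected_characters("1\u20442"))     # [('⁄', 'U+2044', 'Sm')]  fraction slash
print(rejected_characters("i\u2665ny"))    # [('♥', 'U+2665', 'So')]  heart symbol
print(rejected_characters("bücher-2007"))  # []
```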

<blockquote type=cite class=cite cite="">Issues-17.<br><br>

<pre>&nbsp;&nbsp; For example, an essential

&nbsp;&nbsp; element of the ASCII case-mapping functions is that

&nbsp;&nbsp; uppercase(character) must be equal to

&nbsp;&nbsp; uppercase(lowercase(character)).

</pre><font face="Courier New, Courier"></font>Remove or rephrase. It is

a characteristic, but not an essential one. In fact, case mappings of

strings are lossy; once you lowercase &quot;McGowan&quot;, you can't

recover the original.</blockquote><br>

In the DNS, as I understand it, it is essential. We are constantly confronted with this ambiguity when ASCII is treated as a special-case character set.
<br><br>
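The two positions on case mapping can be put side by side. The sketch assumes that DNS-style matching is ASCII-only, case-insensitive comparison, as clarified in RFC 4343. Python's str.upper() and str.lower(), by contrast, apply full Unicode case mappings. The helpers ascii_lower and dns_labels_equal are hypothetical.

```python
# Sketch contrasting per-character case properties with DNS-style matching.
# ASSUMPTION: DNS matching = ASCII-only case-insensitive comparison.

import string

_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    """Fold only A-Z to a-z, leaving every other character untouched."""
    return s.translate(_ASCII_FOLD)


def dns_labels_equal(a: str, b: str) -> bool:
    return ascii_lower(a) == ascii_lower(b)


# Per character, the property holds for every ASCII letter...
assert all(c.upper() == c.lower().upper() for c in string.ascii_letters)
# ...but not for every Unicode character: KELVIN SIGN lowercases to ASCII "k".
assert "\u212a".upper() != "\u212a".lower().upper()
# On strings, case mapping is lossy: the original capitalization is gone.
assert "McGowan".lower().upper() == "MCGOWAN"
# DNS-style comparison ignores ASCII case, so that loss does not matter there.
assert dns_labels_equal("McGowan", "MCGOWAN")
```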

<blockquote type=cite class=cite cite="">Issues-18.<br><br>

<pre>&nbsp;&nbsp; o&nbsp; Unicode names for letters are fairly intuitive, recognizable to
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; users of the relevant script, and unambiguous.&nbsp; Symbol names are
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; more problematic because there may be no general agreement on
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; whether a particular glyph matches a symbol, there are no uniform
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; conventions for naming, variations such as outline, solid, and
</pre><font face="Courier New, Courier"></font>Actually, the formal Unicode names are often far from intuitive to users of the relevant script. That is because of the constraint of using ASCII for the names, in order to line up with the ISO standards for character encodings.<br><br>

This section is not really needed. The use of I&lt;heart&gt;NY.com is not really problematic; the main justification for removing it is that we don't think it is needed (and it has not been used much since IDNA was introduced). Better to just stick with that.</blockquote><br>

This should be considered in comparison with the universal sign code specifications that may emerge (WSIS) in the coming months and years.<br><br>
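The formal character names discussed above can be inspected with Python's unicodedata module. Its names come from the Unicode Character Database bundled with the interpreter. The sample characters (the heart symbol, the ae-ligature, a Cyrillic and an Arabic letter) are chosen only for illustration. The helper formal_names is hypothetical. The output shows the ASCII, English-based naming Mark describes.

```python
# Sketch: list formal Unicode names for a few sample characters.
# ASSUMPTION: names come from the UCD version bundled with Python.

import unicodedata


def formal_names(chars: str) -> list:
    """Return 'U+XXXX NAME' strings for each character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in chars]


for line in formal_names("\u2665\u00e6\u0436\u0634"):
    print(line)
# U+2665 BLACK HEART SUIT
# U+00E6 LATIN SMALL LETTER AE
# U+0436 CYRILLIC SMALL LETTER ZHE
# U+0634 ARABIC LETTER SHEEN
```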

<blockquote type=cite class=cite cite="">Issues-19.<br><br>

<pre>&nbsp;&nbsp; 11.&nbsp; IANA Considerations
&nbsp;&nbsp; 11.1.&nbsp; IDNA Permitted Character Registry
&nbsp;&nbsp; The distinction between &quot;MAYBE&quot; code points and those classified into
&nbsp;&nbsp; &quot;ALWAYS&quot; and &quot;NEVER&quot; (see Section 5) requires a registry of
&nbsp;&nbsp; characters and scripts and their categories.&nbsp; IANA is requested to
</pre><font face="Courier New, Courier"></font>Expecting an IANA registry

to maintain this is setting it up for failure. If this were to be done,

precise and lengthy guidance as to the criteria for removing characters

(moving to NEVER) would have to be supplied, because of the irrevocable

nature of this step. The odds of a registry being able to perform this

correctly are very small.<br><br>

The best alternative would be simply to give all the non-historic scripts in
<a href="http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-03.txt">
<i>draft-faltstrom-idnabis-tables-03.txt</i></a> the same status as Latin, Greek, and Cyrillic.<br>
The second best would be to have the Unicode Consortium make the determinations (and take the heat for objections).</blockquote><br>

I fully agree with the second proposal. The RFC 4646 review mechanism, which has not been respected, shows the difficulty IANA would face. The probable architectural changes resulting from the multilateralisation of the network referents would only increase that difficulty. But until now, you were the one opposing this idea and wanting to use IANA.<br>
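To make the irrevocability concern concrete, a registry that refuses to move a code point out of NEVER might look like the following. Everything here is hypothetical and is not the procedure the draft asks IANA to follow. That includes the class and method names, the default of MAYBE for code points not yet evaluated, and the rule that only NEVER is terminal.

```python
# Sketch: a permitted-character registry in which NEVER is irrevocable.
# ASSUMPTIONS: hypothetical names and defaults; only the NEVER rule
# reflects the concern raised above.

from enum import Enum
from typing import Dict


class Status(Enum):
    ALWAYS = "ALWAYS"
    MAYBE = "MAYBE"
    NEVER = "NEVER"


class PermittedCharacterRegistry:
    def __init__(self) -> None:
        self._status: Dict[int, Status] = {}

    def status(self, code_point: int) -> Status:
        # Hypothetical default: code points not yet evaluated are MAYBE.
        return self._status.get(code_point, Status.MAYBE)

    def set_status(self, code_point: int, new: Status) -> None:
        # NEVER is terminal: once a code point is removed, it cannot return.
        if self.status(code_point) is Status.NEVER and new is not Status.NEVER:
            raise ValueError(f"U+{code_point:04X} is NEVER; that step cannot be undone")
        self._status[code_point] = new


registry = PermittedCharacterRegistry()
registry.set_status(0x2044, Status.NEVER)  # e.g. FRACTION SLASH
try:
    registry.set_status(0x2044, Status.MAYBE)
except ValueError as err:
    print(err)  # U+2044 is NEVER; that step cannot be undone
```

The code can only enforce that a removal is final. It cannot decide whether a removal is correct, which is the registry's real difficulty.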

jfc<br>

</body>

</html>