Patrik, I&#39;m afraid that somehow we have a miscommunication. We haven&#39;t ever been saying that all the properties that you&#39;ve considered in tables-xx.txt are stable. What we have been saying is that once the IETF comes up with a set of rules that define an IDN property, we can and will commit to providing the mechanism to stabilize that IDN property. Nobody has been saying that all of the underlying properties will themselves be stable.

<div class="gmail_quote"> This is a bit tricky, so please bear with me. Because I have not communicated this effectively so far, please read this over carefully and let me know where you have questions or I am unclear. I will letter these items for reference.

A. There is always a tension between having properties be stable and having them be as accurate as possible.  <ol><li>Application X wants to get the most accurate information about characters with property X, and doesn&#39;t care about compatibility.

</li><li>Application Y wants to use property X in some special way (such as identifiers) that requires absolute backwards compatibility. If the Unicode consortium finds out that a character has different properties, that doesn&#39;t matter. Compatibility swamps accuracy.

</li></ol>Because of that, all and only the properties on

<a href="http://www.unicode.org/policies/stability_policy.html" target="_blank">http://www.unicode.org/policies/stability_policy.html</a> are guaranteed to be stable. That

page sets out exactly how the properties are stable. The Unicode consortium, however, does have a tool that it has used successfully for many years to guarantee absolute stability, while allowing for fixes to underlying properties.

</i> <br><br>B. Let&#39;s suppose IETF wants to define IDN_ALWAYS as being characters according to a given formulation, such as the following (for brevity, X means the set of characters having the property X) <br><ul><li>

IDN_ALWAYS = ((X + Y + Z) - W) + V.</li></ul>The properties X, Y, Z, W, and V are the underlying properties for IDN_ALWAYS. The Unicode consortium can supply absolute stability for IDN_ALWAYS in the following way. In each version of Unicode, the consortium would commit to defining the property:

<div style="margin-left: 40px;">Other_IDN_ALWAYS </div> to be all those characters that were in IDN_ALWAYS in the previous version, but would not be in the current version according to the formulation.

Then the IETF&#39;s formulation becomes: <ul><li>IDN_ALWAYS = (((X + Y + Z) - W) + V) + Other_IDN_ALWAYS</li></ul>That provides absolute stability across versions. Given a version of Unicode V and the IETF&#39;s formulation, anyone can calculate IDN_ALWAYS for version V, and it will always include all characters that were in IDN_ALWAYS for version V-1.

Nobody needs to access multiple versions of Unicode to make this calculation. As far as I understand them, I believe this satisfies all of your requirements for stability.<br><br>C. Now, what we would also do in Unicode would be to provide, in each version of Unicode, what is called a &quot;derived property&quot;, where we go ahead and compute the tables for IDN_ALWAYS. This is simply a convenience to users, since the vast majority don&#39;t want to do the computation themselves; they just want to get the values. But someone always could.
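<br><br>To make B and C concrete, here is a minimal sketch of the grandfathering calculation. It is only an illustration: the function names are hypothetical, and the code point values are invented for the example rather than taken from any real Unicode data.

```python
# Hypothetical sketch of the scheme in B and C. Property names X, Y, Z, W, V
# follow the formulation above; the code point values are made up.

def raw_formula(props):
    """((X + Y + Z) - W) + V, computed from one version's base properties."""
    return ((props["X"] | props["Y"] | props["Z"]) - props["W"]) | props["V"]


def other_idn_always(previous_idn_always, props):
    """Characters that were in IDN_ALWAYS before but fall out of the raw formula now.

    This is the set that would be published as Other_IDN_ALWAYS.
    """
    return previous_idn_always - raw_formula(props)


def idn_always(props):
    """Stabilized formulation: raw formula plus the published Other_IDN_ALWAYS."""
    return raw_formula(props) | props["Other_IDN_ALWAYS"]


# Version N-1: nothing is grandfathered yet.
v1 = {"X": {0x61, 0x62}, "Y": {0x63}, "Z": set(), "W": set(), "V": set(),
      "Other_IDN_ALWAYS": set()}
always_v1 = idn_always(v1)

# Version N: the (invented) code point 0x62 loses property X.
v2 = {"X": {0x61}, "Y": {0x63}, "Z": {0x64}, "W": set(), "V": set()}
v2["Other_IDN_ALWAYS"] = other_idn_always(always_v1, v2)  # done once, by the publisher
always_v2 = idn_always(v2)                                # done by anyone, from v2 alone
```

Only the publishing step looks at the previous version. A reader of the data for version N calls `idn_always` on that version alone, and the result still contains everything that was in the set for version N-1.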

<br><br><br>D. If you want to see an example of this, look at the Unicode identifiers over the years. Those use two properties (like C or Java, some characters cannot be at the start of identifiers).<br><br><span>

ID_Start = [[:L:][:Nl:][:Other_ID_Start:]]<br>
ID_Continue = [[:L:][:Nl:][:Mn:][:Mc:][:Nd:][:Other_ID_Continue:]]<br>
// this uses the POSIX syntax, whereby [:L:] is the set of all code points C such that general_category(C) = Letter.<br>
// one can of course use other syntax, like Perl&#39;s \p{L}, and so on.<br><br>
We formalized and stabilized them in Unicode 3.0, in 1999. If you search for Other_ID_Start and Other_ID_Continue in the following files, you&#39;ll find that over the course of the eight years since then, we&#39;ve added a handful of characters to the grandfathering categories Other_... so as to maintain backwards compatibility with each and every release.

<br><br><a href="http://www.unicode.org/Public/4.0-Update/PropList-4.0.0.txt" target="_blank">http://www.unicode.org/Public/4.0-Update/PropList-4.0.0.txt</a><br><a href="http://www.unicode.org/Public/4.1.0/ucd/PropList.txt" target="_blank">

http://www.unicode.org/Public/4.1.0/ucd/PropList.txt

</a><br>

<a href="http://www.unicode.org/Public/5.0.0/ucd/PropList.txt" target="_blank">http://www.unicode.org/Public/5.0.0/ucd/PropList.txt</a><br><a href="http://www.unicode.org/Public/5.1.0/ucd/PropList-5.1.0d20.txt" target="_blank">

http://www.unicode.org/Public/5.1.0/ucd/PropList-5.1.0d20.txt

</a> (the current beta)<br><br>We have a set of tools that we run over each release to verify compatibility, and we have a beta period of several months for each release where others can run their own external tools as well.<br>
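<br>As an illustration of the kind of external check someone could run during a beta, here is a small sketch that reads property lines in the style of PropList.txt and reports any code point that was dropped between two releases. The helper names are hypothetical, and the sample lines are abbreviated entries typed in for the example, not a copy of any particular release file.

```python
# Sample lines in the style of PropList.txt (comments abbreviated).
SAMPLE = """\
2118          ; Other_ID_Start # SCRIPT CAPITAL P
212E          ; Other_ID_Start # ESTIMATED SYMBOL
309B..309C    ; Other_ID_Start # KATAKANA-HIRAGANA VOICED SOUND MARK..SEMI-VOICED SOUND MARK
"""


def parse_property(text, name):
    """Collect the code points listed for one property in PropList-style text."""
    points = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        code_range, prop = [field.strip() for field in data.split(";")]
        if prop != name:
            continue
        first, _, last = code_range.partition("..")
        # A single code point has no "..", so the range is just that point.
        points.update(range(int(first, 16), int(last or first, 16) + 1))
    return points


def removed_between(older, newer):
    """Code points present in the older release but missing from the newer one."""
    return older - newer


older = parse_property(SAMPLE, "Other_ID_Start")
```

In practice one would parse the full derived sets for two releases and expect `removed_between` to return an empty set; anything else would be a compatibility break to report before the release is final.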

E. While I do believe that the Unicode consortium would be best placed to update the categorization of characters over time, based on the broad internationalization expertise of its members, that is an orthogonal issue. The IETF could decide that it wants some other group to do that -- that does not affect the consortium&#39;s commitment to provide for stability of IDN. It would, of course, require synchronization between the groups, much like the extremely successful working relationship between the consortium and the ISO subcommittee responsible for ISO 10646.

<br><br>I&#39;m hoping this makes sense to you now.<br><br>Mark<div><div></div><div class="Wj3C7c"><br><br><div class="gmail_quote">On Dec 16, 2007 2:26 AM, Patrik Fältström &lt;<a href="mailto:patrik@frobbit.se" target="_blank">

patrik@frobbit.se</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Before we go into the details of the comments of this document (thanks<br>for those), I have to raise an overall issue that has been boiling for<br>a while, and that has to do with stability of the properties defined<br>by the Unicode Consortium. The reason why this discussion is needed before

<br>I start working on the overall issues you raise here will hopefully be<br>more clear later.<br><br>As data is stored in databases (like DNS) for a very very long time,<br>anything that compares codepoints based on some property value MUST

only use the derived property values that have the backward compatibility features you describe. Backwards compatible as in &quot;if codepoint a has property x in version N of Unicode, it should also have that property in version N+1&quot;.

<br><br>In early discussions on stability, you from UTC said that things WILL<br>BE STABLE, and you personally have said so to the IETF several times.<br>We have displayed the algorithms to you several times, and we have

<br>

also said the calculations in the IDNAbis document will be based on<br>base properties and not derived properties -- so that EVERYONE can<br>easily calculate the derived value if they have the need for it. We<br>have also agreed that things will NEVER move from ALWAYS or NEVER, and

<br>you have also been part of the discussion regarding Cyrillic and Latin<br>(as those were said to be &quot;known&quot;).<br><br>You now come and revoke so many things you and other UTC people<br>have stated in the IDNAbis discussion that I do not know how to

continue the work. The rules will never be based on derived property values. People MUST be able to calculate the ALWAYS etc properties given the CURRENT Unicode distribution. The overall goal with IDNAbis is to be independent of Unicode Version.

<br>This implies it MUST be possible for anyone that have &quot;the<br>distribution of Unicode&quot; to compute the value of the derived property<br>that tell the status in IDNAbis. An alternative would be to have the<br>

derived property &quot;just&quot; appearing as a table that no one but a closed<br>group can compute (or codepoints end up there in an ad hoc<br>mechanism).<br><br>This does, though, imply that the base properties the algorithm is based

<br>upon are stable. At least stable in the cases where IDNAbis is to<br>ensure stability. And with this I imply for example &quot;codepoints that<br>are in ALWAYS will never be removed from ALWAYS&quot; etc.<br><br>This in turn imply the calculations and the properties that lead to a

<br>codepoint end up in ALWAYS or NEVER will never change in such a way<br>that the calculations lead to a different result in the future.<br><br>This is why Cyrillic, Latin etc where selected as pointers to<br>codepoints that are believed to be stable so that we dare(!) to put

<br>codepoints in those scripts in the ALWAYS and NEVER categories. For<br>other scripts we see changes between the versions of Unicode still.<br>Changes that are large enough so they _CAN_ have implications on the<br>domain names that are already stored in the DNS database in the form

<br>of registered domain names.<br><br>The reason IETF require stability, as we have explained before, is<br>that if a is registered as a domain name, a lookup for a should always<br>give a match in the future. One must be able to use the domain name

<br>one have registered for all times in the future. This is what IDNAbis<br>is concentrating on. Ensuring that if a is in ALWAYS and registered in<br>DNS, it should stay there.<br><br>If we then include that also b should be stable because f(b)=a (case

<br>folding etc) then we have a much larger problem. How can we ensure<br>that b will continue to have the properties needed, and how can be<br>ensure that the function f(x) is stable by itself?<br><br>I have heard you say many times when we get this far in the discussion

<br>&quot;but that is no problem&quot;. You even say below that MAYBE YES should be<br>removed, as things very easily can be added to the ALWAYS category.<br><br>But that is not a statement I agree with, and let me explain why. I

<br>have two points here to make:<br><br>(1) There is currently a suggestion on the Unicore mailing list to<br>move a codepoint from script cyrillic to inherited. This (if we would<br>have taken inherited into account in the tables document) would move

<br>the codepoint from ALWAYS to CONTEXT according to my preliminary<br>thinking. But that is not the point. The point is that suddenly I am,<br>and many people should, be very very very afraid of including cyrillic<br>script in the list of codepoints that are stable enough to have things

<br>in the ALWAYS category. Removing Cyrillic from there have implications<br>on the ability to register codepoints using cyrillic as IDN domain<br>names, and I am pretty sure that change will be discussed at the next<br>

meeting of the Internet Governance Forum. Russia have, as I hope you<br>know, very strong feelings regarding use of Cyrillic &quot;on the Internet&quot;.<br><br>That a discussion even exist to change any properties regarding a

<br>codepoint that is part of the cyrillic script surprises me given the<br>statements you have made regarding stability.<br><br>(2) You have in mail to me said that properties not at all are stable.<br>This is for me something that is completely orthogonal to statements

<br>similar to &quot;it is easy for people knowing scripts to add more things<br>to ALWAYS&quot;. You have further explained that stability is ensured by<br>defining a new derived property in the following way:<br><br>Say codepoint a have property x. As x is not a stable property (as no

<br>properties are stable) one has a derived property is_or_has_been_x<br>that all codepoints that either have or have had that property have.<br>This implies the codepoint a might no longer have property x, but will<br>

have property is_or_has_been_x. If we now base the IDNAbis tables on

<br>this derived property three things happens:<br><br>(a) It is impossible for people outside unicode consortium to<br>calculate the tables, as one can not know what codepoints have (since<br>version N of Unicode) had the property value x, and because of that it

is impossible to know what codepoints have property value is_or_has_been_x. I.e. only people with inside information on Unicode Consortium issues can make the calculations resulting in (various degrees of) stability.

(b) If algorithms like IDNAbis have to have stability, people have to base algorithms, sorting etc on is_or_has_been_x and not x, and then the change of codepoint a to remove x from it has no value in reality.

There must be a reason why x was removed from A. But if is_or_has_been_x is what is used, that change is just void. So why changing? What will interoperability be between applications using x and ones using is_or_has_been_x?

<br><br>This imply people will use the first property value ever assigned to<br>the codepoint, and that changes are not interesting at all. The real<br>property values will diverge from the derived ones, but the derived<br>

ones are still the most important ones for historical data.<br><br>This to me imply that changing property values is completely useless,<br>part from making this a real mess.<br><br>(c) All of these claims that something is stable but not stable lead

<br>me to the conclusion that IDNAbis property can not be calculated on<br>the properties Unicode Consortium has. Instead it has to be based on<br>derived properties like<br>is_or_has_been_x, or rather, codepoints have to be hand picked to

<br>ensure stability.<br><br>And this open up the question whether Unicode codepoints should be<br>used at all. IETF could as well use codepoints from ISO 10646 as the<br>properties Unicode define do not give any extra value, and then this

<br>discussion can concentrate on what to do with the codepoints. IANA<br>then hold a table of the properties based on ISO 10646.<br><br>So, before moving forward with IDNAbis, it might be that IETF will<br>need a statement from UTC what properties will be stable in the

future, and for what codepoints. Only that data is something the algorithms in the table document can be based upon. I guess because of this the ball is again on your (as in unicode consortium) side of the ballpark.

<br><br>In the meantime, I work on the table document and the good comments,<br>including the ones from you Mark.<br><br> &nbsp; &nbsp;Patrik<br><div><div></div><div><br>On 14 dec 2007, at 04.45, Mark Davis wrote:<br>

<br>&gt; <a href="http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-03.txt" target="_blank">http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-03.txt</a><br>&gt; Overall<br>&gt; Comments:

<br>

&gt;<br>&gt;<br>&gt; Tables-1.<br>&gt;<br>&gt; There is no operational difference between MAYBE YES and MAYBE NO,<br>&gt; and no<br>&gt; characters that are in the latter. This distinction is really only<br>&gt; meaningful as internal tracking information inside whatever group

<br>&gt; controls<br>&gt; the future allocation of characters and should not appear here. (See<br>&gt; also<br>&gt; Ken&#39;s email and trail under &quot;Table issues (was: Re: IDNAbis<br>&gt; documents)&quot;<br>&gt;<br>

&gt; Even further, MAYBE YES should not exist at all: a day or two of work by<br>&gt; script experts would be enough to move the vast majority of the current<br>&gt; &#39;MAYBE YES&#39; to the ALWAYS category.

<br>&gt;<br>&gt; Tables-2.<br>&gt;<br>&gt; There is a preference for Latin, Greek, Cyrillic, and Han which has no<br>&gt; principled basis. In particular, Latin, Cyrillic, and Han are some<br>&gt; of the<br>&gt; most complicated scripts: Latin and Cyrillic, since they are used to

<br>&gt; write a<br>&gt; huge number of languages with a large number of variant characters,<br>&gt; and Han<br>&gt; because of the history of character variations. Many, many scripts<br>&gt; are less<br>&gt; problematic than Latin or Cyrillic, and there is no reason to favor

<br>&gt; Cyrillic<br>&gt; over say Armenian; it also gives the appearance of Eurocentrism<br>&gt; where none<br>&gt; is intended.<br>&gt;<br>&gt;<br>&gt; From an old email:<br>&gt;<br>&gt; &quot;No reason is given for the focus on only European scripts; and that

<br>&gt; focus<br>&gt; will surely raise suspicions in many circles. While I&#39;m sure that the<br>&gt; restriction to European languages is just because those are the ones<br>&gt; the<br>&gt; small group of authors is familiar with, it will not be received

<br>&gt; well. If &quot;we the community&quot; have &quot;experienced that a number of scripts have issues<br>&gt; that are not resolved&quot;, then those problems should be enumerated<br>&gt; *explicitly*, not hidden away.

<br>&gt;<br>&gt; The situation might be different if we were starting from zero; but<br>&gt; we are<br>&gt; not. We already have an IDNA system that works for a great many<br>&gt; people. And<br>&gt; while there are security problems with it, those are well known and

<br>&gt; vendors<br>&gt; are dealing with them. Moreover, of the problems that IDNAbis<br>&gt; solves, they<br>&gt; are just the easy ones -- the harder ones are ones like the<br>&gt; &quot;<a href="http://paypal.com" target="_blank">

paypal.com</a>&quot;<br>&gt; case, which the current suggestion for IDNAbis doesn&#39;t touch. So it<br>&gt; feels<br>&gt; like we are looking at a proposal that:<br>&gt;<br>&gt; 1. doesn&#39;t actually help much with the practical problems that

<br>&gt; people face<br>&gt; 2. solves the easy problems, but not the hard ones; so people have to<br>&gt; essentially do the work anyway<br>&gt; 3. and removes much of the functionality, except for some favored<br>&gt; groups:

<br>&gt; Europe and the Americas&quot;<br>&gt;<br>&gt; Tables-3.<br>&gt;<br>&gt; The CONTEXT class should be heavily restricted, as per Ken&#39;s email, to only<br>&gt; 2 characters (see &quot;Table issues (Part 3: CONTEXT)&quot; for details). Moreover,<br>&gt; the term Context is problematic: **many** characters are disallowed or<br>&gt; allowed, depending on context. Even a-z are disallowed in a field that also<br>&gt; contains RTL characters.

<br>&gt;<br>&gt; Tables-4.<br>&gt;<br>&gt; The list of historic scripts is very outdated. See<br>&gt; <a href="http://www.unicode.org/reports/tr31/tr31-8.html#Specific_Character_Adjustmentsfor" target="_blank">http://www.unicode.org/reports/tr31/tr31-8.html#Specific_Character_Adjustmentsfor

</a><br>&gt; more details. The characters in Table 3 should also be reviewed as<br>&gt; possible exceptions.<br>&gt;<br>&gt; Tables-5.<br>&gt;<br>&gt; Key to the success of this is the group that determines the future<br>

&gt; allocation of characters. It must be very clear precisely what the<br>&gt; grounds<br>&gt; are for removing characters (moving from MAYBE to NEVER); otherwise<br>&gt; there<br>&gt; will be never-ending battles over individual characters. (Frankly, I

<br>&gt; believe that the correct course of action would be to disallow the historic scripts<br>&gt; for now, but allow the characters in all other scripts, with very few<br>&gt; exceptions.)<br>&gt;<br>&gt; Tables-6.

<br>&gt;<br></div></div>&gt; Like draft-alvestrand-idna-bidi-01.txt&lt;<a href="http://www.ietf.org/internet-drafts/draft-alvestrand-idna-bidi-01.txt" target="_blank">http://www.ietf.org/internet-drafts/draft-alvestrand-idna-bidi-01.txt

</a><br><div>&gt; &gt;,<br>&gt; there should be at least one example motivating every case where a<br>&gt; class of<br>&gt; characters is removed (this might be in one of the other documents<br>&gt; instead

<br>&gt; of here).<br>&gt;<br>&gt; Tables-7.<br>&gt;<br>&gt; The entire description of the process is far too complicated for<br>&gt; what is, at<br>&gt; core, a relatively simple process. It is further obfuscated by<br>

&gt; referring to

<br>&gt; classes of characters by a letter category instead of mnemonics.<br>&gt;<br>&gt; Take the following from<br></div>&gt; draft-faltstrom-idnabis-tables-03.txt&lt;<a href="http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-03.txt" target="_blank">

http://www.ietf.org/internet-drafts/draft-faltstrom-idnabis-tables-03.txt</a><br><div><div></div><div>&gt; &gt;<br>&gt;<br>&gt; &nbsp; &nbsp; &nbsp;* &nbsp;If the codepoint does not appear in any of the categories B<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; (Section 

2.1.2), C (Section 2.1.3), D (Section 2.1.4), E<br>&gt; &nbsp; &nbsp; &nbsp; &nbsp; (Section 2.1.5) or F (Section 2.1.6), the value is ALWAYS.<br>&gt;<br>&gt; That formulation is completely opaque. I&#39;d strongly recommend for<br>&gt; transparency you reformulate this considerably. You could maintain

<br>&gt; part of<br>&gt; the structure that you have, if you wanted, by consistently using<br>&gt; mnemonics<br>&gt; instead of Sections.<br>&gt;<br>&gt; That is, give meaningful names to each Category in Section 2, such

<br>&gt; as:<br>&gt;<br>&gt; A =&gt; Language-Characters<br>&gt; B =&gt; Unnormalized<br>&gt; C =&gt; Ignorable<br>&gt; D =&gt; Historical-Scripts<br>&gt; E =&gt; Disallowed-Blocks<br>&gt; ...<br>&gt;<br>&gt; The formulation can then be something like the following. (This is not

<br>&gt; precisely equivalent to your formulation, which I found difficult to<br>&gt; follow<br>&gt; -- it is the style of presentation that I&#39;m focusing on).<br>&gt;<br>&gt; Use the following procedure to determine the IDNA-Property of any

<br>&gt; code point<br>&gt; cp. Proceed through the rules, and return a value at the first that<br>&gt; applies.<br>&gt;<br>&gt; Exceptions<br>&gt; 1a. If cp is in Exceptional-Always, return Always<br>&gt; 1b. If cp is in Exceptional-Never, return Never

<br>&gt; 1c. If cp is in Exceptional-Maybe, return Maybe<br>&gt;<br>&gt; Functional Exclusions<br>&gt; 2. Else if cp is in Unnormalized, return Never<br>&gt; 3. Else if cp is in Not-Case-Folded, return Never<br>&gt; 4. Else if cp is in Ignorable, return Never

<br>&gt;<br>&gt; Usage Exclusions<br>&gt; 5. Else if cp is in Historical-Scripts, return Never<br>&gt; 6. Else if cp is in Disallowed-Blocks, return Never<br>&gt;<br>&gt; LMN Inclusion<br>&gt; 7. Else if cp is in Language-Characters, return Maybe

<br>&gt;<br>&gt; Exclude everything else<br>&gt; 8. Else return Never<br>&gt;<br>&gt; Note: Exceptional-Always would contain your Category H Always<br>&gt; characters,<br>&gt; plus grandfathered Always characters, plus a-z, 0-9, -;<br>&gt; Exceptional-Maybe would add the Category H Maybe characters, and so on. The mechanism already<br>&gt; described in email for providing perfect stability would be to add<br>&gt; characters, where necessary, to these classes.

<br>&gt;<br>&gt; Details:<br>&gt; Tables-8.<br>&gt;<br>&gt; &nbsp; &nbsp; &nbsp;a character is never removed from<br>&gt; &nbsp; &nbsp; &nbsp;it unless it is removed from Unicode.<br>&gt;<br>&gt; This is not necessary. If you really have to have it, then add

<br>&gt; &quot;(however, the Unicode stability policies expressly forbid this)&quot;<br>&gt;<br>&gt;<br>&gt; Tables-9.<br>&gt;<br>&gt; Re. Appendix A. There seem to be some errors in the generation of this<br>&gt; table. The code point range should be &quot;0x0000 - 0x10FFFF&quot;.

<br>&gt;<br>&gt;<br>&gt; Tables-10<br>&gt;<br>&gt;<br>&gt; The derivation of the table did not correctly distinguish<br>&gt; *unassigned* code<br>&gt; points from *noncharacter* code points. Unassigned code points are<br>

&gt; &quot;&lt;reserved&gt;&quot; and are available for future encoding of characters,<br>&gt; whereas<br>&gt; noncharacter code points are *not* &quot;&lt;reserved (for future<br>&gt; assignment)&gt;&quot; --<br>&gt; they are designated functions, constitute a kind of internal private

<br>&gt; use, and are disallowed for interchange. (See Table 2-3, TUS 5.0, p. 27.)<br>&gt; If PUA code points (e.g. U+E000..U+F8FF) are to be NEVER in this table,<br>&gt; then the noncharacters must be NEVER, rather than UNASSIGNED.

<br>&gt;<br>&gt; Tables-10a<br>&gt;<br>&gt;<br>&gt; In general, having this Appendix A listing include UNASSIGNED code<br>&gt; points is<br>&gt; both distracting (from the other, more meaningful values) and an<br>&gt; error-prone

<br>&gt; reduplication of effort. The listing of gc=Cn values is already<br>&gt; available<br>&gt; directly from:<br>&gt;<br>&gt; <a href="http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt" target="_blank">

http://www.unicode.org/Public/UNIDATA/extracted/DerivedGeneralCategory.txt</a><br>&gt;<br>&gt; And that file *does* make the distinction between true unassigned code<br>&gt; points and noncharacter code points (both of which are gc=Cn, but

<br>&gt; which<br>&gt; differ in the Noncharacter_Code_Point property [see PropList.txt].)<br>&gt; The<br>&gt; derivation for the IDN inclusion table needs to pay attention to<br>&gt; *both*<br>&gt; gc=Cn and Noncharacter_Code_Point=True. What *would* make sense is

<br>&gt; for the Appendix listing to correctly identify the noncharacters as NEVER.<br>&gt; The fact that it doesn&#39;t suggests that there is an error in the way the<br>&gt; calculation is handling Category D.

<br>&gt;<br>&gt;<br>&gt; Tables-11<br>&gt;<br>&gt;<br>&gt; Another general issue with the document, table, and Section 3,<br>&gt; Calculation<br>&gt; of the Derived Property: The possible values of the IDN property still

<br>

&gt; include a value MAYBE NOT, but in fact the calculation has no branch<br>&gt; now<br>&gt; that assigns a MAYBE NOT value, and the table contains no MAYBE NOT<br>&gt; characters. Either the thinking about &quot;MAYBE NOT&quot; has changed, and the

<br>&gt; document hasn&#39;t caught up to that yet, or there is an error in how the<br>&gt; calculation has been set up. As it is now, nearly all of the &quot;MAYBE<br>&gt; NOT&quot;<br>&gt; values from the 01 version of this ID are now listed in the Appendix

<br>&gt; as<br>&gt; &quot;NEVER&quot;. As &quot;NEVER&quot;, they would be prohibited from any future<br>&gt; consideration<br>&gt; for IDN, which seems at odds with the tenor of the text describing<br>&gt; &quot;MAYBE<br>

&gt; NOT&quot;.<br>&gt;<br>&gt; Tables-12<br>&gt;<br>&gt;<br>&gt; Section 4. Codepoints states:<br>&gt;<br>&gt; &quot;The Categories and Rules defined in Section 2 and Section 3 apply<br>&gt; to all<br>&gt; assigned Unicode characters.&quot; In fact they also apply to

<br>&gt; *unassigned* code<br>&gt; points as well.<br>&gt;<br>&gt; The correct formulation would be:<br>&gt;<br>&gt; &quot;The Categories and Rules defined in Section 2 and Section 3 apply<br>&gt; to all<br>&gt; Unicode codepoints, assigned or unassigned.&quot;

<br>&gt;<br>&gt; [Note: the Unicode Standard systematically uses a space in the term<br>&gt; &quot;code<br>&gt; point&quot;, as well as for &quot;code unit&quot;, &quot;code position&quot;, &quot;code value&quot;,<br>&gt; etc. But

<br>&gt; given that this document uses &quot;codepoint&quot; everywhere, I&#39;m not<br>&gt; suggesting<br>&gt; that be changed. Nobody is going to be confused as to what the word<br>&gt; means.]<br>&gt;<br>&gt;<br>&gt; Tables-13

<br>&gt;<br>&gt; &quot;Once assigned to this category, a character is never removed from<br>&gt; it unless it is removed from Unicode.&quot;<br>&gt;<br>&gt; The qualification &quot;unless it is removed from Unicode&quot; is vacuous.

<br>&gt; Since<br>&gt; Unicode 1.1, no character ever has been removed from Unicode, nor<br>&gt; will any<br>&gt; be -- in part because no character will ever be removed from ISO/IEC<br>&gt; 10646.<br>&gt;<br>&gt; So this is a little like qualifying the definition of<br>&gt; ASCII LDH as &quot;{0061..007A, 0030..0039, 002D} and no characters will be removed from<br>&gt; this definition unless they are removed from ASCII.&quot;<br>&gt;<br>&gt; So I suggest just removing the vacuous qualification.

<br>&gt;<br>&gt;<br>&gt; Tables-14<br>&gt;<br>&gt;<br>&gt; The grandfathering technique needs to be used so as to preserve<br>&gt; stability,<br>&gt; since characters may change script. (See the email trail under<br>&gt; &quot;Table issues

<br>&gt; (Part 2)&quot; for details).<br></div></div>&gt; _______________________________________________<br>&gt; Idna-update mailing list<br>&gt; <a href="mailto:Idna-update@alvestrand.no" target="_blank">Idna-update@alvestrand.no

</a><br>

&gt; <a href="http://www.alvestrand.no/mailman/listinfo/idna-update" target="_blank">http://www.alvestrand.no/mailman/listinfo/idna-update</a><br><br></blockquote></div><br><br clear="all"><br></div></div>-- <br><font color="#888888">

Mark

</font></div><br><br clear="all"><br>-- <br>Mark