I think this does expose an issue that needs discussion. There are two types of stability that could be guaranteed:<br><br>1. Once a character is encoded, the property value (true or false) MUST never change.<br>2. Once a character is given the property value of true, its value MUST never change to false. An encoded character SHOULD NOT change from false to true, unless a strong case can be made for it.
<br><br>Both of them meet the key requirement for stability: once a string qualifies as valid, it stays valid forever.<br><br>However, #1 might be a bit too restrictive. Suppose we currently say that character X has the value false, and we later find out that the character is needed for some orthography of a language in, say, the Congo; under #1 we could never change its value to true. People who think that #2 is not sufficient might present some scenarios where it could cause a problem (I can't think of any myself).
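<br><br>To make the difference between the two guarantees concrete, here is a minimal sketch. It assumes each version of the property is modeled as the set of code points whose value is true, plus the set of code points already encoded in the older version. The function names and sample code points are hypothetical and are not part of any proposal.<br>
<pre>
```python
# Hypothetical model: a property version is the set of code points whose
# value is true. encoded_old is the set of code points that were already
# encoded when the older version was published.

def satisfies_policy_1(encoded_old, true_old, true_new):
    """#1: no already-encoded character changes value in either direction."""
    return (true_new & encoded_old) == true_old

def satisfies_policy_2(true_old, true_new):
    """#2: true never becomes false; false may become true."""
    return true_old.issubset(true_new)

def string_is_valid(s, true_set):
    """A string is valid when every character has the property value true."""
    return all(ord(ch) in true_set for ch in s)

# Sample data (made up): U+0294 was encoded with the value false and is
# later found to be needed, so a new version flips it to true.
encoded_old = {0x61, 0x62, 0x0294}
true_old = {0x61, 0x62}
true_grown = {0x61, 0x62, 0x0294}
true_shrunk = {0x61}
```
</pre>
In this model the change from true_old to true_grown violates #1 but satisfies #2, and a string such as "ab" that was valid before stays valid. The change to true_shrunk violates both, and "ab" stops being valid.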
<br><br>Mark<br><br><div><span class="gmail_quote">On 1/31/07, <b class="gmail_sendername">Kenneth Whistler</b> &lt;<a href="mailto:kenw@sybase.com">kenw@sybase.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Harald wrote:<br><br>> --On 14. desember 2006 18:24 -0800 Kenneth Whistler &lt;<a href="mailto:kenw@sybase.com">kenw@sybase.com</a>&gt; wrote:<br>><br>> > The *table* itself should unambiguously be defined as<br>
> > the list of characters appropriate for inclusion in<br>> > IDNA. IDNAInclusion.txt (or whatever name you like).<br>><br>> And how often do you believe this table would change?<br>><br>> Once a month?
<br>> Once a year?<br>> Once a decade?<br>><br>> I think I disagree violently with you, but that is because I understand you<br>> as saying that the table would change "once a decade".<br><br>Well, that understanding is wrong, I am afraid. I neither said
<br>that nor implied it.<br><br>My objection to the way Patrik was constructing the table is<br>that by making it multi-state, the table itself is more<br>complicated, more difficult to implement, and its status becomes<br>
more ambiguous and problematical for people attempting to<br>understand and implement it.<br><br>The statement of the IDNA nameprep, however it gets worked out<br>in detail, is going to need an inclusion table. Both the<br>
statement of the algorithm and the implementations of it<br>are easier, if the table is simply constructed as a binary<br>property representation -- rather than trying to build anticipation<br>of future decisions that *might* be made about some characters
<br>(but we don't know for sure yet which ones) being added into the<br>table. That just makes for head-scratching in implementation.<br><br>> If it changes once a<br>> month, the "unambiguous" table is a thin illusion papered over a tri-state
<br>> model.<br><br>First, it isn't going to change once a month. You know that,<br>so I don't see the point in raising it as a red herring.<br><br>Mark has provided the relevant timing history regarding changes
<br>to the *repertoire* of Unicode versions which could, in principle,<br>impact the list of characters appropriate for the IDNA inclusion<br>table.<br><br>Second, I thought one of the points here was to get the IDNA<br>nameprep spec out of the business of having to be updated
<br>every time the UTC and WG2 add some characters to Unicode<br>and ISO/IEC 10646. You accomplish that by publishing a specification<br>that defines the inclusion table by reference to a specific<br>character property for that purpose -- as we have discussed
<br>at some length now.<br><br>I just pushed up IDNPermitted.txt to demonstrate what the<br>documentation for such a binary character property could (and<br>probably would) look like, if published as part of the Unicode<br>
Character Database. The property is then easy to refer to<br>and easy to implement.<br><br>The eventual RFC for IDNAbis, rather than including some<br>long table definition that has to be maintained by<br>periodic revisions, can say something more or less like
<br>the following, in toto:<br><br><br> The inclusion table referred to in Step X of nameprep<br> is defined as all Unicode characters having the<br> property IDN_Permitted, as defined by the Unicode<br> Character Database. [UCD]
<br><br> Note: Some characters may be added to the repertoire<br> of characters with the IDN_Permitted property in the<br> future, as additional characters are added to the<br> Unicode Standard. This would be the case, for example,
<br> when additional minority scripts are added to the<br> standard. However, the maintenance of the IDN_Permitted<br> property is bound by the stability guarantee that<br> once a character is assigned that property, the property
<br> can never be removed from the character. In other<br> words, the inclusion table may grow, but once a<br> character is in the table, it can never be removed.<br><br>Or words to that effect. Wordsmith as required, but basically
<br>that is all that the specification would need. You get<br>your stability, your flexibility, and your definition by<br>reference, and you never have to go back and revise and<br>version the RFC for IDNA nameprep again to deal with
<br>Unicode versioning.<br><br>Now you or other people in the IETF may not believe in a<br>stability guarantee. But that is a matter of trust,<br>personalities, policies and procedures. We'll just have<br>to get on with working on those issues, I guess.
<br><br>The fundamental issue seems to be that some in the IETF are very<br>uncomfortable dealing with a character encoding standard<br>like the Unicode Standard (and ISO/IEC 10646) that keeps<br>changing and expanding over time -- and doesn't stay
<br>conveniently pinned down like ISO 8859-1 has done.<br>That is, however, an inconvenient but unavoidable fact of<br>life. Unicode is here, it is the backbone of systems and<br>the internet now, and it *isn't* going to stop changing
<br>for at least another decade yet. For everyone digging in<br>their heels and trying to prevent it from changing right<br>now, there is another community out there desperately trying<br>to ensure that *their* characters get added to the universal
<br>character set before the people trying to freeze it get their<br>way.<br><br>And from my point of view, trying to encapsulate and<br>control that unease about change in the RFC for<br>IDNAbis, with a multi-state table that worries about
<br>defining the set of "pending" characters, is just a<br>diversion from coming to closure on a working specification<br>for IDNAbis.<br><br>--Ken<br><br>><br>> Harald<br>><br>><br>>
<br><br>_______________________________________________<br>Idna-update mailing list<br><a href="mailto:Idna-update@alvestrand.no">Idna-update@alvestrand.no</a><br><a href="http://www.alvestrand.no/mailman/listinfo/idna-update">
http://www.alvestrand.no/mailman/listinfo/idna-update</a><br></blockquote></div><br><br clear="all"><br>-- <br>Mark