Unicode 5.2 -> 6.0

Thu Oct 14 07:46:38 CEST 2010

[This has also been sent to IAB]

Unicode 6.0 will be released shortly. This means a new list of derived property values will be calculated using the algorithm in RFC 5892.

There are incompatible changes in three codepoints:

1. The following two go from DISALLOWED to PVALID:

U+0CF1 KANNADA SIGN JIHVAMULIYA
U+0CF2 KANNADA SIGN UPADHMANIYA

This is because their General_Category changes from So to Lo.

2. The following one goes from PVALID to DISALLOWED:

U+19DA NEW TAI LUE THAM DIGIT ONE

Its General_Category has changed from Nd to No.

In both cases the rule that creates the difference is LetterDigits (A), in section 2.1 of RFC 5892:

> 2.1.  LetterDigits (A)
> 
>   A: General_Category(cp) is in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}
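
As a rough illustration of how rule A alone produces the three flips, here is a minimal Python sketch. It is a simplification, not the RFC 5892 algorithm: it applies only rule A and treats a match as PVALID and a miss as DISALLOWED, which holds for these three codepoints but not in general. The General_Category values are hard-coded from the lists above; the names `CHANGED`, `simplified_property` and `transitions` are made up for this example.

```python
# General_Category values admitted by rule A (RFC 5892, section 2.1).
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}

# (codepoint, category in Unicode 5.2, category in Unicode 6.0),
# copied from the lists in this note, not read from a Unicode database.
CHANGED = [
    (0x0CF1, "So", "Lo"),
    (0x0CF2, "So", "Lo"),
    (0x19DA, "Nd", "No"),
]


def simplified_property(general_category):
    """Assumption: only rule A is considered; all other rules are ignored."""
    return "PVALID" if general_category in LETTER_DIGITS else "DISALLOWED"


def transitions():
    """Map each changed codepoint to its (5.2 value, 6.0 value) pair."""
    return {
        cp: (simplified_property(old), simplified_property(new))
        for cp, old, new in CHANGED
    }


for cp, (before, after) in transitions().items():
    print(f"U+{cp:04X}: {before} -> {after}")
```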

There are two alternatives for the IETF:

A) Accept the change and stay aligned with Unicode

All of the changes resolve "bugs" in the Unicode tables. The most troublesome of the three codepoints is U+19DA: because it goes from PVALID to DISALLOWED, any domain name already registered with that codepoint could become invalid.

B) Add these three as exceptions for backward compatibility.

One can add the three (or a subset thereof) to section 2.7 in an updated version of RFC 5892:

> 2.7.  BackwardCompatible (G)
> 
>    G: cp is in {}

This set is empty in RFC 5892, but characters can be added to it, each with an explicit derived property value. This would require an IETF action.
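
To make alternative (B) concrete, here is a hedged Python sketch of what a populated set G could look like. The table contents are hypothetical (nothing has been added to G), and the sketch assumes that a codepoint listed in G simply takes its pinned value in place of whatever the category-based rules would compute. The names `BACKWARD_COMPATIBLE` and `derived_property` are invented for this example.

```python
# Hypothetical contents of G under alternative (B): each codepoint is
# pinned to the value it had under Unicode 5.2. RFC 5892 defines G as empty.
BACKWARD_COMPATIBLE = {
    0x0CF1: "DISALLOWED",
    0x0CF2: "DISALLOWED",
    0x19DA: "PVALID",
}


def derived_property(cp, computed_value, backward_compatible=BACKWARD_COMPATIBLE):
    """Return the pinned value if cp is in G, else the computed value.

    Assumption: `computed_value` is what the remaining rules yield under
    the new Unicode version; computing it is outside this sketch.
    """
    return backward_compatible.get(cp, computed_value)


# With the hypothetical table, U+19DA stays PVALID under Unicode 6.0.
print(derived_property(0x19DA, "DISALLOWED"))
# With G empty, as in RFC 5892 today, the computed value is used as-is.
print(derived_property(0x19DA, "DISALLOWED", backward_compatible={}))
```

Adding only a subset is just a smaller table, for example U+19DA alone.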

After long discussions, the WG came to the rough consensus that an IETF action should be required to update the backward compatibility list, because we should be forced to discuss whether alternative (A) or (B) above should be used, at least the first couple of times. In a future RFC we could say that IANA is to hold a table with the backward compatibility list and, for example, allow an appointed expert to take care of updating that list.

I was personally in favour of needing an IETF discussion if/when we had this issue between Unicode versions. And here we are. I welcome the discussion.

My personal suggestion is that if no one can show that domain names are in fact registered or used with U+19DA according to IDNA2008, the IETF should accept the incompatible changes and stay completely aligned with Unicode 6.0.

   Patrik - liaison from IETF to Unicode Consortium