New version, draft-faltstrom-idnabis-tables-02.txt, available

Tue Jun 19 21:35:44 CEST 2007

On 6/13/07, Harald Tveit Alvestrand <harald at alvestrand.no> wrote:
>
> The intent of "MAYBE YES" and "MAYBE NO" was:
>
> - ALWAYS: We guarantee that these codepoints will be permitted in IDNs (at
> this level of the standard).
> - NEVER: We guarantee that these codepoints will never be permitted in
> IDNs

...

This is precisely the kind of information that belongs in Patrik's draft.
Without a model of the intended usage, it is impossible to assess the
structure of the document. At least with this kind of information we can
begin to sensibly discuss the pros and cons of the changes over what we had
last December.

However, it needs even more background information and justification.
Without knowing what you and Patrik meant by "Stable", it is also impossible
to assess how scripts should be assigned to that category. After all, Thai
is just as stable as Latin, if not more so, depending on what is meant, yet
you exclude Thai but keep Latin. I was originally guessing that what you
mean by "stable" is "has no characters that are problematic for IDN", but in
that case, Latin itself is not stable because of the potential confusability
between "1" and "l". Or, if confusability is not the issue, then you really
need to have some examples of what exactly are the problems you are trying
to prevent.

That is, you need to provide more information as to what you intend by
"stable", with specific examples of scripts that you consider stable and
why, and scripts that you consider unstable and why. Failing that, I don't
see why every non-archaic script would not be stable, thus obviating the
need for your new 4 distinctions instead of the 3 that we had up until a few
days ago.

Now, I found a hint of what may be at issue in Patrik's further response:

> Secondly, the ALWAYS and NEVER property values are only allowed on
> unproblematic scripts if we have a rough consensus that the
> codepoints will not move from ALWAYS to NEVER or vice versa given the
> algorithm we have to calculate the property value itself.

If *that* notion of stability is all that is being talked about, then it is
very easy, and we have done it with a number of Unicode properties. Define
the following:

   - Grandfathered_Always: all characters that were Always under any
     previous Unicode version, back to some base level (say 5.0).
   - Grandfathered_Never: all characters that were Never under any
     previous Unicode version, back to some base level (say 5.0).
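
As a rough illustration of the grandfathering idea (not something taken from
the draft), the sketch below unions a category's membership over every
earlier version from the base level onward. The function names, the
`history` mapping, and the sample versions and code points are all
hypothetical.

```python
def version_key(version):
    """Turn a version string such as "5.1" into a comparable tuple (5, 1)."""
    return tuple(int(part) for part in version.split("."))


def grandfathered(history, base, current):
    """Union of a category's members over versions in [base, current).

    history maps a Unicode version string to the set of code points that
    had the category (e.g. Always) under that version. This layout is an
    assumption made for the example.
    """
    lo, hi = version_key(base), version_key(current)
    result = set()
    for version, members in history.items():
        if lo <= version_key(version) < hi:
            result |= members
    return result


# Toy data: which code points were Always under each (hypothetical) version.
always_history = {
    "4.1": {0x61},
    "5.0": {0x62},
    "5.1": {0x63},
    "6.0": {0x64},
}
grandfathered_always = grandfathered(always_history, base="5.0", current="6.0")
```

Once a character enters such a set it never leaves it in later versions,
which is the sense of stability being discussed here.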

Then modify the end of my message of June 13 to be:

Then derive the following sets:

   - Always = Grandfathered | (Favored & Functional) | Grandfathered_Always
   - Maybe_Yes = !Favored & Functional & !(Always | Grandfathered_Never)
   - Maybe_Not = (Archaic | (!Favored & !Functional))
     & !(Always | Grandfathered_Never)
   - Never = everything else
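
The derivation above can be sketched with plain set operations. This is
only an illustration under stated assumptions: Favored, Functional, Archaic,
and Grandfathered come from my June 13 message and are treated here as
opaque input sets, "!" is read as complement relative to a hypothetical
`universe` of code points, and the function name and toy values are made up
for the example.

```python
def derive_categories(universe, favored, functional, archaic,
                      grandfathered, grandfathered_always,
                      grandfathered_never):
    """Derive Always, Maybe_Yes, Maybe_Not, and Never from the formulas.

    Every argument is a set of code points, assumed to be a subset of
    universe. Complement (!X) is computed as universe - X.
    """
    always = grandfathered | (favored & functional) | grandfathered_always
    excluded = always | grandfathered_never
    maybe_yes = ((universe - favored) & functional) - excluded
    maybe_not = (archaic | (universe - favored - functional)) - excluded
    # "everything else": whatever none of the three sets above claimed.
    never = universe - (always | maybe_yes | maybe_not)
    return {
        "Always": always,
        "Maybe_Yes": maybe_yes,
        "Maybe_Not": maybe_not,
        "Never": never,
    }


# Toy code points 1..8 stand in for real characters.
example = derive_categories(
    universe=set(range(1, 9)),
    favored={1, 2, 3},
    functional={1, 2, 4, 5},
    archaic={6},
    grandfathered={7},
    grandfathered_always={3},
    grandfathered_never={4},
)
```

Note that a character in Grandfathered_Never is removed from both Maybe
sets and so falls through to Never, which is what keeps it from moving
between versions.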

Harald said a bit later:

> So far, I've seen a lot of hand-wringing about the list of scripts being
> too short, the list of scripts being Europe-centric, the arguments for
> the list of scripts being too weak, the list of scripts including
> worrisome characters (IPA), but I have NOT seen ANY flat statement
> "script XXX is unproblematic and should be included".

My response would be:

Each script other than the archaic ones is no more problematic overall than
the Latin, Greek, and Cyrillic you have already included. Thus, if you
include Latin, Greek, and Cyrillic, you should include all of the following:

Arab    Arabic
Armn    Armenian
Bali    Balinese
Beng    Bengali
Bopo    Bopomofo
Buhd    Buhid
Cans    Canadian_Aboriginal
Cher    Cherokee
Cyrl    Cyrillic
Deva    Devanagari
Ethi    Ethiopic
Geor    Georgian
Grek    Greek
Gujr    Gujarati
Guru    Gurmukhi
Hang    Hangul
Hani    Han
Hebr    Hebrew
Hira    Hiragana
Kana    Katakana
Khmr    Khmer
Knda    Kannada
Laoo    Lao
Latn    Latin
Limb    Limbu
Mlym    Malayalam
Mong    Mongolian
Mymr    Myanmar
Nkoo    Nko
Orya    Oriya
Sinh    Sinhala
Tale    Tai_Le
Talu    New_Tai_Lue
Taml    Tamil
Telu    Telugu
Tfng    Tifinagh
Thaa    Thaana
Thai    Thai
Tibt    Tibetan
Yiii    Yi