New version, draft-faltstrom-idnabis-tables-02.txt, available

Tue Jun 19 22:39:54 CEST 2007

Sorry to be a pain with my lack of knowledge, but two questions:

1.  When you refer to "script", are you talking about Unicode Script or ISO
15924 Script, or are they one and the same?  Please define the use of
"script" here.

2.  I can see the reason for not letting an ALWAYS become a NEVER, but surely
there is no reason to forbid the reverse? If future applications can deal
with what was a NEVER, it could become an ALWAYS. Or not?

Best regards

Debbie Garside 

From: idna-update-bounces at alvestrand.no
[mailto:idna-update-bounces at alvestrand.no] On Behalf Of Mark Davis
Sent: 19 June 2007 20:36
To: Harald Tveit Alvestrand
Cc: Patrik Fältström; idna-update at alvestrand.no
Subject: Re: New version, draft-faltstrom-idnabis-tables-02.txt, available

On 6/13/07, Harald Tveit Alvestrand <harald at alvestrand.no> wrote: 

The intent of "MAYBE YES" and "MAYBE NO" was:

- ALWAYS: We guarantee that these codepoints will be permitted in IDNs (at
this level of the standard).
- NEVER: We guarantee that these codepoints will never be permitted in IDNs 

...

This is precisely the kind of information that belongs in Patrik's draft.
Without a model of the intended usage, it is impossible to assess the
structure of the document. At least with this kind of information we can
begin to sensibly discuss the pros and cons of the changes over what we had
last December. 

However, it needs even more background information and justification.
Without knowing what you and Patrik meant by "Stable", it is also impossible
to assess how scripts should be assigned to that category. After all, Thai
is just as stable as Latin, if not more so, depending on what is meant, yet
you exclude Thai but keep Latin. I was originally guessing that what you
mean by "stable" is "has no characters that are problematic for IDN", but in
that case, Latin itself is not stable because of the potential confusability
between "1" and "l". Or, if confusability is not the issue, then you really
need to have some examples of what exactly are the problems you are trying
to prevent. 

That is, you need to provide more information as to what you intend by
"stable", with specific examples of scripts that you consider stable and
why, and scripts that you consider unstable and why. Failing that, I don't
see why every non-archaic script would not be stable, thus obviating the
need for your new 4 distinctions instead of the 3 that we had up until a few
days ago. 

Now, I found a hint of what may be at issue in Patrik's further response:

> Secondly, the ALWAYS and NEVER property values are only allowed on
> unproblematic scripts if we have a rough consensus that the
> codepoints will not move from ALWAYS to NEVER or vice versa given the
> algorithm we have to calculate the property value itself.

If *that* notion of stability is all that is being talked about, then it is
very easy, and we have done it with a number of Unicode properties. Define
the following:

Grandfathered_Always: all characters that were Always under any previous
Unicode version back to some base level (say 5.0).

Grandfathered_Never: all characters that were Never under any previous
Unicode version back to some base level (say 5.0).
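
As a rough illustration of the two definitions above, here is a minimal
sketch. It assumes a hypothetical mapping from Unicode version to the set of
code points that had a given value (Always or Never) under that version. The
function name, the version tuples, and the sample data are all made up for
illustration and are not taken from any draft.

```python
from typing import Dict, Set, Tuple

Version = Tuple[int, int]  # e.g. (5, 0) for Unicode 5.0


def grandfathered(history: Dict[Version, Set[int]],
                  current: Version,
                  base: Version = (5, 0)) -> Set[int]:
    """Union of the per-version sets for every version v with
    base <= v < current, i.e. "any previous version back to the base"."""
    result: Set[int] = set()
    for version, members in history.items():
        if base <= version < current:
            result |= members
    return result


# Hypothetical data: code points that were Always under each version.
always_history = {
    (4, 1): {0x0064},          # older than the base level, so ignored
    (5, 0): {0x0061, 0x0062},
    (5, 1): {0x0062, 0x0063},
}

grandfathered_always = grandfathered(always_history, current=(5, 2))
```

The same function would serve for Grandfathered_Never if it were given the
per-version Never sets instead.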

Then modify the end of my message of June 13 to be:

Then derive the following sets:

*  Always = Grandfathered | (Favored & Functional) | Grandfathered_Always

*  Maybe_Yes = !Favored & Functional & !(Always | Grandfathered_Never)

*  Maybe_Not = (Archaic | (!Favored & !Functional)) & !(Always | Grandfathered_Never)

*  Never = everything else
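
To make the derivation concrete, here is a minimal sketch that evaluates the
four expressions literally with Python sets. The input sets (Grandfathered,
Favored, Functional, Archaic) are defined in the June 13 message, not here,
so the function simply takes them as arguments. The function name, the
explicit universe set used to express "!", and the small integers standing
in for code points are assumptions for illustration only.

```python
from typing import Dict, Set


def derive_sets(universe: Set[int],
                grandfathered: Set[int],
                favored: Set[int],
                functional: Set[int],
                archaic: Set[int],
                grandfathered_always: Set[int],
                grandfathered_never: Set[int]) -> Dict[str, Set[int]]:
    """Evaluate the four set expressions exactly as written above."""
    not_favored = universe - favored
    not_functional = universe - functional

    always = grandfathered | (favored & functional) | grandfathered_always
    # !(Always | Grandfathered_Never) is applied by subtracting this set.
    excluded = always | grandfathered_never

    maybe_yes = (not_favored & functional) - excluded
    maybe_not = (archaic | (not_favored & not_functional)) - excluded
    # "Everything else": whatever none of the three sets above claimed.
    never = universe - always - maybe_yes - maybe_not

    return {"Always": always, "Maybe_Yes": maybe_yes,
            "Maybe_Not": maybe_not, "Never": never}


# Hypothetical toy data: the integers 0..7 stand in for code points.
sets = derive_sets(
    universe=set(range(8)),
    grandfathered={0},
    favored={1, 2},
    functional={1, 3, 4},
    archaic={5},
    grandfathered_always={6},
    grandfathered_never={4},
)
```

In this toy data, item 4 is functional but not favored, so it would
otherwise be Maybe_Yes; because it is in Grandfathered_Never it stays Never.
Item 6 is Always only because it is in Grandfathered_Always.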

Harald said a bit later: 

> So far, I've seen a lot of hand-wringing about the list of scripts being
> too short, the list of scripts being Europe-centric, the arguments for
> the list of scripts being too weak, the list of scripts including
> worrisome characters (IPA), but I have NOT seen ANY flat statement
> "script XXX is unproblematic and should be included".

My response would be:

Each script other than the archaic ones is no more problematic overall than
the Latin, Greek, and Cyrillic you have already included. Thus, if you
include Latin, Greek, and Cyrillic, you should include all of the following:

Arab    Arabic
Armn    Armenian
Bali    Balinese
Beng    Bengali
Bopo    Bopomofo
Buhd    Buhid
Cans    Canadian_Aboriginal
Cher    Cherokee
Cyrl    Cyrillic
Deva    Devanagari
Ethi    Ethiopic
Geor    Georgian
Grek    Greek 
Gujr    Gujarati
Guru    Gurmukhi
Hang    Hangul
Hani    Han
Hebr    Hebrew
Hira    Hiragana
Kana    Katakana
Khmr    Khmer
Knda    Kannada
Laoo    Lao
Latn    Latin
Limb    Limbu
Mlym    Malayalam 
Mong    Mongolian
Mymr    Myanmar
Nkoo    Nko
Orya    Oriya
Sinh    Sinhala
Tale    Tai_Le
Talu    New_Tai_Lue
Taml    Tamil
Telu    Telugu
Tfng    Tifinagh
Thaa    Thaana
Thai    Thai 
Tibt    Tibetan
Yiii    Yi
