New version, draft-faltstrom-idnabis-tables-02.txt, available
Debbie Garside
debbie at ictmarketing.co.uk
Tue Jun 19 22:39:54 CEST 2007
Sorry to be a pain with my lack of knowledge, but two questions:
1. When you refer to "script", are you talking about the Unicode Script
property or ISO 15924 script codes, or are they one and the same? Please
define the use of "script" here.
2. I can see the reason for not having an ALWAYS become a NEVER, but surely
there is no reason to rule out the reverse: if future applications can deal
with what was a NEVER, it could become an ALWAYS. Or?
Best regards
Debbie Garside
_____
From: idna-update-bounces at alvestrand.no
[mailto:idna-update-bounces at alvestrand.no] On Behalf Of Mark Davis
Sent: 19 June 2007 20:36
To: Harald Tveit Alvestrand
Cc: Patrik Fältström; idna-update at alvestrand.no
Subject: Re: New version, draft-faltstrom-idnabis-tables-02.txt, available
On 6/13/07, Harald Tveit Alvestrand <harald at alvestrand.no> wrote:
> The intent of "MAYBE YES" and "MAYBE NO" was:
> - ALWAYS: We guarantee that these codepoints will be permitted in IDNs (at
> this level of the standard).
> - NEVER: We guarantee that these codepoints will never be permitted in IDNs
> ...
This is precisely the kind of information that belongs in Patrik's draft.
Without a model of the intended usage, it is impossible to assess the
structure of the document. At least with this kind of information we can
begin to sensibly discuss the pros and cons of the changes over what we had
last December.
However, it needs even more background information and justification.
Without knowing what you and Patrik meant by "Stable", it is also impossible
to assess how scripts should be assigned to that category. After all, Thai
is just as stable as Latin, if not more so, depending on what is meant, yet
you exclude Thai but keep Latin. I was originally guessing that what you
mean by "stable" is "has no characters that are problematic for IDN", but in
that case, Latin itself is not stable because of the potential confusability
between "1" and "l". Or, if confusability is not the issue, then you really
need to have some examples of what exactly are the problems you are trying
to prevent.
That is, you need to provide more information as to what you intend by
"stable", with specific examples of scripts that you consider stable and
why, and scripts that you consider unstable and why. Failing that, I don't
see why every non-archaic script would not be stable, thus obviating the
need for your new 4 distinctions instead of the 3 that we had up until a few
days ago.
Now, I found a hint of what may be at issue in Patrik's further response:
> Secondly, the ALWAYS and NEVER property values are only allowed on
> unproblematic scripts if we have a rough consensus that the
> codepoints will not move from ALWAYS to NEVER or vice versa given the
> algorithm we have to calculate the property value itself.
If *that* notion of stability is all that is being talked about, then it is
very easy, and we have done it with a number of Unicode properties. Define
the following:
Grandfathered_Always to be all characters that were Always under any
previous Unicode version back to some base level (say 5.0)
Grandfathered_Never to be all characters that were Never under any previous
Unicode version back to some base level (say 5.0)
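As a rough sketch of how those two grandfathered sets could be computed, assuming the per-version property values are available as plain dictionaries (the function name, the `history` structure, and the toy data below are all hypothetical, not anything taken from the draft):

```python
def grandfathered(history, category, base_version):
    """Collect every code point that had `category` in any previous
    Unicode version at or after `base_version`.

    `history` maps a version tuple such as (5, 0) to a dict of
    code point -> category name. It is assumed to hold only versions
    earlier than the one currently being derived.
    """
    result = set()
    for version, assignment in history.items():
        if version >= base_version:
            result |= {cp for cp, value in assignment.items() if value == category}
    return result


# Hypothetical toy history: U+0061 was Always in 5.0 and then changed.
history = {
    (5, 0): {0x61: "Always", 0x62: "Never"},
    (5, 1): {0x61: "Maybe_Yes", 0x62: "Never", 0x63: "Always"},
}

grandfathered_always = grandfathered(history, "Always", (5, 0))
grandfathered_never = grandfathered(history, "Never", (5, 0))
```

Because the sets only ever accumulate, a code point that was once Always or Never keeps that membership from then on, which is the stability property being asked for.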
Then modify the end of my message of June 13 to be:
Then derive the following sets:
* Always = Grandfathered | (Favored & Functional) | Grandfathered_Always
* Maybe_Yes = !Favored & Functional & !(Always | Grandfathered_Never)
* Maybe_Not = (Archaic | (!Favored & !Functional)) & !(Always | Grandfathered_Never)
* Never = everything else
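To make the derivation concrete, here is a minimal sketch using Python sets. It transcribes the four formulas literally; the function name, the `universe` parameter (standing in for "all code points considered", so that `!X` becomes `universe - X`), and the single-letter toy data are assumptions made for illustration. Grandfathered, Favored, Functional, and Archaic are treated as given inputs and are assumed to be subsets of `universe`.

```python
def derive_categories(universe, favored, functional, archaic,
                      grandfathered, grandfathered_always, grandfathered_never):
    """Derive the four categories from the input sets, formula by formula."""
    always = grandfathered | (favored & functional) | grandfathered_always
    # Both Maybe sets exclude anything in Always or Grandfathered_Never.
    blocked = always | grandfathered_never
    not_favored = universe - favored
    maybe_yes = (not_favored & functional) - blocked
    maybe_not = (archaic | (not_favored - functional)) - blocked
    # "Everything else" is whatever the three sets above did not claim.
    never = universe - (always | maybe_yes | maybe_not)
    return {
        "Always": always,
        "Maybe_Yes": maybe_yes,
        "Maybe_Not": maybe_not,
        "Never": never,
    }


# Hypothetical toy data: letters stand in for code points.
categories = derive_categories(
    universe=set("abcdefg"),
    favored=set("ab"),
    functional=set("ace"),
    archaic=set("d"),
    grandfathered=set("f"),
    grandfathered_always=set(),
    grandfathered_never=set("eg"),
)
```

In this toy run "e" is functional and not favored, so it would otherwise land in Maybe_Yes, but its membership in Grandfathered_Never keeps it in Never. The formulas as written do not say what happens to a character that is both Archaic and Functional; the sketch simply evaluates them as given.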
Harald said a bit later:
> So far, I've seen a lot of hand-wringing about the list of scripts being
> too short, the list of scripts being Europe-centric, the arguments for
> the list of scripts being too weak, the list of scripts including
> worrisome characters (IPA), but I have NOT seen ANY flat statement
> "script XXX is unproblematic and should be included".
My response would be:
Each script other than the archaic ones is no more problematic overall than
the Latin, Greek, and Cyrillic you have already included. Thus if you
include Latin, Greek, and Cyrillic, you should include all of the following:
Arab Arabic
Armn Armenian
Bali Balinese
Beng Bengali
Bopo Bopomofo
Buhd Buhid
Cans Canadian_Aboriginal
Cher Cherokee
Cyrl Cyrillic
Deva Devanagari
Ethi Ethiopic
Geor Georgian
Grek Greek
Gujr Gujarati
Guru Gurmukhi
Hang Hangul
Hani Han
Hebr Hebrew
Hira Hiragana
Kana Katakana
Khmr Khmer
Knda Kannada
Laoo Lao
Latn Latin
Limb Limbu
Mlym Malayalam
Mong Mongolian
Mymr Myanmar
Nkoo Nko
Orya Oriya
Sinh Sinhala
Tale Tai_Le
Talu New_Tai_Lue
Taml Tamil
Telu Telugu
Tfng Tifinagh
Thaa Thaana
Thai Thai
Tibt Tibetan
Yiii Yi