No subject
Tue Nov 18 23:43:20 CET 2008
Dear Alireza,

Thanks for your reply.

As far as I understand, ZWJ/ZWNJ and ß are similar in that they are
treated differently in IDNA 2003 vs. 2008, and present similar problems
(new distinctions of domain names or failure to connect).

From the user's perspective, these cases are a bit different though: As you
point out, the joiners are necessary for proper spelling of some words and
names in a number of languages, and it is possible to write regular
expressions to reasonably approximate the contexts where that's the case. In
contrast, the "sharp s" is really just a ligature of two differently-shaped
's' letters. In all but government usage (registration of names of persons),
the difference between sharp s and ss can be ignored, and often is. There is
no way to create a contextual rule for when to allow a sharp s. At best, one
could use a dictionary, but that does not scale and would only upset users
who don't know or disagree with the standard spelling.
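
As an illustration of the regular-expression idea above, here is a minimal
sketch in Python. It is not the rule from any specification: the two letter
classes are a hand-picked, hypothetical subset of Arabic-script letters, and
the function name is made up for this example. The assumption is that ZWNJ
only matters after a letter that would otherwise connect to the next one.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Assumption: a small, hand-picked subset of Arabic-script letters.
# Dual-joining letters connect to the letters on both sides.
DUAL_JOINING = (
    "\u0628\u062a-\u062e\u0633-\u063a\u0641-\u0647\u064a"
    "\u067e\u0686\u06a9\u06af\u06cc"
)
# Right-joining letters never connect to the letter that follows them.
RIGHT_JOINING = "\u0627\u062f-\u0632\u0648\u0698"

# A ZWNJ is accepted only directly after a dual-joining letter and
# directly before another letter from either class.
ALLOWED_ZWNJ = re.compile(
    f"(?<=[{DUAL_JOINING}]){ZWNJ}(?=[{DUAL_JOINING}{RIGHT_JOINING}])"
)


def zwnj_in_valid_context(label: str) -> bool:
    """Return True if every ZWNJ in the label sits in an accepted context."""
    allowed = {m.start() for m in ALLOWED_ZWNJ.finditer(label)}
    return all(i in allowed for i, ch in enumerate(label) if ch == ZWNJ)


# Persian "mi-khaham": the ZWNJ follows FARSI YEH (dual-joining).
persian = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
print(zwnj_in_valid_context(persian))
print(zwnj_in_valid_context("abc\u200cdef"))
```

A real rule would also need complete joining data and would have to deal with
combining marks next to the ZWNJ. No comparable pattern can be written for
sharp s, because its use depends on the word and not on the neighbouring
letters.
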
Best regards,
markus
On Wed, Dec 10, 2008 at 10:29 PM, Alireza Saleh <saleh at nic.ir> wrote:
> Dear Markus,
>
> We have the same problem for the Arabic-script character ZWNJ (Zero Width
> Non-Joiner). IDNA2003 says that this character should be removed before
> generating the A-label. But in IDNA2008, this character is categorized as
> CONTEXT because its Unicode property is 'joining'. As it is a necessary
> character for some languages using Arabic script, it is possible for us to
> define a safe contextual rule for it and use it in IDN labels. Without
> a contextual rule, IDNA2008 treats ZWNJ the same as IDNA2003 does.
> I don't know if it is possible to propose the same solution for
> 'eszett' and the other mapping characters.
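
The IDNA2003 behaviour described in the quoted message can be observed with
Python's built-in "idna" codec, which implements IDNA2003 (RFC 3490 with
Nameprep). The sketch below is only an illustration; the helper name
to_a_label_2003 is made up for this example.

```python
def to_a_label_2003(name: str) -> str:
    """Convert a name to its ASCII form using Python's IDNA2003 codec."""
    return name.encode("idna").decode("ascii")


# Sharp s (U+00DF) is mapped to "ss" before the ASCII form is produced.
print(to_a_label_2003("stra\u00dfe"))

# ZWNJ (U+200C) is mapped to nothing, so it disappears from the label.
print(to_a_label_2003("ab\u200ccd"))
```

Under these mappings the spellings with and without ß or ZWNJ end up as the
same name. If a later version stops mapping them, each pair becomes two
distinct names, which is the "new distinctions" problem mentioned above.
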
More information about the Idna-update
mailing list