No subject


Tue Nov 18 23:43:20 CET 2008


Dear Alireza,

Thanks for your reply.

As far as I understand, ZWJ/ZWNJ and ß are similar in that they are
treated differently in IDNA 2003 vs. 2008, and present similar problems
(new distinctions of domain names or failure to connect).

From the user's perspective, these cases are a bit different though: As you
point out, the joiners are necessary for proper spelling of some words and
names in a number of languages, and it is possible to write regular
expressions to reasonably approximate the contexts where that's the case. In
contrast, the "sharp s" is really just a ligature of two differently-shaped
's' letters. In all but government usage (registration of names of persons),
the difference between sharp s and ss can be ignored, and often is. There is
no way to create a contextual rule for when to allow a sharp s. At best, one
could use a dictionary, but that does not scale and would only upset users
who don't know or disagree with the standard spelling.

Best regards,
markus

On Wed, Dec 10, 2008 at 10:29 PM, Alireza Saleh <saleh at nic.ir> wrote:

> Dear Markus,
>
> We have the same problem with the Arabic-script character ZWNJ (Zero Width
> Non-Joiner). IDNA2003 says that this character should be removed before
> generating the A-label. But in IDNA2008, this character is categorized as
> CONTEXT because its Unicode property is 'joining'. As it is a necessary
> character for some languages using Arabic script, it is possible for us to
> define a safe contextual rule for it and use it in IDN labels. Without
> a contextual rule, IDNA2008 treats ZWNJ the same as IDNA2003 does. I don't
> know if it is possible to propose the same solution for 'eszett' and the
> other mapping characters.
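The contrast drawn in this exchange can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the IDNA2008
contextual rule itself: the two joining-type tables are hypothetical and
deliberately tiny (a handful of Arabic-script letters), the name
zwnj_in_context is invented for this example, and the virama case and
transparent characters are ignored. It shows that a regular expression can
approximate where ZWNJ is meaningful, while for sharp s there is no context to
test; the only generic operation available is folding it to "ss".

```python
import re

ZWNJ = "\u200c"

# Hypothetical, incomplete joining-type tables, for illustration only.
# Dual-joining letters: beh, khah, meem, noon, heh, Farsi yeh.
DUAL_JOINING = "\u0628\u062e\u0645\u0646\u0647\u06cc"
# Right-joining letters: alef, dal, reh, waw.
RIGHT_JOINING = "\u0627\u062f\u0631\u0648"

# A ZWNJ is "in context" here when the letter before it could join forward
# and the letter after it could join backward, so the ZWNJ actually changes
# how the text is rendered.
ZWNJ_IN_CONTEXT = re.compile(
    f"(?<=[{DUAL_JOINING}]){ZWNJ}(?=[{DUAL_JOINING}{RIGHT_JOINING}])"
)


def zwnj_in_context(label: str) -> bool:
    """Return True if every ZWNJ in the label sits between joinable letters."""
    return len(ZWNJ_IN_CONTEXT.findall(label)) == label.count(ZWNJ)


# Persian word spelled with a ZWNJ between Farsi yeh and khah.
persian = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
print(zwnj_in_context(persian))          # ZWNJ between two joining letters
print(zwnj_in_context("abc\u200cdef"))   # ZWNJ between Latin letters

# Sharp s has no comparable context test; case folding simply maps it to "ss".
print("stra\u00dfe".casefold() == "strasse")
```

A registry that wanted to apply such a rule would need the full Unicode
joining-type data rather than the short tables assumed here.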


More information about the Idna-update mailing list