Moving Right Along on the Inclusions Table...

Martin Duerst duerst at it.aoyama.ac.jp
Sun Dec 24 10:40:41 CET 2006


At 04:45 06/12/22, Kenneth Whistler wrote:
>Cary noted:

>> The essence of the argument about internationalizing the H in LDH is
>
>In my opinion, "internationalizing" any syntactic element of
>a formally-based syntax is simply a bad idea, period.

I pretty much agree. But please note that the hyphen is NOT
a syntactic element in any of the uses we are looking at.
It's the only non-alpha, non-digit, non-syntactic
character in ASCII domain names.


>> whether the legacy hyphen should be taken to justify the inclusion of a
>> functionally similar mark in other scripts to which the hyphen is alien,
>> or whether the hyphen is simply something that may be used or not
>> depending on the extent to which it can be shoehorned into an
>> orthographic context in which it otherwise does not appear.
>
>The latter. I see no way to justify adding any number of other
>script hyphen analogs (and recall, this isn't limited to dashes,
>but is also going to end up in arguments about middle dots and such)
>merely because we have to grandfather in "-" from ASCII. If
>people end up using "-" in inappropriate contexts, they will do
>so, and we can't stop them, but I don't see the justification
>for adding to the confusion on the basis of some supposed
>fairness to other scripts issue.

I agree. I don't see the need for adding confusion. But
there may be cases that don't add any confusion, and in
these cases, we should at least contemplate including the
character in question, and probably add it.

As an example, assume there was a script where there was
an upside-down triangle that served a purpose similar to
the hyphen as we know it in Western scripts and in domain
names. If that upside-down triangle isn't confusable with
other characters in that script, in my view there is
nothing against including it.

>If you only allow Hebrew script (and combining marks), then
>putting a hyphen-minus or a maqaf shouldn't result in
>radically different layouts in a label -- the only difference
>would be whatever font difference there was in the design
>of the maqaf and the hyphen-minus.
>
>If you allow script mixing with Hebrew in a label, and
>put a hyphen-minus or a maqaf between directional runs, you
>could conceivably end up with layout differences that display
>the hyphen-minus or the maqaf at different ends of a run.

Mixing directionality in a label is not allowed currently.
And I don't think it is a good idea to allow it in the future.
While we know that we have to make some adjustments to
the current bidi solution for combining marks, the basic
idea of the current bidi solution is on very solid ground.

>If anything, however, I think this would militate *against*
>including maqaf, because it could end up being one of those
>subtle differences that someone might take advantage of to
>bad ends, whereas it is hard to conceive of valid circumstances
>where the placement of a hyphen at one end or other of
>a visible run in a mixed-script bidirectional label is
>an important thing to be trying to preserve in internet
>identifiers.

The current bidi provisions for both domain name labels
and IRIs prohibit hyphens or other neutral characters at
either end of an RTL component. (In the IRI spec this is
actually a SHOULD, to handle cases where, e.g., data is
transmitted in a query component of an IRI.)
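
As a rough illustration of the rule described above, here is a
minimal Python sketch. It is modelled on the bidi requirement in
stringprep (RFC 3454, section 6), but it is only an approximation:
it uses the bidi classes from Python's unicodedata module rather
than the fixed tables in the RFC, and the function name
label_passes_bidi_check is made up for this example.

```python
import unicodedata

RTL_CLASSES = ("R", "AL")  # right-to-left and Arabic-letter bidi classes


def label_passes_bidi_check(label):
    """Approximate the stringprep-style bidi rule for a single label.

    Assumption: Unicode bidi classes from unicodedata stand in for the
    RandALCat / LCat tables of RFC 3454, so results may differ from a
    conforming implementation for some code points.
    """
    classes = [unicodedata.bidirectional(ch) for ch in label]

    # The rule only constrains labels that contain RTL characters.
    if not any(c in RTL_CLASSES for c in classes):
        return True

    # No mixing of directionality within one label.
    if any(c == "L" for c in classes):
        return False

    # The first and last characters must themselves be RTL, which
    # excludes a hyphen-minus (or any other neutral) at either end.
    return classes[0] in RTL_CLASSES and classes[-1] in RTL_CLASSES


if __name__ == "__main__":
    alef, bet = "\u05d0", "\u05d1"
    for candidate in (alef + "-" + bet, alef + bet + "-", alef + "a" + bet):
        print(ascii(candidate), label_passes_bidi_check(candidate))
```

Under this sketch a hyphen-minus between two Hebrew letters is
accepted, while one at either end of the label is rejected. A label
ending in a combining mark is rejected as well, which is the kind of
case where the current rule needs adjustment.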

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp      mailto:duerst at it.aoyama.ac.jp    



More information about the Idna-update mailing list