Comments on protocol-04

Kenneth Whistler kenw at sybase.com
Tue Mar 4 02:12:18 CET 2008


Mark, in comments on protocol-04, suggested:

>  >  5.3.  Character Changes in Preprocessing or the User Interface
>  >
>  >    The Unicode string MAY then be processed, in a way specific to the
>  >    local environment, to make the result of the IDNA processing match
>  >    user expectations.  For instance, at this step, it would be
>  >    reasonable to convert all upper case characters to lower case, if
>  >    this makes sense in the user's environment.
>  >
>  >    Other examples of processing for localization that might be applied,
>  >    if appropriate, at this point (but even further outside the scope of
>  >    this specification) include interpreting the KANA MIDDLE DOT as
> 
>  Bad example. Since the Middle dot is allowed currently, it cannot be
>  treated as a separator.
> 
> 
>  >    separating domain name components from each other, mapping different
>  >    "width" forms of the same character into the one form permitted in
>  >    labels, or giving special treatment to characters whose presentation
>  >    forms are dependent only on placement in the label.

And Erik countered with U+06D4 ARABIC FULL STOP... and then the
thread started morphing into another discussion of dots.

IMO, U+06D4 is *not* a good example to substitute here,
as the ordinary full stop used in Arabic (and Persian,
and in most languages in the Arabic script) is simply U+002E FULL STOP.
U+06D4 is a special form used primarily in Nastaliq style for
Urdu.

How about just sticking to the most important character
used this way, with already mandated localized behavior
as a separator of domain name components, from RFC 3490:

U+3002 IDEOGRAPHIC FULL STOP

That one has the advantage that it already *is* being
treated "as a dot" (meaning as a dot label separator),
but it isn't visually confusable with either a baseline
dot (U+002E FULL STOP) or a midline dot (U+00B7 MIDDLE DOT).
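
To make the "treated as a dot" behavior concrete, here is a minimal
sketch of label splitting under my reading of RFC 3490, section 3.1,
which requires four characters to be recognized as dots: U+002E,
U+3002, U+FF0E and U+FF61. The names IDNA2003_DOTS and split_labels
are made up for illustration and do not come from any spec or library.

```python
# Characters that RFC 3490, section 3.1, requires to be recognized as
# label separators (the constant name is hypothetical).
IDNA2003_DOTS = "\u002e\u3002\uff0e\uff61"


def split_labels(domain):
    """Split a domain name into labels on any of the RFC 3490 dots."""
    labels = []
    current = []
    for ch in domain:
        if ch in IDNA2003_DOTS:
            # A dot ends the current label; the dot itself is dropped.
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    return labels
```

With this sketch, U+3002 splits labels just as U+002E does, while
U+00B7 MIDDLE DOT and U+06D4 ARABIC FULL STOP are left inside the
label as ordinary characters.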

--Ken




