Comments on protocol-04
kenw at sybase.com
Tue Mar 4 02:12:18 CET 2008
In comments on protocol-04, Mark suggested:
> > 5.3. Character Changes in Preprocessing or the User Interface
> > The Unicode string MAY then be processed, in a way specific to the
> > local environment, to make the result of the IDNA processing match
> > user expectations. For instance, at this step, it would be
> > reasonable to convert all upper case characters to lower case, if
> > this makes sense in the user's environment.
> > Other examples of processing for localization that might be applied,
> > if appropriate, at this point (but even further outside the scope of
> > this specification) include interpreting the KANA MIDDLE DOT as
> > separating domain name components from each other, mapping different
> > "width" forms of the same character into the one form permitted in
> > labels, or giving special treatment to characters whose presentation
> > forms are dependent only on placement in the label.
> Bad example. Since the Middle dot is allowed currently, it cannot be
> treated as a separator.
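The local preprocessing described in the quoted 5.3 text can be sketched roughly as follows. This is a hedged illustration, not anything the draft specifies. The function name `localize_input` is hypothetical, and Unicode NFKC normalization stands in for "width" mapping, although NFKC also applies other compatibility mappings beyond width. The sketch leaves U+30FB KATAKANA MIDDLE DOT (assumed here to be the character the draft calls KANA MIDDLE DOT) untouched, so treating it as a separator would need a further local rule.

```python
import unicodedata


def localize_input(user_text: str) -> str:
    """Hypothetical local preprocessing applied before IDNA processing.

    NFKC is used as an approximation of width mapping: it folds
    fullwidth Latin letters to ASCII and halfwidth katakana to the
    standard katakana forms, but it also applies other compatibility
    mappings that a real implementation might not want.
    """
    # Fold compatibility ("width") variants to their standard forms.
    folded = unicodedata.normalize("NFKC", user_text)
    # Convert upper case to lower case, as the quoted text suggests.
    return folded.lower()


print(localize_input("ＥＸＡＭＰＬＥ"))  # example
```

Under this sketch, halfwidth U+FF65 is mapped to U+30FB KATAKANA MIDDLE DOT, but nothing turns that middle dot into a label separator.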
And Erik countered with U+06D4 ARABIC FULL STOP... and then the
thread started morphing into another discussion of dots.
IMO, U+06D4 is *not* a good example to substitute here,
as the ordinary full stop used in Arabic (and Persian,
and in most languages in the Arabic script) is simply U+002E FULL STOP.
U+06D4 is a special form used primarily in Nastaliq style for
How about just sticking to the most important character
used this way, one whose localized treatment as a separator
of domain name components is already mandated by RFC 3490:
U+3002 IDEOGRAPHIC FULL STOP
That one has the advantage that it already *is* being
treated "as a dot" (that is, as a label separator),
but it isn't visually confusable with either a baseline
dot (U+002E FULL STOP) or a midline dot (U+00B7 MIDDLE DOT).
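For reference, RFC 3490 (section 3.1, requirement 1) says that wherever dots are used as label separators, U+002E FULL STOP, U+3002 IDEOGRAPHIC FULL STOP, U+FF0E FULLWIDTH FULL STOP and U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP must all be recognized as dots. A minimal sketch of just that rule, using a hypothetical `split_labels` helper that ignores every other IDNA step:

```python
# Characters that RFC 3490, section 3.1, requires to be recognized
# as dots when dots are used as label separators.
RFC3490_DOTS = {"\u002e", "\u3002", "\uff0e", "\uff61"}


def split_labels(name: str) -> list[str]:
    """Hypothetical helper: split a domain name into labels,
    treating every RFC 3490 dot as a label separator."""
    # Map each recognized dot to U+002E, then split on it.
    normalized = "".join("." if ch in RFC3490_DOTS else ch for ch in name)
    return normalized.split(".")


print(split_labels("例え\u3002テスト"))  # ['例え', 'テスト']
```

Neither U+00B7 MIDDLE DOT nor U+30FB KATAKANA MIDDLE DOT is in that set, so under this rule both stay inside a label.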