Dot-mapping

Martin Duerst duerst at it.aoyama.ac.jp
Thu Dec 13 08:50:50 CET 2007


At 15:52 07/12/13, Yangwoo Ko wrote:
>
>Martin Duerst wrote:
>> At 04:01 07/12/12, Harald Tveit Alvestrand wrote:
>>>
>>> --On 11. desember 2007 16:56 +0900 fujiwara at jprs.co.jp wrote:
>>>
>>>> And more, the candidate dot-like characters are already listed
>>>> in Unicode 5.0 standard.  ( grep "FULL STOP" UnicodeData.txt )
>>>>
>>>> They all are marked as "NEVER" in draft-faltstrom-idnabis-tables-03.txt.
>>>> There is no collision/conflict.
>>>>
>>>> 002E; # FULL STOP
>>>> 0589; # ARMENIAN FULL STOP
>>>> 06D4; # ARABIC FULL STOP
>>>> 0701; # SYRIAC SUPRALINEAR FULL STOP
>>>> 0702; # SYRIAC SUBLINEAR FULL STOP
>>>> 1362; # ETHIOPIC FULL STOP
>>>> 166E; # CANADIAN SYLLABICS FULL STOP
>>>> 1803; # MONGOLIAN FULL STOP
>>>> 1809; # MONGOLIAN MANCHU FULL STOP
>>>> 2CF9; # COPTIC OLD NUBIAN FULL STOP
>>>> 2CFE; # COPTIC FULL STOP
>>>> 3002; # IDEOGRAPHIC FULL STOP
>>>> FE12; # PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
>>>> FE52; # SMALL FULL STOP
>>>> FF0E; # FULLWIDTH FULL STOP
>>>> FF61; # HALFWIDTH IDEOGRAPHIC FULL STOP
>> I think this list should be carefully vetted, if it is used.
>> What really counts isn't that we would have all the characters
>> that contain "FULL STOP" in their name, but all the full-stop-like
>> characters that are available on keyboards where full stop itself
>> isn't available at the same time (i.e. without keyboard switching).
>
>Sorry. But, I don't understand the point.

Okay. Let's make some fictitious examples. Assume that we have
a script, say Slobbodian. Assume that there is a SLOBBODIAN
FULL STOP, which looks completely different from a dot. Assume
also that Slobbodian uses a plain U+002E full stop, not as e.g.
a sentence period, but let's say as a decimal separator in
numbers, and that therefore U+002E is available on every
Slobbodian keyboard. In such a case, there is absolutely
no need to add SLOBBODIAN FULL STOP to this list of special
characters, because Slobbodians will just easily type U+002E
on their keyboards.

Let's look at another example, a (again hypothetical) script
called Sluggerian. Assume that the only character looking close
to a U+002E full stop, and close in functionality, for whatever
bizarre historical reason, is part of Unicode with the name SLUGGERIAN
OCTOTHORPE. Even though this character does not contain the words
FULL STOP, it should probably be part of our 'dots' collection,
because otherwise, the Sluggerians are in big trouble when they
want to enter their domain names.

In summary, what counts is not character names, but shape,
to some extent functionality, and accessibility on keyboards.
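
As a rough illustration of why selection by name is only a starting
point, here is a minimal sketch that reproduces the
grep "FULL STOP" approach with Python's standard unicodedata module.
The helper names chars_named and has_compat_mapping are hypothetical,
and the module uses whatever Unicode version ships with the
interpreter, which is newer than Unicode 5.0, so the output may be
longer than the list quoted above.

```python
import sys
import unicodedata


def chars_named(fragment):
    """Return all code points whose Unicode name contains `fragment`."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (unassigned, surrogates, controls) yield "".
        name = unicodedata.name(chr(cp), "")
        if fragment in name:
            hits.append(cp)
    return hits


def has_compat_mapping(cp):
    """True if the character has a compatibility decomposition (<tag> ...)."""
    return unicodedata.decomposition(chr(cp)).startswith("<")


full_stops = chars_named("FULL STOP")

for cp in full_stops:
    flag = "  (compatibility character)" if has_compat_mapping(cp) else ""
    print(f"{cp:04X}; # {unicodedata.name(chr(cp))}{flag}")

# Dot-like characters from this thread that a name match cannot find.
for cp in (0x00B7, 0x22C5, 0x30FB):
    print(f"{cp:04X}; # {unicodedata.name(chr(cp))}  (matched: {cp in full_stops})")
```

The name match picks up compatibility characters such as DIGIT ONE
FULL STOP, while dot-shaped characters such as KATAKANA MIDDLE DOT are
not matched at all. Nothing in the character database says which of
these are reachable on a given keyboard, so that part of the vetting
still has to be done by hand.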

Regards,    Martin.

>>> Of course there's also DIGIT ONE FULL STOP and friends.... but those are compatibility characters, so a user interface that does mapping will presumably remove them before they meet the NEVER barrier of IDNAbis.
>>>
>>> Of more confusing interest is things like 22C5 DOT OPERATOR or 30FB KATAKANA MIDDLE DOT,
>> Similar to a hyphen in Latin. It was excluded from identifiers
>> in XML 1.0, which produced quite a few complaints from Japan.
>> 
>>> or the already famous 00B7 MIDDLE DOT (which the Catalans say they need INSIDE the labels).
>> Regards,    Martin.
>> 
>> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
>> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update at alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>> 
>


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst at it.aoyama.ac.jp     


