<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Chris,<div><br><div><div>On Mar 5, 2012, at 4:22 AM, Dillon, Chris wrote:</div><blockquote type="cite"><div><font color="#000000"><br></font>In the RFC3743-style tables at <a href="http://www.iana.org/domains/idn-tables/">http://www.iana.org/domains/idn-tables/</a> typically Simplified Chinese Preferred Variants and Traditional Chinese Preferred Variants have their own columns.<br><br><a href="http://tools.ietf.org/html/rfc5646">http://tools.ietf.org/html/rfc5646</a> gives the following example tags for Chinese; which should be standard for Chinese in this XML-based system?<br></div></blockquote><div><br></div><div>I would assume simply "zh" would be sufficient. It is not a requirement to stipulate the script in a language tag. Also, the entire tag is discretionary — if, for example, you created a fictitious table that had no bearing on any specific language or script, you would not be required to specify one.</div><div><br></div><blockquote type="cite"><div>A problem that many tables share is that one sees only Unicode numbers, no characters, and so when humans work with the tables, they often need to turn Unicode codes into characters or characters into Unicode codes. Is there any way that the XML could contain both (I think there are Unicode fonts containing nearly all the characters)?<br></div></blockquote><div><br></div><div>Creating a tool that takes the code points and turns them into something readable should be a trivial exercise, precisely because of the standardised format. 
I think it would be best to avoid superfluous descriptions of the individual code points in the spec itself; I would rather encourage tools that present the XML file in a readable form (as a web page, etc.).</div><div><br></div><div>For example, I can very simply print human-readable representations from the XML table as follows:</div><div><div><br></div><div>kim@gumleaf:idntables[master*]$ python</div><div>Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) </div><div>[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin</div><div>Type "help", "copyright", "credits" or "license" for more information.</div><div>>>> import idntables, unicodedata</div></div><div><div>>>> table = idntables.load("samples/nz_Latn_1.0.xml")</div><div>>>> for char in sorted(table._codepoints):</div><div>...     print "%s [U+%04X] %s" % (unichr(char), char, unicodedata.name(unichr(char)))</div><div>... </div><div><div>0 [U+0030] DIGIT ZERO</div><div>1 [U+0031] DIGIT ONE</div><div>2 [U+0032] DIGIT TWO</div><div>3 [U+0033] DIGIT THREE</div><div>4 [U+0034] DIGIT FOUR</div><div>5 [U+0035] DIGIT FIVE</div><div>6 [U+0036] DIGIT SIX</div><div>7 [U+0037] DIGIT SEVEN</div><div>8 [U+0038] DIGIT EIGHT</div><div>9 [U+0039] DIGIT NINE</div><div>a [U+0061] LATIN SMALL LETTER A</div><div>b [U+0062] LATIN SMALL LETTER B</div><div>c [U+0063] LATIN SMALL LETTER C</div><div>d [U+0064] LATIN SMALL LETTER D</div><div>e [U+0065] LATIN SMALL LETTER E</div><div>f [U+0066] LATIN SMALL LETTER F</div><div>g [U+0067] LATIN SMALL LETTER G</div><div>h [U+0068] LATIN SMALL LETTER H</div><div>i [U+0069] LATIN SMALL LETTER I</div><div>j [U+006A] LATIN SMALL LETTER J</div><div>k [U+006B] LATIN SMALL LETTER K</div><div>l [U+006C] LATIN SMALL LETTER L</div><div>m [U+006D] LATIN SMALL LETTER M</div><div>n [U+006E] LATIN SMALL LETTER N</div><div>o [U+006F] LATIN SMALL LETTER O</div><div>p [U+0070] LATIN SMALL LETTER P</div><div>q [U+0071] LATIN SMALL LETTER Q</div><div>r [U+0072] LATIN SMALL 
LETTER R</div><div>s [U+0073] LATIN SMALL LETTER S</div><div>t [U+0074] LATIN SMALL LETTER T</div><div>u [U+0075] LATIN SMALL LETTER U</div><div>v [U+0076] LATIN SMALL LETTER V</div><div>w [U+0077] LATIN SMALL LETTER W</div><div>x [U+0078] LATIN SMALL LETTER X</div><div>y [U+0079] LATIN SMALL LETTER Y</div><div>z [U+007A] LATIN SMALL LETTER Z</div><div>ā [U+0101] LATIN SMALL LETTER A WITH MACRON</div><div>ē [U+0113] LATIN SMALL LETTER E WITH MACRON</div><div>ī [U+012B] LATIN SMALL LETTER I WITH MACRON</div><div>ō [U+014D] LATIN SMALL LETTER O WITH MACRON</div><div>ū [U+016B] LATIN SMALL LETTER U WITH MACRON</div></div></div><div><br></div><div>kim</div></div></div></body></html>