Unicode & IETF
paf at frobbit.se
Mon Aug 11 22:06:07 CEST 2014
On 11 aug 2014, at 21:52, Shawn Steele <Shawn.Steele at microsoft.com> wrote:
>> That the two standard organizations do reach different results when applying whatever algorithms they use when calculating what the best solution is to reach whatever goal is to be reached is for me completely understood.
> As an implementer that's really problematic. I'd like Unicode to behave consistently. If there's an environment/context that Unicode 'doesn't work for', then I'd like a well-designed set of rules that considers both sides and figures out how to make those rules (which is what I think the IETF is attempting). Different sets of rules for every application or context quickly become unwieldy.
> It doesn't help users if one application has characters that are typed a certain way for certain languages or whatever, and then another application says "I don't understand the word you typed, spell it differently". I can't spell it differently, my keyboard only lets me type it one way.
Completely understood. Given that I am also an implementer of "these kinds of things", I completely agree with you that this is a worthwhile goal.
In IETF context we joke and say "I also want a pony" in these kinds of situations.
Already in the world of IDNA2003 with stringprep it was recognized that different applications need different stringprep profiles.
That is also the situation for IDNA2008.
IDNA2008 is for domain names, and possibly only host names (but let's skip that discussion). Compared with IDNA2003, it leaves case folding and various other transformations out of the standard itself, precisely so that keyboards and input mechanisms can transform input into whatever is to be used as a domain name in the way that best fits the context in which the transformation happens. That cannot be dictated; it should instead be left to innovation.
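A minimal sketch of that difference, assuming Python's standard library, whose encodings.idna module implements the IDNA2003 Nameprep profile. The function local_mapping is hypothetical: it stands in for whatever mapping an application or input method might choose before handing a label to IDNA2008, and it is not something IDNA2008 itself defines.

```python
import unicodedata
from encodings.idna import nameprep  # IDNA2003 Nameprep profile (RFC 3491)


def idna2003_style(label: str) -> str:
    """Mapping that IDNA2003 builds into the protocol:
    case folding plus NFKC normalization via Nameprep."""
    return nameprep(label)


def local_mapping(label: str) -> str:
    """Hypothetical application-chosen mapping (an assumption for
    illustration): plain lowercasing plus NFC normalization."""
    return unicodedata.normalize("NFC", label.lower())


for typed in ["Bücher", "Straße"]:
    print(typed, "->", idna2003_style(typed), "|", local_mapping(typed))
# Nameprep folds "Straße" to "strasse"; the local mapping keeps "straße".
```

The two functions agree on "Bücher" but not on "Straße", which is the kind of context-dependent choice that IDNA2008 leaves outside the standard.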
Because of this, the IETF has, for example, the PRECIS working group, which is working on applications other than domain names, where different rules will be used. Sure, the majority of the rules will most certainly be the same, and most certainly based on basic Unicode constructs.
But there will be differences, it seems.