A proposed solution for descriptions (was: Re: ISO 639 - New item approved - N'Ko)
kentk at cs.chalmers.se
Sun Jun 11 18:56:11 CEST 2006
Vidar Larsen wrote:
> During indexing, both Unicode normalization forms
> and more ad hoc normalizing may be applied to resolve issues
> of accents on individual characters.
That is very questionable.
> Case-normalizing is also done.
That is also questionable, and often NOT what I, at least, usually want.
> Non-word characters, such as apostrophes, quotes, dashes etc. are in
> general just used to separate words.
That too is questionable, though certain fallbacks may be seen as
> only matching mechanisms for the users convenience.
Or inconvenience, too often leading to far too many false positives.
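
To make the concern concrete, here is a minimal Python sketch of an
indexing step of the kind described above. It is an illustration under
assumptions of my own, not any particular search engine's pipeline: the
function name index_terms is hypothetical, and NFKD decomposition
followed by removal of combining marks is just one plausible form of
"ad hoc" accent normalization.

```python
import re
import unicodedata


def index_terms(description):
    """Reduce a description to search terms (hypothetical helper).

    Assumed steps, mirroring the quoted description of indexing:
    1. Unicode normalization (NFKD) plus dropping combining marks,
    2. case normalization,
    3. treating non-word characters purely as word separators.
    """
    decomposed = unicodedata.normalize("NFKD", description)
    # Drop combining marks (accents) left behind by the decomposition.
    unaccented = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    folded = unaccented.casefold()
    # Apostrophes, quotes, dashes etc. only separate words here.
    return [term for term in re.split(r"\W+", folded) if term]


if __name__ == "__main__":
    for name in ["'Amis", "Amis", "N'Ko", "Volap\u00fck"]:
        print(name, index_terms(name))
```

Under these assumed rules 'Amis and Amis yield identical index terms,
and N'Ko is split into two separate words, which is the kind of
folding that can produce false positives.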
> Now, onto your concrete suggestions. I support expanding parentheses
> in descriptions containing true variants, while leaving parentheses
> that add qualifiers. I also naturally support fixing the
> problem/error/typo with 'Amis.
> I support replacing semantically wrong characters with the more
> correct alternatives.
> I do not support adding "dumbed down" descriptions in an effort to
> normalize the original description, for reasons given above.
I basically agree with that, though I have some additional comments.
See separate e-mail.