A proposed solution for descriptions (was: Re: ISO 639 - New item approved - N'Ko)

Kent Karlsson kentk at cs.chalmers.se
Sun Jun 11 18:56:11 CEST 2006

Vidar Larsen wrote:
> During indexing, both Unicode normalization forms
> and more ad hoc normalizing may be applied to resolve issues
> of accents on individual characters.

That is very questionable.

> Case-normalizing is also done.

That is also questionable, and often NOT what I, at least, want.

> Non-word characters, such as apostrophes, quotes, dashes etc. are in
> general just used to separate words.

That too is questionable, though certain fallbacks may be seen as

> only matching mechanisms for the user's convenience.

Or inconvenience, too often leading to far too many false positives.
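
To make the false-positive point concrete, here is a rough sketch of the kind
of indexing described in the quoted text: compatibility decomposition, accent
stripping, case folding, and splitting on non-word characters. The function
name index_key and the exact steps are assumptions made for illustration only;
this is not the actual indexer under discussion.

```python
import re
import unicodedata


def index_key(text):
    """Hypothetical aggressive normalizer, for illustration only."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Ad hoc accent removal: drop every combining mark.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-normalize.
    folded = no_marks.casefold()
    # Treat apostrophes, quotes, dashes etc. purely as word separators.
    return [word for word in re.split(r"\W+", folded) if word]


# Distinct descriptions collapse to the same key:
same_amis = index_key("'Amis") == index_key("Amis")
# The apostrophe in N'Ko splits the name into two "words":
nko_words = index_key("N'Ko")
```

With these steps, 'Amis and Amis become indistinguishable, and N'Ko is indexed
as the two words "n" and "ko", so a search for either fragment matches it.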


> Now, onto your concrete suggestions. I support expanding parenthetical
> descriptions containing true variants, while leaving parentheses that
> add qualifiers. I also naturally support fixing the problem/error/
> typo with 'Amis.
> I support replacing semantically wrong characters with the more
> correct alternatives.
> I do not support adding "dumbed down" descriptions in an effort to
> normalize the original description, for reasons given above.

I basically agree with that, though I have some additional comments.
See separate e-mail.

                /kent k
