
utf8proc

utf8proc is a library for processing UTF-8 encoded Unicode strings. Its features include Unicode normalization, stripping of default ignorable characters, case folding, and detection of grapheme cluster boundaries. A special character mapping is also available: for example, it converts the characters “Hyphen” (U+2010), “Minus” (U+2212), and “Hyphen-Minus” (U+002D, the ASCII minus) to the ASCII minus sign, so that they compare as equal.
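
The effect of combining such a character mapping with case folding and normalization can be sketched without the library itself. The following Python snippet is only an illustration built on the standard unicodedata module; it is not utf8proc's API, and the LUMP table and the comparison_key function are hypothetical names that cover just the two characters mentioned above.

```python
import unicodedata

# Hypothetical stand-in for the special character mapping described above.
# Only the two non-ASCII characters named in the text are covered here;
# this is not utf8proc's actual mapping table.
LUMP = {
    0x2010: "-",  # HYPHEN
    0x2212: "-",  # MINUS SIGN
}


def comparison_key(text: str) -> str:
    """Return a key under which similar-looking strings compare equal."""
    lumped = text.translate(LUMP)   # map hyphen-like characters to U+002D
    folded = lumped.casefold()      # Unicode case folding
    # Canonical composition, so "e" + combining acute equals precomposed "é".
    return unicodedata.normalize("NFC", folded)


if __name__ == "__main__":
    # Three spellings that differ only in the hyphen character and in case.
    variants = ["Well\u2010Known", "well\u2212known", "WELL-KNOWN"]
    print({comparison_key(v) for v in variants})
```

Strings that differ only in case, in composed versus decomposed accents, or in which of the three hyphen-like characters they use end up with the same key, which is the property needed for comparisons and for case-insensitive indices.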

The library can be used in C programs, and most of its functionality is also available as a Ruby library. A PostgreSQL extension provides a function for preparing strings for case-insensitive indices or for comparing two strings for equality.

The currently supported Unicode version is 5.0.0.

Download

Package for RubyGems

Documentation

Open issues

A development fork of utf8proc, called libmojibake, currently exists on GitHub. See its project description for more information.

Changes