utf8proc archive (until v1.1.6)

utf8proc is now maintained by the Julia project.

Visit utf8proc on julialang.org for newer versions.

Old tarballs

Old packages for RubyGems

Open issues (as of v1.1.6)

  • Incorrect handling of COMBINING GREEK YPOGEGRAMMENI and of any characters whose decomposition contains it (this requires normalization both before and after case folding; see chapter 3 of the Unicode Standard)
  • Currently, only Unicode v5.0.0 is supported
  • Code cleanup needed
    • UTF-8 decoding and encoding should be done in an independent step
    • The import script for Unicode data needs cleanup
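One open item above is separating UTF-8 decoding and encoding into an independent step. As a rough illustration only, a standalone encoder that turns a single code point into UTF-8 bytes, and does nothing else, might look like the C sketch below. The function name encode_utf8_codepoint is hypothetical and not part of the utf8proc API. It assumes the standard definition of a valid code point (0 to U+10FFFF, excluding the surrogates U+D800 to U+DFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of utf8proc): encode one code point as
 * UTF-8 into buf, which must have room for 4 bytes.
 * Returns the number of bytes written (1-4), or 0 if cp is not a valid
 * Unicode scalar value (negative, a surrogate, or above U+10FFFF). */
size_t encode_utf8_codepoint(int32_t cp, uint8_t buf[4])
{
    /* Reject invalid code points before emitting any bytes. */
    if (cp < 0 || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;

    if (cp < 0x80) {                 /* 1 byte: 0xxxxxxx */
        buf[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (uint8_t)(0xC0 | (cp >> 6));
        buf[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (uint8_t)(0xE0 | (cp >> 12));
        buf[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    buf[0] = (uint8_t)(0xF0 | (cp >> 18));
    buf[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, encoding U+20AC (EURO SIGN) writes the three bytes E2 82 AC and returns 3, while U+D800 returns 0. Keeping this step free of any normalization logic is what would allow it to be tested and reused on its own.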

Changes until v1.1.6

  • 2013-11-27: Version 1.1.6 released
    • PostgreSQL 9.2 and 9.3 compatibility (lowercase 'c' language name)
  • 2009-10-16: Version 1.1.5 released
    • Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for Ruby 1.9 compatibility (and #define them if they do not exist)
    • Patches for compatibility with Microsoft Visual Studio
    • Fixes to make utf8proc usable in C++ programs
  • 2009-08-19: Version 1.1.4 released
    • Replaced C++ style comments for compatibility reasons
    • Added typecasts to suppress compiler warnings
    • Removed redundant source files for ruby-gemfile generation
    • Changed copyright notice for Public Software Group e. V.
    • Minor changes in the README file
  • Changes in version 1.1.3
    • PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
    • Added a function utf8proc_version returning a string containing the version number of the library.
    • Included a target libutf8proc.dylib for Mac OS X.
  • Changes in version 1.1.2
    • Fixed a serious bug in the data file generator that caused characters to be treated incorrectly when stripping default ignorable characters or calculating grapheme cluster boundaries.
  • Changes in version 1.1.1
    • Changed license from BSD to MIT style.
    • Added a new function utf8proc_codepoint_valid to the C library.
    • Changed compiler flags in Makefile from -g -O0 to -O2
    • The Ruby script used to build the utf8proc_data.c file is now included in the distribution.
    • Added a new PostgreSQL function unistrip, which behaves like unifold but also removes all character marks (e.g. accents).
  • Changes in version 1.0.3
    • Fixed a bug in the Ruby library that caused an error when splitting an empty string at grapheme cluster boundaries (method String#utf8chars).
  • Changes in version 1.0.2
    • added support for PostgreSQL version 8.2
    • included a check in Integer#utf8 that raises an exception if the given code point is invalid because it is too high (this check was previously missing)
  • Changes in version 1.0.1
    • included a gem file for the ruby version of the library
  • Changes in version 1.0
    • added the LUMP option, which lumps certain characters together (see lump.txt); it is also used by the PostgreSQL unifold function
    • added the STRIPMARK option, which strips marking characters (or marks of composed characters)
    • deprecated ruby method String#char_ary in favour of String#utf8chars
  • Changes in version 0.3
    • added support for marking the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
    • added the Ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
    • added the NLF2LF transformation to the PostgreSQL unifold function
    • added the DECOMPOSE option; if neither COMPOSE nor DECOMPOSE is used, no normalization is performed (unlike in previous versions)
    • using integer constants rather than C-strings for character properties
    • fixed (hopefully) a problem with the Ruby library on Mac OS X that occurred when compiler optimization was switched on
    • changed normalization from NFC to NFKC for the PostgreSQL unifold function
  • Changes in version 0.2
    • added -fpic compiler flag in Makefile
    • fixed bug in the C code for the ruby library (usage of non-existent function)
    • changed the behaviour of the PostgreSQL function to return NULL on invalid input rather than raising an exception
    • improved efficiency of PostgreSQL function (no transformation to C string is done)
  • 2006-06-02: First release v0.1
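The CHARBOUND option from version 0.3 marks the beginning of each grapheme cluster with a 0xFF byte. Because 0xFF never occurs in well-formed UTF-8, a caller can recover the clusters by scanning for that marker. The C sketch below assumes a buffer that is already in this marked form (it does not call utf8proc itself); the function name for_each_cluster and its callback signature are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not part of utf8proc): walk a buffer in which every
 * grapheme cluster is preceded by a 0xFF marker byte (the CHARBOUND
 * convention) and call `visit` once per cluster with its start and length.
 * Returns the number of clusters found. */
size_t for_each_cluster(const unsigned char *s, size_t len,
                        void (*visit)(const unsigned char *start, size_t n, void *ctx),
                        void *ctx)
{
    size_t count = 0;
    size_t i = 0;
    while (i < len) {
        if (s[i] != 0xFF) {          /* assumption: ignore bytes before the first marker */
            i++;
            continue;
        }
        size_t start = ++i;          /* cluster starts right after the marker */
        while (i < len && s[i] != 0xFF)
            i++;                     /* advance to the next marker or the end */
        if (visit)
            visit(s + start, i - start, ctx);
        count++;
    }
    return count;
}
```

A visitor could, for instance, print each cluster with printf("%.*s\n", (int)n, (const char *)start). With the input bytes FF 65 CC 81 FF 61 (e followed by COMBINING ACUTE ACCENT, then a), the function reports two clusters of 3 and 1 bytes.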