Public Software Group e. V. · utf8proc archive until v1.1.6

utf8proc archive (until v1.1.6)

utf8proc is now maintained by the Julia project.

Visit utf8proc on julialang.org for newer versions.

Old tarballs

Old packages for RubyGems

Open issues (as of v1.1.6)

Incorrect handling of COMBINING GREEK YPOGEGRAMMENI and of any characters that have it as part of their decomposition (correct handling requires normalization both before and after case folding; refer to chapter 3 of the Unicode Standard)
Currently, only Unicode v5.0.0 is supported
Code cleanup needed
- UTF-8 decoding and encoding should be done in an independent step
- Import script for Unicode data requires code cleanup
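
As an illustration of the first cleanup item, the following is a minimal sketch of UTF-8 encoding written as a stand-alone step, separate from any normalization or case-folding logic. It is not taken from utf8proc; the names sketch_codepoint_ok and sketch_encode_utf8 are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not utf8proc's API): a code point can be encoded
 * if it lies in 0..0x10FFFF and is not a UTF-16 surrogate. */
int sketch_codepoint_ok(int32_t cp) {
    if (cp < 0 || cp > 0x10FFFF) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    return 1;
}

/* Hypothetical helper: encode one code point into dst, which must have
 * room for at least 4 bytes. Returns the number of bytes written, or 0
 * if the code point cannot be encoded. */
size_t sketch_encode_utf8(int32_t cp, uint8_t *dst) {
    assert(dst != NULL);
    if (!sketch_codepoint_ok(cp)) return 0;
    if (cp < 0x80) {                 /* 1 byte: 0xxxxxxx */
        dst[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xC0 | (cp >> 6));
        dst[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xE0 | (cp >> 12));
        dst[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    dst[0] = (uint8_t)(0xF0 | (cp >> 18));
    dst[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

Called with 0x0345 (COMBINING GREEK YPOGEGRAMMENI), the sketch writes the two bytes 0xCD 0x85 and returns 2. Because the encoder only sees code points, the mapping and normalization code that produces them could be tested without touching any byte-level logic.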

Changes until v1.1.6

2013-11-27: Version 1.1.6 released
- PostgreSQL 9.2 and 9.3 compatibility (lower case 'c' language name)
2009-10-16: Version 1.1.5 released
- Use RSTRING_PTR() and RSTRING_LEN() instead of RSTRING()->ptr and RSTRING()->len for Ruby 1.9 compatibility (and #define them if they do not exist)
- Patches for compatibility with Microsoft Visual Studio
- Fixes to make utf8proc usable in C++ programs
2009-08-19: Version 1.1.4 released
- Replaced C++ style comments for compatibility reasons
- Added typecasts to suppress compiler warnings
- Removed redundant source files for ruby-gemfile generation
- Changed copyright notice for Public Software Group e. V.
- Minor changes in the README file
Changes in version 1.1.3
- PostgreSQL 8.3 compatibility (use of SET_VARSIZE macro)
- Added a function utf8proc_version returning a string containing the version number of the library.
- Included a target libutf8proc.dylib for Mac OS X.
Changes in version 1.1.2
- Fixed a serious bug in the data file generator that caused characters to be handled incorrectly when stripping default ignorable characters or calculating grapheme cluster boundaries.
Changes in version 1.1.1
- Changed license from BSD to MIT style.
- Added a new function utf8proc_codepoint_valid to the C library.
- Changed compiler flags in Makefile from -g -O0 to -O2
- The Ruby script used to build the utf8proc_data.c file is now included in the distribution.
- Added a new PostgreSQL function unistrip, which behaves like unifold, but also removes all character marks (e.g. accents).
Changes in version 1.0.3
- Fixed a bug in the Ruby library that caused an error when splitting an empty string at grapheme cluster boundaries (method String#utf8chars).
Changes in version 1.0.2
- added support for PostgreSQL version 8.2
- added a check to Integer#utf8 that raises an exception if the given code point is invalid because it is too high (this check was previously missing)
Changes in version 1.0.1
- included a gem file for the ruby version of the library
Changes in version 1.0
- added the LUMP option, which lumps certain characters together (see lump.txt) (also used for the PostgreSQL unifold function)
- added the STRIPMARK option, which strips marking characters (or marks of composed characters)
- deprecated ruby method String#char_ary in favour of String#utf8chars
Changes in version 0.3
- added support for marking the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
- added the Ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
- added NLF2LF transformation in the PostgreSQL unifold function
- added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization is performed (unlike in previous versions)
- using integer constants rather than C strings for character properties
- fixed (hopefully) a problem with the Ruby library on Mac OS X, which occurred when compiler optimization was switched on
- changed normalization from NFC to NFKC for the PostgreSQL unifold function
Changes in version 0.2
- added -fpic compiler flag in Makefile
- fixed bug in the C code for the ruby library (usage of non-existent function)
- changed behaviour of PostgreSQL function to return NULL in case of invalid input, rather than raising an exceptional condition
- improved efficiency of PostgreSQL function (no transformation to C string is done)
2006-06-02: First release v0.1