utf8proc

diff README @ 2:aaad485d5335
Version 0.3

- changed normalization from NFC to NFKC in the PostgreSQL unifold function
- added support for marking the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
- added the Ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
- added NLF2LF transformation to the PostgreSQL unifold function
- added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization is performed (unlike previous versions)
- using integer constants rather than C strings for character properties
- fixed (hopefully) a problem with the Ruby library on Mac OS X, which occurred when compiler optimization was switched on
author: jbe
date: Fri Aug 04 12:00:00 2006 +0200 (2006-08-04)
parents: 61a89ecc2fb9
children: a49e32490aac
--- a/README	Tue Jun 20 12:00:00 2006 +0200
+++ b/README	Fri Aug 04 12:00:00 2006 +0200
@@ -24,13 +24,15 @@
 libraries and are therefore not dependent the dynamic version of the
 C library files, but this behaviour might change in future releases.
 
-The Unicode version being supported is 4.1.0.
+The Unicode version being supported is 5.0.0.
+Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as version 5.0.0
+      had not been available yet.
 
 For Unicode normalizations, the following options have to be used:
 Normalization Form C:  STABLE, COMPOSE
-Normalization Form D:  STABLE
+Normalization Form D:  STABLE, DECOMPOSE
 Normalization Form KC: STABLE, COMPOSE, COMPAT
-Normalization Form KD: STABLE, COMPAT
+Normalization Form KD: STABLE, DECOMPOSE, COMPAT
 
 
 *** C LIBRARY ***
@@ -47,7 +49,7 @@
 
 The String#utf8map method does the same as the "utf8proc_map" C function.
 Options for the mapping procedure are passed as symbols, i.e:
-"Hello".utf8map(:stable, :casefold) => "hello"
+"Hello".utf8map(:casefold) => "hello"
 
 The descriptions of all options are found in the C header file "utf8proc.h".
 Please notice that the according symbols in ruby are all lowercase.
@@ -62,7 +64,7 @@
 String#utf8nfkc, String#utf8nfkc!
 
 The method Integer#utf8 returns a UTF-8 string, which is containing the
-unicode char given by the code point.
+unicode char given by the code point. 
 0x000A.utf8 => "\n"
 0x2028.utf8 => "\342\200\250"
 
@@ -81,6 +83,16 @@
 CREATE INDEX name_idx ON people (unifold(name));
 SELECT * FROM people WHERE unifold(name) = unifold('John Doe');
 
+NOTICE: The outputs of the function can change between releases, as utf8proc
+        does not follow a versioning stability policy. You have to rebuild
+        your database indicies, if you upgrade to a newer version of utf8proc.
+
+
+*** KNOWN BUGS ***
+
+- on Mac OS X there were segfaults reported when compiling the ruby library
+  with optimization (-> don't use optimization if you have problems)
+
 
 *** TODO ***
 
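
The first hunk above maps each Unicode normalization form to a set of
utf8proc options. As a rough illustration of how the four forms differ, the
following sketch uses Ruby's built-in String#unicode_normalize (Ruby 2.2 or
later) instead of utf8proc itself, so it runs without the extension. The
constant OPTIONS_FOR_FORM and the helper normalized_codepoints are
hypothetical names introduced here; only the option sets in the table are
taken from the README.

```ruby
# Sketch only: this relies on Ruby's standard String#unicode_normalize,
# NOT on utf8proc. The option sets are copied from the README text above.
OPTIONS_FOR_FORM = {
  nfc:  %w[STABLE COMPOSE],
  nfd:  %w[STABLE DECOMPOSE],
  nfkc: %w[STABLE COMPOSE COMPAT],
  nfkd: %w[STABLE DECOMPOSE COMPAT],
}.freeze

# Hypothetical helper: normalize `str` to `form` and return its code points.
def normalized_codepoints(str, form)
  str.unicode_normalize(form).codepoints
end

# U+00E9 (precomposed e with acute) followed by U+FB01 (the "fi" ligature).
sample = "\u00E9\uFB01"

OPTIONS_FOR_FORM.each do |form, options|
  hex = normalized_codepoints(sample, form).map { |cp| format("U+%04X", cp) }
  puts "#{form.to_s.upcase.ljust(4)} (#{options.join(', ')}): #{hex.join(' ')}"
end
```

The composing forms keep U+00E9 as a single code point, the decomposing forms
split it into U+0065 U+0301, and only the compatibility forms (the ones with
COMPAT) replace the ligature with the two letters "f" and "i".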