utf8proc
diff README @ 2:aaad485d5335
Version 0.3
- changed normalization from NFC to NFKC for the PostgreSQL unifold function
- added support for marking the beginning of a grapheme cluster with 0xFF (option: CHARBOUND)
- added the Ruby method String#chars, which returns an array of UTF-8 encoded grapheme clusters
- added the NLF2LF transformation to the PostgreSQL unifold function
- added the DECOMPOSE option; if you use neither COMPOSE nor DECOMPOSE, no normalization is performed (unlike previous versions)
- now using integer constants rather than C strings for character properties
- fixed (hopefully) a problem with the Ruby library on Mac OS X, which occurred when compiler optimization was switched on
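The grapheme-cluster entries in the changelog above can be illustrated with a minimal sketch. It assumes Ruby 2.5 or later and uses the built-in String#grapheme_clusters, not the String#chars method from this release. The helper names `clusters` and `mark_boundaries` are hypothetical, and the boundary rules of a modern Ruby may differ from the Unicode version utf8proc 0.3 was built against.

```ruby
# Illustration only: uses Ruby's built-in String#grapheme_clusters
# (Ruby >= 2.5), not the utf8proc String#chars method.
def clusters(str)
  str.grapheme_clusters
end

# Emulates the idea behind CHARBOUND: a 0xFF byte is placed in front of
# every grapheme cluster. 0xFF never occurs in valid UTF-8, so the result
# is handled as a binary string.
def mark_boundaries(str)
  clusters(str).map { |c| "\xFF".b + c.b }.join
end

text = "e\u0301a"                 # "e" + COMBINING ACUTE ACCENT, then "a"
p clusters(text).length           # 2 clusters, although there are 3 code points
p mark_boundaries(text).bytes     # one 0xFF (255) in front of each cluster
```

Because 0xFF cannot appear in well-formed UTF-8, a consumer can split on it without decoding the text first.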
author | jbe |
---|---|
date | Fri Aug 04 12:00:00 2006 +0200 (2006-08-04) |
parents | 61a89ecc2fb9 |
children | a49e32490aac |
--- a/README	Tue Jun 20 12:00:00 2006 +0200
+++ b/README	Fri Aug 04 12:00:00 2006 +0200
@@ -24,13 +24,15 @@
 libraries and are therefore not dependent the dynamic version of the
 C library files, but this behaviour might change in future releases.
 
-The Unicode version being supported is 4.1.0.
+The Unicode version being supported is 5.0.0.
+Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as version 5.0.0
+      had not been available yet.
 
 For Unicode normalizations, the following options have to be used:
 Normalization Form C: STABLE, COMPOSE
-Normalization Form D: STABLE
+Normalization Form D: STABLE, DECOMPOSE
 Normalization Form KC: STABLE, COMPOSE, COMPAT
-Normalization Form KD: STABLE, COMPAT
+Normalization Form KD: STABLE, DECOMPOSE, COMPAT
 
 
 *** C LIBRARY ***
@@ -47,7 +49,7 @@
 
 The String#utf8map method does the same as the "utf8proc_map" C function.
 Options for the mapping procedure are passed as symbols, i.e:
-"Hello".utf8map(:stable, :casefold) => "hello"
+"Hello".utf8map(:casefold) => "hello"
 
 The descriptions of all options are found in the C header file "utf8proc.h".
 Please notice that the according symbols in ruby are all lowercase.
@@ -62,7 +64,7 @@
 String#utf8nfkc, String#utf8nfkc!
 
 The method Integer#utf8 returns a UTF-8 string, which is containing the
-unicode char given by the code point.
+unicode char given by the code point.
 0x000A.utf8 => "\n"
 0x2028.utf8 => "\342\200\250"
 
@@ -81,6 +83,16 @@
 CREATE INDEX name_idx ON people (unifold(name));
 SELECT * FROM people WHERE unifold(name) = unifold('John Doe');
 
+NOTICE: The outputs of the function can change between releases, as utf8proc
+        does not follow a versioning stability policy. You have to rebuild
+        your database indicies, if you upgrade to a newer version of utf8proc.
+
+
+*** KNOWN BUGS ***
+
+- on Mac OS X there were segfaults reported when compiling the ruby library
+  with optimization (-> don't use optimization if you have problems)
+
 
 *** TODO ***
 
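The first hunk lists one option set per normalization form. As a rough illustration of how the four forms differ, here is a minimal sketch that assumes Ruby 2.2 or later and uses the built-in String#unicode_normalize as a stand-in for the utf8proc bindings. The helper `all_forms` is hypothetical, and the comments only repeat the option sets from the README.

```ruby
# Illustration only: Ruby's built-in String#unicode_normalize stands in for
# the utf8proc bindings. Each comment repeats the option set the README
# lists for that form.
def all_forms(str)
  {
    nfc:  str.unicode_normalize(:nfc),   # STABLE, COMPOSE
    nfd:  str.unicode_normalize(:nfd),   # STABLE, DECOMPOSE
    nfkc: str.unicode_normalize(:nfkc),  # STABLE, COMPOSE, COMPAT
    nfkd: str.unicode_normalize(:nfkd)   # STABLE, DECOMPOSE, COMPAT
  }
end

# U+00C5 is a precomposed A with ring above; U+FB01 is the "fi" ligature,
# which has only a compatibility decomposition.
sample = "\u00C5\uFB01"
all_forms(sample).each do |form, result|
  hex = result.codepoints.map { |c| format("U+%04X", c) }.join(" ")
  puts format("%-4s %s", form, hex)
end
```

In this sample only the two COMPAT forms replace the ligature with plain "f" and "i", while the two DECOMPOSE forms split U+00C5 into "A" followed by a combining ring.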