utf8proc
diff README @ 7:fcfd8c836c64
Version 1.1.1
- Added a new PostgreSQL function 'unistrip', which behaves like 'unifold', but also removes all character marks (e.g. accents).
- Changed license from BSD to MIT style.
- Added a new function 'utf8proc_codepoint_valid' to the C library.
- Changed compiler flags in Makefile from -g -O0 to -O2
- The ruby script, which was used to build the utf8proc_data.c file, is now included in the distribution.
author | jbe |
---|---|
date | Sun Jul 22 12:00:00 2007 +0200 (2007-07-22) |
parents | a49e32490aac |
children | 951e73a98021 |
--- a/README	Fri Mar 16 12:00:00 2007 +0100
+++ b/README	Sun Jul 22 12:00:00 2007 +0200
@@ -11,14 +11,14 @@
 "make all" can be used to build everything, but both ruby and PostgreSQL
 installations are required in this case.

-For ruby there is alternatively provided a gem-file "utf8proc-1.0.1.gem".
+For ruby there is alternatively provided a gem-file "utf8proc-1.1.1.gem".


 *** GENERAL INFORMATION ***

-The C library is found in this directory after successful compilation and is
-named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of the
-files "utf8proc.rb" and "utf8proc_native.so", which are found in the
+The C library is found in this directory after successful compilation and
+is named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of
+the files "utf8proc.rb" and "utf8proc_native.so", which are found in the
 subdirectory "ruby/". The PostgreSQL extension is named "utf8proc_pgsql.so"
 and resides in the "pgsql/" directory.

@@ -27,8 +27,8 @@
 C library files, but this behaviour might change in future releases.

 The Unicode version being supported is 5.0.0.
-Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as version 5.0.0
-      had not been available yet.
+Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as
+      version 5.0.0 had not been available yet.

 For Unicode normalizations, the following options have to be used:
 Normalization Form C: STABLE, COMPOSE
@@ -53,8 +53,9 @@
 Options for the mapping procedure are passed as symbols, i.e:
 "Hello".utf8map(:casefold) => "hello"

-The descriptions of all options are found in the C header file "utf8proc.h".
-Please notice that the according symbols in ruby are all lowercase.
+The descriptions of all options are found in the C header file
+"utf8proc.h". Please notice that the according symbols in ruby are all
+lowercase.

 String#utf8map! is the destructive function in the meaning that the string
 is replaced by the result.
@@ -66,16 +67,18 @@
 String#utf8nfkc, String#utf8nfkc!

 The method Integer#utf8 returns a UTF-8 string, which is containing the
-unicode char given by the code point.
+unicode char given by the code point.
 0x000A.utf8 => "\n"
 0x2028.utf8 => "\342\200\250"


 *** POSTGRESQL API ***

-For PostgreSQL there is a SQL function supplied named "unifold". This
-function can be used to prepare index fields in order to be normalized and
-case-folded, i.e.:
+For PostgreSQL there are two SQL functions supplied named "unifold" and
+"unistrip". These functions function can be used to prepare index fields in
+order to be folded in a way where string-comparisons make more sense, e.g.
+where "bathtub" == "bath<soft hyphen>tub"
+or "Hello World" == "hello world".

 CREATE TABLE people (
 id serial8 primary key,
@@ -85,9 +88,13 @@
 CREATE INDEX name_idx ON people (unifold(name));
 SELECT * FROM people WHERE unifold(name) = unifold('John Doe');

-NOTICE: The outputs of the function can change between releases, as utf8proc
-        does not follow a versioning stability policy. You have to rebuild
-        your database indicies, if you upgrade to a newer version of utf8proc.
+The function "unistrip" removes character marks like accents or diaeresis,
+while "unifold" keeps then.
+
+NOTICE: The outputs of the function can change between releases, as
+        utf8proc does not follow a versioning stability policy. You have to
+        rebuild your database indicies, if you upgrade to a newer version
+        of utf8proc.


 *** TODO ***
@@ -98,7 +105,11 @@
 - support stream processing


-Unicode is a trademark of Unicode, Inc., and may be registered in some
-jurisdictions.
+*** CONTACT ***

+If you find any bugs or experience difficulties in compiling this software,
+please contact me:

+Jan Behrens <jan.behrens.n4272.expires-2008-06@flexiguided.de>
+http://www.flexiguided.de/publications.utf8proc.en.html
+
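The difference between "unifold" and "unistrip" described in this changeset can be illustrated with a rough sketch in plain ruby. It does not use utf8proc at all: the helper names approx_unifold and approx_unistrip are hypothetical, and the folding is only an approximation built on ruby's built-in String#unicode_normalize and String#downcase(:fold), so results may differ from the real PostgreSQL functions for some inputs.

```ruby
# Approximation only: these helpers are NOT part of utf8proc. They mimic
# the behaviour the README describes, using ruby's standard String methods.

SOFT_HYPHEN = "\u00AD"

# Roughly what "unifold" is described to do: normalize, drop the soft
# hyphen, and case-fold, while keeping accents.
def approx_unifold(str)
  str.unicode_normalize(:nfkc).delete(SOFT_HYPHEN).downcase(:fold)
end

# Roughly what "unistrip" is described to do: the same folding, but
# character marks (accents, diaeresis) are removed as well. Decomposing
# first (NFKD) separates base letters from their combining marks.
def approx_unistrip(str)
  decomposed = str.unicode_normalize(:nfkd)
  without_marks = decomposed.gsub(/\p{M}/, "")
  without_marks.delete(SOFT_HYPHEN).downcase(:fold).unicode_normalize(:nfkc)
end

puts approx_unifold("bath\u00ADtub")   # => bathtub
puts approx_unifold("Hello World")     # => hello world
puts approx_unistrip("Caf\u00E9")      # => cafe
```

With these helpers, two strings that differ only in case or in a soft hyphen fold to the same value, and only the "strip" variant additionally makes accented and unaccented spellings compare equal.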