jbe@0: 
jbe@0: Please read the LICENSE file, which ships with this software.
jbe@0: 
jbe@0: 
jbe@0: *** QUICK START ***
jbe@0: 
jbe@0: To compile the C library, run "make c-library"; to compile the Ruby
jbe@0: library, run "make ruby-library"; and to compile the PostgreSQL extension,
jbe@0: run "make pgsql-library".
jbe@0: 
jbe@0: "make all" builds everything, but requires both Ruby and PostgreSQL
jbe@0: installations.
jbe@0: 
jbe@4: Alternatively, a gem file "utf8proc-1.0.1.gem" is provided for Ruby.
jbe@4: 
jbe@0: 
jbe@0: *** GENERAL INFORMATION ***
jbe@0: 
jbe@0: After successful compilation, the C library is found in this directory as
jbe@0: "libutf8proc.a" and "libutf8proc.so". The Ruby library consists of the
jbe@0: files "utf8proc.rb" and "utf8proc_native.so", which are found in the
jbe@0: subdirectory "ruby/". The PostgreSQL extension is named "utf8proc_pgsql.so"
jbe@0: and resides in the "pgsql/" directory.
jbe@0: 
jbe@0: Both the Ruby library and the PostgreSQL extension are built as stand-alone
jbe@0: libraries and therefore do not depend on the dynamic version of the
jbe@0: C library, but this behaviour might change in future releases.
jbe@0: 
jbe@2: The supported Unicode version is 5.0.0.
jbe@2: Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as version 5.0.0
jbe@2:       was not yet available.
jbe@0: 
jbe@0: For Unicode normalizations, the following options have to be used:
jbe@0: Normalization Form C:  STABLE, COMPOSE
jbe@2: Normalization Form D:  STABLE, DECOMPOSE
jbe@0: Normalization Form KC: STABLE, COMPOSE, COMPAT
jbe@2: Normalization Form KD: STABLE, DECOMPOSE, COMPAT
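
To make the difference between the four forms concrete, here is a minimal
sketch. It does NOT use utf8proc: it relies on Ruby's built-in
String#unicode_normalize as a stand-in, and the helper names
"codepoints_hex" and "all_forms" are hypothetical, chosen only for this
illustration.

```ruby
# Illustration only: uses Ruby's built-in String#unicode_normalize,
# not utf8proc. It shows what the four forms do to a sample string.

# Hypothetical helper: render a string as a list of code points.
def codepoints_hex(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

# Hypothetical helper: apply all four normalization forms to a string.
def all_forms(str)
  {
    nfc:  codepoints_hex(str.unicode_normalize(:nfc)),
    nfd:  codepoints_hex(str.unicode_normalize(:nfd)),
    nfkc: codepoints_hex(str.unicode_normalize(:nfkc)),
    nfkd: codepoints_hex(str.unicode_normalize(:nfkd)),
  }
end

# "e with acute" as one code point, followed by the "fi" ligature (U+FB01).
sample = "\u00E9\uFB01"
all_forms(sample).each { |form, cps| puts "#{form}: #{cps.join(' ')}" }
```

The composing forms (C, KC) keep U+00E9 as a single code point, the
decomposing forms (D, KD) split it into U+0065 U+0301, and only the
compatibility forms (KC, KD) replace the ligature by U+0066 U+0069.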
jbe@0: 
jbe@0: 
jbe@0: *** C LIBRARY ***
jbe@0: 
jbe@0: The documentation for the C library is found in the utf8proc.h header file.
jbe@0: "utf8proc_map" is most likely the function you will use for mapping UTF-8
jbe@0: strings, unless you want to allocate memory yourself.
jbe@0: 
jbe@0: 
jbe@0: *** RUBY API ***
jbe@0: 
jbe@0: The Ruby library adds the methods "utf8map" and "utf8map!" to the String
jbe@0: class, and the method "utf8" to the Integer class.
jbe@0: 
jbe@0: The String#utf8map method does the same as the "utf8proc_map" C function.
jbe@0: Options for the mapping procedure are passed as symbols, e.g.:
jbe@2: "Hello".utf8map(:casefold) => "hello"
jbe@0: 
jbe@0: The descriptions of all options are found in the C header file "utf8proc.h".
jbe@0: Please note that the corresponding symbols in Ruby are all lowercase.
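
As a small sketch of that naming rule, the following derives the lowercase
symbols from the option names listed for the normalization forms in this
file. It only computes symbol names and does not call utf8map; the constant
"NORMALIZATION_OPTIONS" and the helper "option_symbols" are hypothetical
names used for illustration.

```ruby
# Option names as listed for the four normalization forms in this file.
NORMALIZATION_OPTIONS = {
  "C"  => %w[STABLE COMPOSE],
  "D"  => %w[STABLE DECOMPOSE],
  "KC" => %w[STABLE COMPOSE COMPAT],
  "KD" => %w[STABLE DECOMPOSE COMPAT],
}.freeze

# Hypothetical helper: the lowercase Ruby symbols for a given form.
def option_symbols(form)
  NORMALIZATION_OPTIONS.fetch(form).map { |name| name.downcase.to_sym }
end

# Print the symbol list for every form.
NORMALIZATION_OPTIONS.each_key do |form|
  puts "NF#{form}: #{option_symbols(form).inspect}"
end
```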
jbe@0: 
jbe@0: String#utf8map! is the destructive variant: the string itself is replaced
jbe@0: by the result.
jbe@0: 
jbe@0: There are shortcuts for the 4 normalization forms specified by Unicode:
jbe@0: String#utf8nfd,  String#utf8nfd!,
jbe@0: String#utf8nfc,  String#utf8nfc!,
jbe@0: String#utf8nfkd, String#utf8nfkd!,
jbe@0: String#utf8nfkc, String#utf8nfkc!
jbe@0: 
jbe@0: The method Integer#utf8 returns a UTF-8 string containing the Unicode
jbe@2: character for the given code point:
jbe@0: 0x000A.utf8 => "\n"
jbe@0: 0x2028.utf8 => "\342\200\250"
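
The byte sequence in the second example can be cross-checked without
utf8proc. The sketch below uses Array#pack("U") from core Ruby as a stand-in
for Integer#utf8; the helper names "codepoint_to_utf8" and "octal_bytes" are
hypothetical.

```ruby
# Stand-in for Integer#utf8 using only core Ruby (not utf8proc).
# Hypothetical helper: encode one code point as a UTF-8 string.
def codepoint_to_utf8(code_point)
  [code_point].pack("U")
end

# Hypothetical helper: show each byte as a backslash-octal escape,
# matching the notation used in the example above.
def octal_bytes(str)
  str.bytes.map { |b| format("\\%03o", b) }.join
end

puts codepoint_to_utf8(0x000A).inspect
puts octal_bytes(codepoint_to_utf8(0x2028))
```

U+2028 encodes to the three bytes 0xE2 0x80 0xA8, which are 342, 200 and 250
in octal.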
jbe@0: 
jbe@0: 
jbe@0: *** POSTGRESQL API ***
jbe@0: 
jbe@0: For PostgreSQL, an SQL function named "unifold" is supplied. This function
jbe@0: can be used to build indexes on normalized and case-folded field values,
jbe@0: e.g.:
jbe@0: 
jbe@1: CREATE TABLE people (
jbe@1:   id    serial8 primary key,
jbe@1:   name  text,
jbe@1:   CHECK (unifold(name) NOTNULL)
jbe@1: );
jbe@0: CREATE INDEX name_idx ON people (unifold(name));
jbe@0: SELECT * FROM people WHERE unifold(name) = unifold('John Doe');
jbe@0: 
jbe@2: NOTICE: The output of the function can change between releases, as utf8proc
jbe@2:         does not follow a versioning stability policy. You have to rebuild
jbe@2:         your database indices if you upgrade to a newer version of utf8proc.
jbe@2: 
jbe@2: 
jbe@0: *** TODO ***
jbe@0: 
jbe@0: - detect stable code points and process segments independently in order to
jbe@0:   save memory
jbe@0: - do a quick check before normalizing strings to optimize speed
jbe@0: - support stream processing
jbe@0: 
jbe@0: 
jbe@0: Unicode is a trademark of Unicode, Inc., and may be registered in some
jbe@0: jurisdictions.
jbe@0: 
jbe@0: