utf8proc

annotate lump.txt @ 3:4ee0d5f54af1

Version 1.0

- added the LUMP option, which lumps certain characters together (see lump.txt); it is also used for the PostgreSQL "unifold" function
- added the STRIPMARK option, which strips marking characters (or marks of composed characters)
- deprecated the Ruby method String#char_ary in favour of String#utf8chars
author jbe
date Sun Sep 17 12:00:00 2006 +0200 (2006-09-17)
U+0020      <-- all space characters (general category Zs)
U+0027  '   <-- left/right single quotation mark U+2018..2019,
                modifier letter apostrophe U+02BC,
                modifier letter vertical line U+02C8
U+002D  -   <-- all dash characters (general category Pd),
                minus U+2212
U+002F  /   <-- fraction slash U+2044,
                division slash U+2215
U+003A  :   <-- ratio U+2236
U+003C  <   <-- single left-pointing angle quotation mark U+2039,
                left-pointing angle bracket U+2329,
                left angle bracket U+3008
U+003E  >   <-- single right-pointing angle quotation mark U+203A,
                right-pointing angle bracket U+232A,
                right angle bracket U+3009
U+005C  \   <-- set minus U+2216
U+005E  ^   <-- modifier letter up arrowhead U+02C4,
                modifier letter circumflex accent U+02C6,
                caret U+2038,
                up arrowhead U+2303
U+005F  _   <-- all connector characters (general category Pc),
                modifier letter low macron U+02CD
U+0060  `   <-- modifier letter grave accent U+02CB
U+007C  |   <-- divides U+2223
U+007E  ~   <-- tilde operator U+223C
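
The table above can be read as a lookup from each source character to its ASCII replacement. Below is a minimal Python sketch of that lookup, for illustration only: it is not utf8proc's implementation, the names `lump_char` and `lump` are hypothetical, and the category-based rows (Zs, Pd, Pc) depend on the Unicode version shipped with the Python interpreter's `unicodedata` module, which may differ from the data utf8proc was built with.

```python
import unicodedata

# Explicit entries from the table: source code point -> ASCII replacement.
_LUMP_TABLE = {
    0x2018: "'", 0x2019: "'", 0x02BC: "'", 0x02C8: "'",
    0x2212: "-",
    0x2044: "/", 0x2215: "/",
    0x2236: ":",
    0x2039: "<", 0x2329: "<", 0x3008: "<",
    0x203A: ">", 0x232A: ">", 0x3009: ">",
    0x2216: "\\",
    0x02C4: "^", 0x02C6: "^", 0x2038: "^", 0x2303: "^",
    0x02CD: "_",
    0x02CB: "`",
    0x2223: "|",
    0x223C: "~",
}

# Rows of the table that cover a whole general category.
_CATEGORY_TABLE = {"Zs": " ", "Pd": "-", "Pc": "_"}


def lump_char(ch):
    """Return the lumped replacement for one character (hypothetical helper)."""
    target = _LUMP_TABLE.get(ord(ch))
    if target is not None:
        return target
    # Fall back to the category rows; unlisted characters pass through unchanged.
    return _CATEGORY_TABLE.get(unicodedata.category(ch), ch)


def lump(text):
    """Apply the lump table to every character of a string."""
    return "".join(lump_char(ch) for ch in text)


if __name__ == "__main__":
    print(lump("a\u2013b"))        # en dash (Pd) becomes "-": prints a-b
    print(lump("\u2018x\u2019"))   # curly single quotes become "'": prints 'x'
```

Checking the explicit entries before the category rows keeps the two kinds of rule separate, which mirrors how the table lists them.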
