pgLatLon v0.14 documentation
============================
pgLatLon is a spatial database extension for the PostgreSQL object-relational
database management system providing geographic data types and spatial indexing
for the WGS-84 spheroid.
While many other spatial databases still use imprecise bounding boxes for
many operations, pgLatLon aims to support more precise calculations for all
implemented geographic operators. Efficient indexing of geographic objects
is provided using space-filling fractal curves. Optimizations on bit level
(including logarithmic compression) allow for a highly memory-efficient
non-overlapping index suitable for huge datasets.
pgLatLon is a lightweight solution as it only depends on PostgreSQL itself (and
a C compiler for building).
Unlike competing spatial extensions for PostgreSQL, pgLatLon is available under
the permissive MIT/X11 license to avoid problems with viral licenses like the
GPLv2/v3.
Installation
------------
### Automatic installation
Prerequisites:
* Ensure that the `pg_config` binary is in your path (shipped with PostgreSQL).
* Ensure that GNU Make is available (either as `make` or `gmake`).
Then simply type:
make install
### Manual installation
It is also possible to compile and install the extension without GNU Make as
follows:
cc -Wall -O2 -fPIC -shared -I `pg_config --includedir-server` -o latlon-v0009.so latlon-v0009.c
cp latlon-v0009.so `pg_config --pkglibdir`
cp latlon.control `pg_config --sharedir`/extension/
cp latlon--*.sql `pg_config --sharedir`/extension/
### Loading the extension
After installation, you can create a database and load the extension as
follows:
% createdb test_database
% psql test_database
psql (9.5.4)
Type "help" for help.
test_database=# CREATE EXTENSION latlon;
### Updating
Before updating your database cluster to a new version of pgLatLon, you may
want to uninstall the old by calling "`make uninstall`" in the unpacked source
code directory of your old pgLatLon version. You may also manually delete the
`latlon-v????.so` files from your PostgreSQL library directory and the
`latlon.control` and `latlon--*.sql` files from your PostgreSQL extension
directory.
The new version can be installed as described above. For altering an existing
database to use the installed new version (mandatory if you removed the old
version), execute the following SQL command in the respective databases:
ALTER EXTENSION latlon UPDATE;
If the update contains modifications to operator classes, it may be necessary
to drop all indices on geographic data types first (you will get an error
message in this case). These indices can be re-created after the update.
Note that taking several update steps at once (e.g. updating from version 0.2
directly to version 0.4) requires the intermediate versions to be installed
(i.e. in this example version 0.3 would need to be installed). Whenever you
install or uninstall an intermediate or old version, make sure to afterwards
re-install the latest pgLatLon version to ensure that the `latlon.control` file
is available and points to the latest version.
If the update contains modifications to the internal data representation
format, an update path might not be available. In this case, create a dump of
your database, delete your database, and restore it from your dump.
Be sure to always keep backups of all your data before attempting to update.
Reference
---------
### 1. Types
pgLatLon provides four geographic types: `epoint`, `ebox`, `ecircle`, and
`ecluster`.
#### `epoint`
A point on the Earth spheroid (WGS-84).
The text input format is `'[N|S] [E|W]'`, where each float is in
degrees. Note the required white space between the latitude and longitude
components. Each floating point number may have a sign, in which case `N`/`S`
or `E`/`W` are switched respectively (e.g. `E-5` is the same as `W5`).
An `epoint` may also be created from two floating point numbers by calling
`epoint(latitude, longitude)`, where positive latitudes are used for the
northern hemisphere, negative latitudes are used for the southern hemisphere,
positive longitudes indicate positions east of the prime meridian, and negative
longitudes indicate positions west of the prime meridian.
Latitudes exceeding -90 or +90 degrees are truncated to -90 or +90
respectively, in which case a warning will be issued. Longitudes exceeding -180
or +180 degrees will be converted to values between -180 and +180 (both
inclusive) by adding or substracting a multiple of 360 degrees, in which case a
notice will be issued.
If the latitude is -90 or +90 (south pole or north pole), a longitude value is
still stored in the datum, and if a point is on the prime meridian or the
180th meridian, the east/west bit is also stored in the datum. In case of the
prime meridian, this is done by storing a floating point value of -0 for
0 degrees west and a value of +0 for 0 degrees east. In case of the
180th meridian, this is done by storing -180 or +180 respectively. The equality
operator, however, returns true when the same points on Earth are described,
i.e. the longitude is ignored for the poles, and 180 degrees west is considered
to be equal to 180 degrees east.
#### `ebox`
An area on Earth demarcated by a southern and northern latitude, and a western
and eastern longitude (all given in WGS-84).
The text input format is
`'{N|S} {E|W} {N|S} {E|W}'`, where each float is in
degrees. The ordering of the four white-space separated blocks is not
significant. To include the 180th meridian, one longitude boundary must be
equal to or exceed `W180` or `E180`, e.g. `'N10 N20 E170 E190'`.
A special value is the empty area, denoted by the text represenation `'empty'`.
Such an `ebox` does not contain any point.
An `ebox` may also be created from four floating point numbers by calling
`ebox(min_latitude, max_latitude, min_longitude, max_longitude)`, where
positive values are used for north and east, and negative values are used for
south and west. If `min_latitude` is strictly greater than `max_latitude`, an
empty `ebox` is created. If `min_longitude` is greater than `max_longitude` and
if both longitudes are between -180 and +180 degrees, then the area oriented in
such way that the 180th meridian is included.
If the longitude span is less than 120 degrees, an `ebox` may be alternatively
created from two `epoints` in the following way: `ebox(epoint(lat1, lon1),
epoint(lat2, lon2))`. In this case `lat1` and `lat2` as well as `lon1` and
`lon2` can be swapped without any impact.
#### `ecircle`
An area containing all points not farther away from a given center point
(WGS-84) than a given radius.
The text input format is `'{N|S} {E|W} '`, where the first
two floats denote the center point in degrees and the third float denotes the
radius in meters. A radius equal to minus infinity denotes an empty circle
which contains no point at all (despite having a center), while a radius equal
to zero denotes a circle that includes a single point.
An `ecircle` may also be created by calling `ecircle(epoint(...), radius)` or
from three floating point numbers by calling `ecircle(latitude, longitude,
radius)`.
#### `ecluster`
A collection of points, paths, polygons, and outlines on the WGS-84 spheroid.
Each path, polygon, or outline must cover a longitude range of less than
180 degrees to avoid ambiguities.
The text input format is a white-space separated list of the following items:
* `point ({N|S} {E|W})`
* `path ({N|S} {E|W} {N|S} {E|W} ...)`
* `outline ({N|S} {E|W} {N|S} {E|W} {N|S} {E|W} ...)`
* `polygon ({N|S} {E|W} {N|S} {E|W} {N|S} {E|W} ...)`
Paths are open by default (i.e. there is no connection from the last point in
the list to the first point in the list). Outlines and polygons, in contrast,
are automatically closed (i.e. there is a line segment from the last point in
the list to the first point in the list) which means the first point should not
be repeated as last point in the list. Polygons are filled, outlines are not.
### 2. Indices
Two kinds of indices are supported: B-tree and GiST indices.
#### B-tree indices
A B-tree index can be used for simple equality searches and is supported by the
`epoint`, `ebox`, and `ecircle` data types. B-tree indices can not be used for
geographic searches.
#### GiST indices
For geographic searches, GiST indices must be used. The `epoint`, `ecircle`,
and `ecluster` data types support GiST indexing. A GiST index for geographic
searches can be created as follows:
CREATE TABLE tbl (
id serial4 PRIMARY KEY,
loc epoint NOT NULL );
CREATE INDEX name_of_index ON tbl USING gist (loc);
GiST indices also support nearest neighbor searches when using the distance
operator (`<->`) in the ORDER BY clause.
#### Indices on other data types (e.g. GeoJSON)
Note that further types can be indexed by using an index on an expression with
a conversion function. One conversion function provided by pgLatLon is the
`GeoJSON_to_ecluster(jsonb, text)` function:
CREATE TABLE tbl (
id serial4 PRIMARY KEY,
loc jsonb NOT NULL );
CREATE INDEX name_of_index ON tbl USING gist ((GeoJSON_to_ecluster("loc")));
When using the conversion function in an expression, the index will be used
automatically:
SELECT * FROM tbl WHERE GeoJSON_to_ecluster("loc") && 'N50 E10 10000'::ecircle;
### 3. Operators
#### Equality operator `=`
Tests if two geographic objects are equal.
The longitude is ignored for the poles, and 180 degrees west is considered to
be equal to 180 degrees east.
For boxes and circles, two empty objects are considered equal. (Note that a
circle is not empty if the radius is zero but only if it is negative infinity,
i.e. smaller than zero.) Two circles with a positive infinite radius are also
considered equal.
Implemented for:
* `epoint = epoint`
* `ebox = ebox`
* `ecircle = ecircle`
The negation is the inequality operator (`<>` or `!=`).
#### Linear ordering operators `<<<`, `<<<=`, `>>>=`, `>>>`
These operators create an arbitrary (but well-defined) linear ordering of
geographic objects, which is used internally for B-tree indexing and merge
joins. These operators will usually not be used by an application programmer.
#### Overlap operator `&&`
Tests if two geographic objects have at least one point in common. Currently
implemented for:
* `epoint && ebox`
* `epoint && ecircle`
* `epoint && ecluster`
* `ebox && ebox`
* `ebox && ecircle`
* `ebox && ecluster`
* `ecircle && ecircle`
* `ecircle && ecluster`
* `ecluster && ecluster`
The `&&` operator is commutative, i.e. "`a && b`" is the same as "`b && a`".
Each commutation is supported as well.
#### Lossy overlap operator `&&+`
Tests if two geographic objects may have at least one point in common. Opposed
to the `&&` operator, the `&&+` operator may return false positives and is
currently implemented for:
* `epoint &&+ ecluster`
* `ebox &&+ ecircle`
* `ebox &&+ ecluster`
* `ecircle &&+ ecluster`
* `ecluster &&+ ecluster`
The `&&+` operator is commutative, i.e. "`a &&+ b`" is the same as "`b &&+ a`".
Each commutation is supported as well.
Where two data types support both the `&&` and the `&&+` operator, the `&&+`
operator computes faster.
#### Contains operator `@>`
Tests if the object right of the operator is contained in the object left of
the operator. Currently implemented for:
* `ebox @> epoint` (alias for `&&`)
* `ebox @> ebox`
* `ebox @> ecluster`
* `ecluster @> epoint` (alias for `&&`)
* `ecluster @> ebox`
* `ecluster @> ecluster`
The commutator of `@>` ("contains") is `<@` ("is contained in"), i.e.
"`a @> b`" is the same as "`b <@ a`".
Whether the perimeter of an object is taken into account is undefined and may
differ between the left and the right hand side of the operator. The current
implementation (where not an alias for `&&`) returns true only if an object is
contained completely within the other object, not touching its perimeter,
paths, outlines, or any singular points.
#### Distance operator `<->`
Calculates the shortest distance between two geographic objects in meters (zero
if the objects are overlapping). Currently implemented for:
* `epoint <-> epoint`
* `epoint <-> ebox`
* `epoint <-> ecircle`
* `epoint <-> ecluster`
* `ebox <-> ebox`
* `ebox <-> ecircle`
* `ebox <-> ecluster`
* `ecircle <-> ecircle`
* `ecircle <-> ecluster`
* `ecluster <-> ecluster`
The `<->` operator is commutative, i.e. "`a <-> b`" is the same as "`b <-> a`".
Each commutation is supported as well.
For short distances, the result is very accurate (i.e. respects the dimensions
of the WGS-84 spheroid). For longer distances in the order of magnitude of
Earth's radius or greater, the value is only approximate (but the error is
still less than 0.2% as long as no polygons with very long edges are involved).
The functions `distance(epoint, epoint)` and `distance(ecluster, epoint)` can
be used as an alias for this operator.
Note: In case of radial searches with a fixed radius, this operator should
not be used. Instead, an `ecircle` should be created and used in combination
with the overlap operator (`&&`). Alternatively, the functions
`distance_within(epoint, epoint, float8)` or `distance_within(ecluster, epoint,
float8)` can be used for fixed-radius searches.
### 4. Functions
#### `center(circle)`
Returns the center of an `ecircle` as an `epoint`.
#### `distance(epoint, epoint)`
Calculates the distance between two `epoint` datums in meters. This function is
an alias for the distance operator `<->`.
Note: In case of radial searches with a fixed radius, this function should not be
used. Use `distance_within(epoint, epoint, float8)` instead.
#### `distance(ecluster, epoint)`
Calculates the distance from an `ecluster` to an `epoint` in meters. This
function is an alias for the distance operator `<->`.
Note: In case of radial searches with a fixed radius, this function should not be
used. Use `distance_within(epoint, epoint, float8)` instead.
#### `distance_within(`variable `epoint,` fixed `epoint,` radius `float8)`
Checks if the distance between two `epoint` datums is not greater than a given
value (search radius).
Note: In case of radial searches with a fixed radius, the first argument must
be used for the table column, while the second argument must be used for the
search center. Otherwise an existing index cannot be used.
#### `distance_within(`variable `ecluster,` fixed `epoint,` radius `float8)`
Checks if the distance from an `ecluster` to an `epoint` is not greater than a
given value (search radius).
#### `ebox(`latmin `float8,` latmax `float8,` lonmin `float8,` lonmax `float8)`
Creates a new `ebox` with the given boundaries.
See "1. Types", subsection `ebox` for details.
#### `ebox(epoint, epoint)`
Creates a new `ebox`. This function may only be used if the longitude
difference is less than or equal to 120 degrees.
See "1. Types", subsection `ebox` for details.
#### `ecircle(epoint, float8)`
Creates an `ecircle` with the given center point and radius.
#### `ecircle(`latitude `float8,` longitude `float8,` radius `float8)`
Creates an `ecircle` with the given center point and radius.
#### `ecluster_concat(ecluster, ecluster)`
Combines two clusters to form a new `ecluster` by uniting all entries of both
clusters. Note that two overlapping areas of polygons annihilate each other
(which may be used to create polygons with holes).
#### `ecluster_concat(ecluster[])`
Creates a new `ecluster` that unites all entries of all clusters in the passed
array. Note that two overlapping areas of polygons annihilate each other (which
may be used to create polygons with holes).
#### `ecluster_create_multipoint(epoint[])`
Creates a new `ecluster` which contains multiple points.
#### `ecluster_create_outline(epoint[])`
Creates a new `ecluster` that is an outline given by the passed points.
#### `ecluster_create_path(epoint[])`
Creates a new `ecluster` that is a path given by the passed points.
#### `ecluster_create_polygon(epoint[])`
Creates a new `ecluster` that is a polygon given by the passed points.
#### `ecluster_extract_outlines(ecluster)`
Set-returning function that returns the outlines of an `ecluster` as `epoint[]`
rows.
#### `ecluster_extract_paths(ecluster)`
Set-returning function that returns the paths of an `ecluster` as `epoint[]`
rows.
#### `ecluster_extract_points(ecluster)`
Set-returning function that returns the points of an `ecluster` as `epoint`
rows.
#### `ecluster_extract_polygons(ecluster)`
Set-returning function that returns the polygons of an `ecluster` as `epoint[]`
rows.
#### `empty_ebox`()
Returns the empty `ebox`.
See "1. Types", subsection `ebox` for details.
#### `epoint(`latitude `float8,` longitude `float8)`
Returns an `epoint` with the given latitude and longitude.
#### `epoint_latlon(`latitude `float8,` longitude `float8)`
Alias for `epoint(float8, float8)`.
#### `epoint_lonlat(`longitude `float8,` latitude `float8)`
Same as `epoint(float8, float8)` but with arguments reversed.
#### `fair_distance(ecluster, epoint,` samples `int4 = 10000)`
When working with user-generated content, users may be tempted to create
intentionally oversized objects in order to optimize search results in an
unfair manner. The `fair_distance` function aims to handle this by returning an
adjusted distance (i.e. distance increased by a penalty) if a geographic object
(the `ecluster`) consists of more than one point.
The first argument to this function is an `ecluster`, the second argument is a
search point (`epoint`), and the third argument is an interger related to the
precision (higher precision will require more computation time).
The penalty by which the returned distance is increased fulfills (at least) the
following properties:
* The penalty function is continuous (except noise created by numerical
integration, see paragraph after this list) as long as no objects are added
to or removed from the `ecluster`. That particularly means: small changes in
the search point (second argument) cause only small changes in the result.
* For search points far away from the `ecluster` (i.e. large distances compared
to the dimensions of the `ecluster`), the penalty approaches zero, i.e. the
behavior of the `fair_distance` function approaches the behavior of the
`distance` function.
* If the `ecluster` consists of a set of points, the penalty for a search point
close to one of those points (closer than half of the minimum distance
between each pair of points in the `ecluster`) is chosen in such a way that
the adjusted distance is equal to the distance from the search point to the
closest point in the `ecluster` multiplied by the square root of the count of
points in the `ecluster`.
* If the `ecluster` does not cover any area (i.e. only consists of points,
paths, and/or outlines), and if the search point (second argument) overlaps
with the `ecluster`, then the penalty (and thus the result) is zero.
* The integral (or average) of the square of the fair distance value (result of
this function) over all possible search points is independent of the
`ecluster` as long as the `ecluster` does not cover more than a half of
earth's surface.
The function uses numerical integration to compute the result. The third
parameter (which defaults to 10000) can be used to adjust the number of samples
taken. A higher sample count increases precision as well as execution time of
the function. Because this function internally uses a spherical model of earth
for certain steps of the calculation, the precision cannot be increased
unboundedly.
Despite the limitations explained above, it is ensured that the penalty is
always positive, i.e. results returned by the `fair_distance` function are
always equal to or greater than the results returned by the `distance`
function regardless of stochastic effects. Furthermore, all results are
deterministic and reproducible with the same version of pgLatLon.
#### `GeoJSON_to_epoint(jsonb, text)`
Maps a GeoJSON object of type "Point" or "Feature" (which contains a
"Point") to an `epoint` datum. For any other JSON objects, NULL is returned.
The second parameter (which defaults to `epoint_lonlat`) may be set to a name
of a conversion function that transforms two coordinates (two `float8`
parameters) to an `epoint`.
#### `GeoJSON_to_ecluster(jsonb, text)`
Maps a (valid) GeoJSON object to an `ecluster`. Note that this function
does not check whether the JSONB object is a valid GeoJSON object.
The second parameter (which defaults to `epoint_lonlat`) may be set to a name
of a conversion function that transforms two coordinates (two `float8`
parameters) to an `epoint`.
#### `max_latitude(ebox)`
Returns the northern boundary of a given `ebox` in degrees between -90 and +90.
#### `max_longitude(ebox)`
Returns the eastern boundary of a given `ebox` in degrees between -180 and +180
(both inclusive).
#### `min_latitude(ebox)`
Returns the southern boundary of a given `ebox` in degrees between -90 and +90.
#### `min_longitude(ebox)`
Returns the western boundary of a given `ebox` in degrees between -180 and +180
(both inclusive).
#### `latitude(epoint)`
Returns the latitude value of an `epoint` in degrees between -90 and +90.
#### `longitude(epoint)`
Returns the longitude value of an `epoint` in degrees between -180 and +180
(both inclusive).
#### `radius(ecircle)`
Returns the radius of an `ecircle` in meters.