pgLatLon: 9da49e9278ac README.mkd

pgLatLon

view README.mkd @ 67:9da49e9278ac

Include files for new update path in GNUmakefile

author	jbe
date	Tue Feb 11 02:27:50 2020 +0100 (2020-02-11)
parents	ce0963692318
children	76b3fd3293fc

line source

1 pgLatLon v0.14 documentation

2 ============================

4 pgLatLon is a spatial database extension for the PostgreSQL object-relational

5 database management system providing geographic data types and spatial indexing

6 for the WGS-84 spheroid.

8 While many other spatial databases still use imprecise bounding boxes for

9 many operations, pgLatLon aims to support more precise calculations for all

10 implemented geographic operators. Efficient indexing of geographic objects

11 is provided using space-filling fractal curves. Optimizations on bit level

12 (including logarithmic compression) allow for a highly memory-efficient

13 non-overlapping index suitable for huge datasets.

15 pgLatLon is a lightweight solution as it only depends on PostgreSQL itself (and

16 a C compiler for building).

18 Unlike competing spatial extensions for PostgreSQL, pgLatLon is available under

19 the permissive MIT/X11 license to avoid problems with viral licenses like the

20 GPLv2/v3.

23 Installation

24 ------------

26 ### Automatic installation

28 Prerequisites:

30 * Ensure that the `pg_config` binary is in your path (shipped with PostgreSQL).

31 * Ensure that GNU Make is available (either as `make` or `gmake`).

33 Then simply type:

35 make install

37 ### Manual installation

39 It is also possible to compile and install the extension without GNU Make as

40 follows:

42 cc -Wall -O2 -fPIC -shared -I `pg_config --includedir-server` -o latlon-v0009.so latlon-v0009.c

43 cp latlon-v0009.so `pg_config --pkglibdir`

44 cp latlon.control `pg_config --sharedir`/extension/

45 cp latlon--*.sql `pg_config --sharedir`/extension/

47 ### Loading the extension

49 After installation, you can create a database and load the extension as

50 follows:

52 % createdb test_database

53 % psql test_database

54 psql (9.5.4)

55 Type "help" for help.

57 test_database=# CREATE EXTENSION latlon;

59 ### Updating

61 Before updating your database cluster to a new version of pgLatLon, you may

62 want to uninstall the old by calling "`make uninstall`" in the unpacked source

63 code directory of your old pgLatLon version. You may also manually delete the

64 `latlon-v????.so` files from your PostgreSQL library directory and the

65 `latlon.control` and `latlon--*.sql` files from your PostgreSQL extension

66 directory.

68 The new version can be installed as described above. For altering an existing

69 database to use the installed new version (mandatory if you removed the old

70 version), execute the following SQL command in the respective databases:

72 ALTER EXTENSION latlon UPDATE;

74 If the update contains modifications to operator classes, it may be necessary

75 to drop all indices on geographic data types first (you will get an error

76 message in this case). These indices can be re-created after the update.

78 Note that taking several update steps at once (e.g. updating from version 0.2

79 directly to version 0.4) requires the intermediate versions to be installed

80 (i.e. in this example version 0.3 would need to be installed). Whenever you

81 install or uninstall an intermediate or old version, make sure to afterwards

82 re-install the latest pgLatLon version to ensure that the `latlon.control` file

83 is available and points to the latest version.

85 If the update contains modifications to the internal data representation

86 format, an update path might not be available. In this case, create a dump of

87 your database, delete your database, and restore it from your dump.

89 Be sure to always keep backups of all your data before attempting to update.

92 Reference

93 ---------

95 ### 1. Types

97 pgLatLon provides four geographic types: `epoint`, `ebox`, `ecircle`, and

98 `ecluster`.

100 #### `epoint`

101

102 A point on the Earth spheroid (WGS-84).

103

104 The text input format is `'[N|S]<float> [E|W]<float>'`, where each float is in

105 degrees. Note the required white space between the latitude and longitude

106 components. Each floating point number may have a sign, in which case `N`/`S`

107 or `E`/`W` are switched respectively (e.g. `E-5` is the same as `W5`).

108

109 An `epoint` may also be created from two floating point numbers by calling

110 `epoint(latitude, longitude)`, where positive latitudes are used for the

111 northern hemisphere, negative latitudes are used for the southern hemisphere,

112 positive longitudes indicate positions east of the prime meridian, and negative

113 longitudes indicate positions west of the prime meridian.

114

115 Latitudes exceeding -90 or +90 degrees are truncated to -90 or +90

116 respectively, in which case a warning will be issued. Longitudes exceeding -180

117 or +180 degrees will be converted to values between -180 and +180 (both

118 inclusive) by adding or substracting a multiple of 360 degrees, in which case a

119 notice will be issued.

120

121 If the latitude is -90 or +90 (south pole or north pole), a longitude value is

122 still stored in the datum, and if a point is on the prime meridian or the

123 180th meridian, the east/west bit is also stored in the datum. In case of the

124 prime meridian, this is done by storing a floating point value of -0 for

125 0 degrees west and a value of +0 for 0 degrees east. In case of the

126 180th meridian, this is done by storing -180 or +180 respectively. The equality

127 operator, however, returns true when the same points on Earth are described,

128 i.e. the longitude is ignored for the poles, and 180 degrees west is considered

129 to be equal to 180 degrees east.

130

131 #### `ebox`

132

133 An area on Earth demarcated by a southern and northern latitude, and a western

134 and eastern longitude (all given in WGS-84).

135

136 The text input format is

137 `'{N|S}<float> {E|W}<float> {N|S}<float> {E|W}<float>'`, where each float is in

138 degrees. The ordering of the four white-space separated blocks is not

139 significant. To include the 180th meridian, one longitude boundary must be

140 equal to or exceed `W180` or `E180`, e.g. `'N10 N20 E170 E190'`.

141

142 A special value is the empty area, denoted by the text represenation `'empty'`.

143 Such an `ebox` does not contain any point.

144

145 An `ebox` may also be created from four floating point numbers by calling

146 `ebox(min_latitude, max_latitude, min_longitude, max_longitude)`, where

147 positive values are used for north and east, and negative values are used for

148 south and west. If `min_latitude` is strictly greater than `max_latitude`, an

149 empty `ebox` is created. If `min_longitude` is greater than `max_longitude` and

150 if both longitudes are between -180 and +180 degrees, then the area oriented in

151 such way that the 180th meridian is included.

152

153 If the longitude span is less than 120 degrees, an `ebox` may be alternatively

154 created from two `epoints` in the following way: `ebox(epoint(lat1, lon1),

155 epoint(lat2, lon2))`. In this case `lat1` and `lat2` as well as `lon1` and

156 `lon2` can be swapped without any impact.

157

158 #### `ecircle`

159

160 An area containing all points not farther away from a given center point

161 (WGS-84) than a given radius.

162

163 The text input format is `'{N|S}<float> {E|W}<float> <float>'`, where the first

164 two floats denote the center point in degrees and the third float denotes the

165 radius in meters. A radius equal to minus infinity denotes an empty circle

166 which contains no point at all (despite having a center), while a radius equal

167 to zero denotes a circle that includes a single point.

168

169 An `ecircle` may also be created by calling `ecircle(epoint(...), radius)` or

170 from three floating point numbers by calling `ecircle(latitude, longitude,

171 radius)`.

172

173 #### `ecluster`

174

175 A collection of points, paths, polygons, and outlines on the WGS-84 spheroid.

176 Each path, polygon, or outline must cover a longitude range of less than

177 180 degrees to avoid ambiguities.

178

179 The text input format is a white-space separated list of the following items:

180

181 * `point ({N|S}<float> {E|W}<float>)`

182 * `path ({N|S}<float> {E|W}<float> {N|S}<float> {E|W}<float> ...)`

183 * `outline ({N|S}<float> {E|W}<float> {N|S}<float> {E|W}<float> {N|S}<float> {E|W}<float> ...)`

184 * `polygon ({N|S}<float> {E|W}<float> {N|S}<float> {E|W}<float> {N|S}<float> {E|W}<float> ...)`

185

186 Paths are open by default (i.e. there is no connection from the last point in

187 the list to the first point in the list). Outlines and polygons, in contrast,

188 are automatically closed (i.e. there is a line segment from the last point in

189 the list to the first point in the list) which means the first point should not

190 be repeated as last point in the list. Polygons are filled, outlines are not.

191

192 ### 2. Indices

193

194 Two kinds of indices are supported: B-tree and GiST indices.

195

196 #### B-tree indices

197

198 A B-tree index can be used for simple equality searches and is supported by the

199 `epoint`, `ebox`, and `ecircle` data types. B-tree indices can not be used for

200 geographic searches.

201

202 #### GiST indices

203

204 For geographic searches, GiST indices must be used. The `epoint`, `ecircle`,

205 and `ecluster` data types support GiST indexing. A GiST index for geographic

206 searches can be created as follows:

207

208 CREATE TABLE tbl (

209 id serial4 PRIMARY KEY,

210 loc epoint NOT NULL );

211

212 CREATE INDEX name_of_index ON tbl USING gist (loc);

213

214 GiST indices also support nearest neighbor searches when using the distance

215 operator (`<->`) in the ORDER BY clause.

216

217 #### Indices on other data types (e.g. GeoJSON)

218

219 Note that further types can be indexed by using an index on an expression with

220 a conversion function. One conversion function provided by pgLatLon is the

221 `GeoJSON_to_ecluster(jsonb, text)` function:

222

223 CREATE TABLE tbl (

224 id serial4 PRIMARY KEY,

225 loc jsonb NOT NULL );

226

227 CREATE INDEX name_of_index ON tbl USING gist ((GeoJSON_to_ecluster("loc")));

228

229 When using the conversion function in an expression, the index will be used

230 automatically:

231

232 SELECT * FROM tbl WHERE GeoJSON_to_ecluster("loc") && 'N50 E10 10000'::ecircle;

233

234 ### 3. Operators

235

236 #### Equality operator `=`

237

238 Tests if two geographic objects are equal.

239

240 The longitude is ignored for the poles, and 180 degrees west is considered to

241 be equal to 180 degrees east.

242

243 For boxes and circles, two empty objects are considered equal. (Note that a

244 circle is not empty if the radius is zero but only if it is negative infinity,

245 i.e. smaller than zero.) Two circles with a positive infinite radius are also

246 considered equal.

247

248 Implemented for:

249

250 * `epoint = epoint`

251 * `ebox = ebox`

252 * `ecircle = ecircle`

253

254 The negation is the inequality operator (`<>` or `!=`).

255

256 #### Linear ordering operators `<<<`, `<<<=`, `>>>=`, `>>>`

257

258 These operators create an arbitrary (but well-defined) linear ordering of

259 geographic objects, which is used internally for B-tree indexing and merge

260 joins. These operators will usually not be used by an application programmer.

261

262 #### Overlap operator `&&`

263

264 Tests if two geographic objects have at least one point in common. Currently

265 implemented for:

266

267 * `epoint && ebox`

268 * `epoint && ecircle`

269 * `epoint && ecluster`

270 * `ebox && ebox`

271 * `ebox && ecircle`

272 * `ebox && ecluster`

273 * `ecircle && ecircle`

274 * `ecircle && ecluster`

275 * `ecluster && ecluster`

276

277 The `&&` operator is commutative, i.e. "`a && b`" is the same as "`b && a`".

278 Each commutation is supported as well.

279

280 #### Lossy overlap operator `&&+`

281

282 Tests if two geographic objects may have at least one point in common. Opposed

283 to the `&&` operator, the `&&+` operator may return false positives and is

284 currently implemented for:

285

286 * `epoint &&+ ecluster`

287 * `ebox &&+ ecircle`

288 * `ebox &&+ ecluster`

289 * `ecircle &&+ ecluster`

290 * `ecluster &&+ ecluster`

291

292 The `&&+` operator is commutative, i.e. "`a &&+ b`" is the same as "`b &&+ a`".

293 Each commutation is supported as well.

294

295 Where two data types support both the `&&` and the `&&+` operator, the `&&+`

296 operator computes faster.

297

298 #### Contains operator `@>`

299

300 Tests if the object right of the operator is contained in the object left of

301 the operator. Currently implemented for:

302

303 * `ebox @> epoint` (alias for `&&`)

304 * `ebox @> ebox`

305 * `ebox @> ecluster`

306 * `ecluster @> epoint` (alias for `&&`)

307 * `ecluster @> ebox`

308 * `ecluster @> ecluster`

309

310 The commutator of `@>` ("contains") is `<@` ("is contained in"), i.e.

311 "`a @> b`" is the same as "`b <@ a`".

312

313 Whether the perimeter of an object is taken into account is undefined and may

314 differ between the left and the right hand side of the operator. The current

315 implementation (where not an alias for `&&`) returns true only if an object is

316 contained completely within the other object, not touching its perimeter,

317 paths, outlines, or any singular points.

318

319 #### Distance operator `<->`

320

321 Calculates the shortest distance between two geographic objects in meters (zero

322 if the objects are overlapping). Currently implemented for:

323

324 * `epoint <-> epoint`

325 * `epoint <-> ebox`

326 * `epoint <-> ecircle`

327 * `epoint <-> ecluster`

328 * `ebox <-> ebox`

329 * `ebox <-> ecircle`

330 * `ebox <-> ecluster`

331 * `ecircle <-> ecircle`

332 * `ecircle <-> ecluster`

333 * `ecluster <-> ecluster`

334

335 The `<->` operator is commutative, i.e. "`a <-> b`" is the same as "`b <-> a`".

336 Each commutation is supported as well.

337

338 For short distances, the result is very accurate (i.e. respects the dimensions

339 of the WGS-84 spheroid). For longer distances in the order of magnitude of

340 Earth's radius or greater, the value is only approximate (but the error is

341 still less than 0.2% as long as no polygons with very long edges are involved).

342

343 The functions `distance(epoint, epoint)` and `distance(ecluster, epoint)` can

344 be used as an alias for this operator.

345

346 Note: In case of radial searches with a fixed radius, this operator should

347 not be used. Instead, an `ecircle` should be created and used in combination

348 with the overlap operator (`&&`). Alternatively, the functions

349 `distance_within(epoint, epoint, float8)` or `distance_within(ecluster, epoint,

350 float8)` can be used for fixed-radius searches.

351

352 ### 4. Functions

353

354 #### `center(circle)`

355

356 Returns the center of an `ecircle` as an `epoint`.

357

358 #### `distance(epoint, epoint)`

359

360 Calculates the distance between two `epoint` datums in meters. This function is

361 an alias for the distance operator `<->`.

362

363 Note: In case of radial searches with a fixed radius, this function should not be

364 used. Use `distance_within(epoint, epoint, float8)` instead.

365

366 #### `distance(ecluster, epoint)`

367

368 Calculates the distance from an `ecluster` to an `epoint` in meters. This

369 function is an alias for the distance operator `<->`.

370

371 Note: In case of radial searches with a fixed radius, this function should not be

372 used. Use `distance_within(epoint, epoint, float8)` instead.

373

374 #### `distance_within(`variable `epoint,` fixed `epoint,` radius `float8)`

375

376 Checks if the distance between two `epoint` datums is not greater than a given

377 value (search radius).

378

379 Note: In case of radial searches with a fixed radius, the first argument must

380 be used for the table column, while the second argument must be used for the

381 search center. Otherwise an existing index cannot be used.

382

383 #### `distance_within(`variable `ecluster,` fixed `epoint,` radius `float8)`

384

385 Checks if the distance from an `ecluster` to an `epoint` is not greater than a

386 given value (search radius).

387

388 #### `ebox(`latmin `float8,` latmax `float8,` lonmin `float8,` lonmax `float8)`

389

390 Creates a new `ebox` with the given boundaries.

391 See "1. Types", subsection `ebox` for details.

392

393 #### `ebox(epoint, epoint)`

394

395 Creates a new `ebox`. This function may only be used if the longitude

396 difference is less than or equal to 120 degrees.

397 See "1. Types", subsection `ebox` for details.

398

399 #### `ecircle(epoint, float8)`

400

401 Creates an `ecircle` with the given center point and radius.

402

403 #### `ecircle(`latitude `float8,` longitude `float8,` radius `float8)`

404

405 Creates an `ecircle` with the given center point and radius.

406

407 #### `ecluster_concat(ecluster, ecluster)`

408

409 Combines two clusters to form a new `ecluster` by uniting all entries of both

410 clusters. Note that two overlapping areas of polygons annihilate each other

411 (which may be used to create polygons with holes).

412

413 #### `ecluster_concat(ecluster[])`

414

415 Creates a new `ecluster` that unites all entries of all clusters in the passed

416 array. Note that two overlapping areas of polygons annihilate each other (which

417 may be used to create polygons with holes).

418

419 #### `ecluster_create_multipoint(epoint[])`

420

421 Creates a new `ecluster` which contains multiple points.

422

423 #### `ecluster_create_outline(epoint[])`

424

425 Creates a new `ecluster` that is an outline given by the passed points.

426

427 #### `ecluster_create_path(epoint[])`

428

429 Creates a new `ecluster` that is a path given by the passed points.

430

431 #### `ecluster_create_polygon(epoint[])`

432

433 Creates a new `ecluster` that is a polygon given by the passed points.

434

435 #### `ecluster_extract_outlines(ecluster)`

436

437 Set-returning function that returns the outlines of an `ecluster` as `epoint[]`

438 rows.

439

440 #### `ecluster_extract_paths(ecluster)`

441

442 Set-returning function that returns the paths of an `ecluster` as `epoint[]`

443 rows.

444

445 #### `ecluster_extract_points(ecluster)`

446

447 Set-returning function that returns the points of an `ecluster` as `epoint`

448 rows.

449

450 #### `ecluster_extract_polygons(ecluster)`

451

452 Set-returning function that returns the polygons of an `ecluster` as `epoint[]`

453 rows.

454

455 #### `empty_ebox`()

456

457 Returns the empty `ebox`.

458 See "1. Types", subsection `ebox` for details.

459

460 #### `epoint(`latitude `float8,` longitude `float8)`

461

462 Returns an `epoint` with the given latitude and longitude.

463

464 #### `epoint_latlon(`latitude `float8,` longitude `float8)`

465

466 Alias for `epoint(float8, float8)`.

467

468 #### `epoint_lonlat(`longitude `float8,` latitude `float8)`

469

470 Same as `epoint(float8, float8)` but with arguments reversed.

471

472 #### `fair_distance(ecluster, epoint,` samples `int4 = 10000)`

473

474 When working with user-generated content, users may be tempted to create

475 intentionally oversized objects in order to optimize search results in an

476 unfair manner. The `fair_distance` function aims to handle this by returning an

477 adjusted distance (i.e. distance increased by a penalty) if a geographic object

478 (the `ecluster`) consists of more than one point.

479

480 The first argument to this function is an `ecluster`, the second argument is a

481 search point (`epoint`), and the third argument is an interger related to the

482 precision (higher precision will require more computation time).

483

484 The penalty by which the returned distance is increased fulfills (at least) the

485 following properties:

486

487 * The penalty function is continuous (except noise created by numerical

488 integration, see paragraph after this list) as long as no objects are added

489 to or removed from the `ecluster`. That particularly means: small changes in

490 the search point (second argument) cause only small changes in the result.

491 * For search points far away from the `ecluster` (i.e. large distances compared

492 to the dimensions of the `ecluster`), the penalty approaches zero, i.e. the

493 behavior of the `fair_distance` function approaches the behavior of the

494 `distance` function.

495 * If the `ecluster` consists of a set of points, the penalty for a search point

496 close to one of those points (closer than half of the minimum distance

497 between each pair of points in the `ecluster`) is chosen in such a way that

498 the adjusted distance is equal to the distance from the search point to the

499 closest point in the `ecluster` multiplied by the square root of the count of

500 points in the `ecluster`.

501 * If the `ecluster` does not cover any area (i.e. only consists of points,

502 paths, and/or outlines), and if the search point (second argument) overlaps

503 with the `ecluster`, then the penalty (and thus the result) is zero.

504 * The integral (or average) of the square of the fair distance value (result of

505 this function) over all possible search points is independent of the

506 `ecluster` as long as the `ecluster` does not cover more than a half of

507 earth's surface.

508

509 The function uses numerical integration to compute the result. The third

510 parameter (which defaults to 10000) can be used to adjust the number of samples

511 taken. A higher sample count increases precision as well as execution time of

512 the function. Because this function internally uses a spherical model of earth

513 for certain steps of the calculation, the precision cannot be increased

514 unboundedly.

515

516 Despite the limitations explained above, it is ensured that the penalty is

517 always positive, i.e. results returned by the `fair_distance` function are

518 always equal to or greater than the results returned by the `distance`

519 function regardless of stochastic effects. Furthermore, all results are

520 deterministic and reproducible with the same version of pgLatLon.

521

522 #### `GeoJSON_to_epoint(jsonb, text)`

523

524 Maps a GeoJSON object of type "Point" or "Feature" (which contains a

525 "Point") to an `epoint` datum. For any other JSON objects, NULL is returned.

526

527 The second parameter (which defaults to `epoint_lonlat`) may be set to a name

528 of a conversion function that transforms two coordinates (two `float8`

529 parameters) to an `epoint`.

530

531 #### `GeoJSON_to_ecluster(jsonb, text)`

532

533 Maps a (valid) GeoJSON object to an `ecluster`. Note that this function

534 does not check whether the JSONB object is a valid GeoJSON object.

535

536 The second parameter (which defaults to `epoint_lonlat`) may be set to a name

537 of a conversion function that transforms two coordinates (two `float8`

538 parameters) to an `epoint`.

539

540 #### `max_latitude(ebox)`

541

542 Returns the northern boundary of a given `ebox` in degrees between -90 and +90.

543

544 #### `max_longitude(ebox)`

545

546 Returns the eastern boundary of a given `ebox` in degrees between -180 and +180

547 (both inclusive).

548

549 #### `min_latitude(ebox)`

550

551 Returns the southern boundary of a given `ebox` in degrees between -90 and +90.

552

553 #### `min_longitude(ebox)`

554

555 Returns the western boundary of a given `ebox` in degrees between -180 and +180

556 (both inclusive).

557

558 #### `latitude(epoint)`

559

560 Returns the latitude value of an `epoint` in degrees between -90 and +90.

561

562 #### `longitude(epoint)`

563

564 Returns the longitude value of an `epoint` in degrees between -180 and +180

565 (both inclusive).

566

567 #### `radius(ecircle)`

568

569 Returns the radius of an `ecircle` in meters.

570