pgConflux

pgConflux is an extension for PostgreSQL to provide fast access to personalized timelines created by merging event streams from a dynamic set of sources.

Installation

Ensure that the pg_config binary is in your path (shipped with PostgreSQL). Ensure that GNU Make is available (either as make or gmake). Then simply type:

make install

Usage

The extension can be loaded with the following SQL command:

CREATE EXTENSION conflux;

The extension introduces the following three data types:

  • actortime
  • ctxactortime
  • ctxactortime_query

Values of type actortime contain a 32-bit integer which reflects the source of an event (e.g. the author of a posting) and additionally contains a 64-bit integer which should be monotonically increasing with time (e.g. using a SERIAL8). Using an index on values of this data type, it is possible to efficiently retrieve the newest events of a set of sources using PostgreSQL's nearest-neighbor search on GiST indices. An index can be created as follows:

CREATE INDEX posting_actortime_key
ON posting USING gist ((actortime(author_id, post_id)));

If posts are made in a particular context (e.g. discussion group) and these posts shall later be queried for a certain (variable) context only, the type ctxactortime can be used instead, which additionally contains one particular context that is represented as a 64-bit integer.

CREATE INDEX posting_ctxactortime_key
ON posting USING gist ((ctxactortime(context_id, author_id, post_id)));

Using the ctxactortime_query data type, it is possible to efficiently select the newest postings while limiting the result set to certain sources (e.g. authors) and to a certain context. The context is ignored when not present in an index (in case of using the actortime data type). To create query objects, the following functions can be used:

  • ctxactortime_query(int8, int4[], int8 = NULL)
  • ctxactortime_query(int8, int4, int8 = NULL)
  • actortime_query(int4[], int8 = NULL)
  • actortime_query(int4, int8 = NULL)

The first argument to the "ctxactortime_query" fuctions is the context, the second argument is an array of sources (or a single source), and the third argument can be used to impose an additional restriction on the time.

The following query would efficiently select the 10 newest postings from the set of postings created by authors 17, 19, and 105, even if the table contains millions (or billions?) of entries.

SELECT * FROM posting
WHERE actortime(author_id, post_id) <| actortime_query(ARRAY[17,105,19])
ORDER BY actortime(author_id, post_id) <-| actortime_query(ARRAY[17,105,19]) LIMIT 10;

This query would efficiently select the 10 newest postings from the set of postings created by authors 17, 19, and 105, even if the table contains millions (or billions?) of entries.

To only consider entries with a serial ID smaller than a certain value (e.g. 12345), the pstamp_query call can, for example, be written as: actortime_query(ARRAY[17,105,19], 12345)

For a complete example, refer to the README file that is included in the source code archive.

Download

Changes

  • 2020-12-05: Version 1.0
    • Drastically improved performence when many contexts are used (include context in single index instead of using multi-column index with the btree_gist extension)
    • Renamed types and functions (actortime instead of pstamp)
    • Improved robustness regarding (theoretical) non-standard implementations of sscanf (where %n would increase the returned integer by one)
  • 2020-11-30: Version 0.5
    • Allow non-superusers to install extension
    • Added proper database schema support
      • Properly support creating extension in schema
      • Forbid schema relocation (as it would break the extension)
  • 2020-11-16: Version 0.4
    • Fixed memory read error when array of persons (e.g. authors who are being followed) is empty
    • Enforce limit of ±(1e15−1) for timestamps (limitation to less than 53_bits is necessary for correct operation as distances in GiST are represented as double precision float)
    • Permit NULL as second argument to pstamp_query to indicate no time restriction
  • 2020-11-14: Version 0.3
    • Bugfixes regarding text representation of pstamp and pstamp_query data types, including compatibility fixes for PostgreSQL version 12 and higher
    • Bugfix regarding crashes when pstamp_query was stored as TOASTed value
    • Minor code cleanup and improvement of README file
  • 2020-02-09: Version 0.2
    • Changed operators from && and <-> to <| and <-|
    • Operator <| only returns IDs strictly smaller than the value given as second parameter to pstamp_query (instead of values smaller than or equal to the value)
    • Fixed code to compile with PostgreSQL 11 and 12
  • 2018-12-05: Initial release of version 0.1