Recently I found a Bug of Postgres, to fully understand it and fix it, I spend some time on Postgres operator class and operator family concepts. I find it very similar to SML’s module system. So this blog records my thoughts.
SML’s Module system
SML introduces module system to better support large projects. Prof. Dan Friedman’s great book Essentials of Programming Languages also uses a whole chapter to describe module. The reason why we need module is (copied from EOPL):
- We will need a good way to separate the system into relatively self- contained parts, and to document the dependencies between those parts.
- We will need a better way to control the scope and binding of names.
- We will need a way to enforce abstraction boundaries.
- Last, we need a way to combine these parts flexibly, so that a single part may be reused in different contexts.
And functor or module procedure is a great technique to implement generic programming. This blog is not talking about SML, if you are not familiar with SML, please refer the book Programming in Standard ML. Let me finish this section uses an example of Generic BST in SML (from Ullman’s textbook):
signature TOTALORDER = sig
type element
val lt : element * element -> bool
end;
functor MakeBST(Lt:TOTALORDER):
sig
type 'a btree
val create : Lt.element btree;
val lookup : Lt.element * Lt.element btree -> bool
val insert : Lt.element * Lt.element btree
-> Lt.element btree
end
=
struct
open Lt
datatype 'a btree =
Empty
| Node of 'a * 'a btree * 'a btree;
val create = Empty
fun lookup(x, Empty) = false
| lookup(x, Node(v, left, right)) =
if Lt.lt(x, v) then lookup(x, left)
else if Lt.lt(v, x) then lookup(x, right)
else true
fun insert(x, Empty) = Node(x, Empty, Empty)
| insert(x, n as Node(v, left, right)) =
if Lt.lt(x, v) then Node(v, insert(x, left), right)
else if Lt.lt(v, x) then Node(v, right, insert(x, right))
else n
end;
If we want to use the above generic BST, say we want to store and search int, we can use it as the following code:
structure MyInt : TOTALORDER =
struct
type element = int
fun lt(a, b) = a < b
end;
structure MyIntBst = MakeBST(MyInt);
let val bst = MyIntBst.insert(5, MyIntBst.create)
in
MyIntBst.lookup(3, bst)
end
Postgres operator class and operator family
We can create index in Postgres for the columns. A typical index in database is btree index (the core idea is just like the above BST). Postgres supports many data types so the btree index has to be generic. And the technique here is opclass and opfamily. It is very similar to the TOTALORDER
module in the previous sections’ example. Each index type (btree, hash, brin …) is a functor
. It needs to take into another module
(opclass) and then the btree can work well.
The example case to create a operator class (it is copied from https://github.com/greenplum-db/gpdb/issues/6971):
-- For the tests, we define our own equality operator called |=|, which
-- compares the absolute values. For example -1 |=| 1.
-- (from gpdist_opclasses.sql test)
CREATE FUNCTION abseq(int, int) RETURNS BOOL AS $$
begin return abs($1) = abs($2); end;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;
CREATE OPERATOR |=| (
PROCEDURE = abseq,
LEFTARG = int,
RIGHTARG = int,
COMMUTATOR = |=|,
hashes, merges);
CREATE FUNCTION abslt(int, int) RETURNS BOOL AS $$
begin return abs($1) < abs($2); end;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;
CREATE OPERATOR |<| (PROCEDURE = abslt, LEFTARG = int, RIGHTARG = int, hashes);
CREATE FUNCTION absgt(int, int) RETURNS BOOL AS $$
begin return abs($1) > abs($2); end;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;
CREATE OPERATOR |>| (PROCEDURE = absgt, LEFTARG = int, RIGHTARG = int, hashes);
CREATE FUNCTION abscmp(int, int) RETURNS int AS $$
begin return btint4cmp(abs($1),abs($2)); end;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;
CREATE OPERATOR CLASS abs_int_btree_ops FOR TYPE int4
USING btree AS
OPERATOR 1 |<|,
OPERATOR 3 |=|,
OPERATOR 5 |>|,
FUNCTION 1 abscmp(int, int);
SQL queries might consider index scan based on the where clause
and the cost model. Suppose we create index on the column a
with type int4
, many practical queries may write the where clause
like a > const
with int8
type const. This means we might need more than one configuration of operators. How to solve this issue? Here comes opfamily.
Opfamily wraps some operators for a specific index method, based on strategy and operator arg’s type, we can find the wanted operator. In the above create opclass
example, even we do not explicitly create any
So in summary, the logic is:
- to make an index generic, we have to provide an operator class when create index
- an opclass links to an opfamily that contains all the operators supported for this index
- based on the specific filter condition in the
where clause
involving the index, and the types, planner can select the right operator in the opfamily
Postgres Opclass and Opfamily Internals and Examples
catalogs and c functions
Related catalogs are:
pg_opclass
pg_opfamily
pg_amop
For the details, please refer to Postgres documents: https://www.postgresql.org/docs/current/catalogs.html
How they store the information can be shown below:
zlv=# create table tab(a int, b int);
CREATE TABLE
zlv=# create index idx_ta on tab(a int4_ops);
CREATE INDEX
zlv=# select * from pg_opclass where opcname = 'int4_ops';
oid | opcmethod | opcname | opcnamespace | opcowner | opcfamily | opcintype | opcdefault | opckeytype
-------+-----------+----------+--------------+----------+-----------+-----------+------------+------------
1978 | 403 | int4_ops | 11 | 10 | 1976 | 23 | t | 0
10020 | 405 | int4_ops | 11 | 10 | 1977 | 23 | t | 0
(2 rows)
zlv=# select * from pg_am;
oid | amname | amhandler | amtype
------+--------+-------------+--------
403 | btree | bthandler | i
405 | hash | hashhandler | i
783 | gist | gisthandler | i
2742 | gin | ginhandler | i
4000 | spgist | spghandler | i
3580 | brin | brinhandler | i
(6 rows)
zlv=# select * from pg_opfamily where oid = 1976;
oid | opfmethod | opfname | opfnamespace | opfowner
------+-----------+-------------+--------------+----------
1976 | 403 | integer_ops | 11 | 10
(1 row)
zlv=# select * from pg_amop where amopfamily = 1976;
oid | amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily
-------+------------+--------------+---------------+--------------+-------------+---------+------------+----------------
10118 | 1976 | 21 | 21 | 1 | s | 95 | 403 | 0
10119 | 1976 | 21 | 21 | 2 | s | 522 | 403 | 0
10120 | 1976 | 21 | 21 | 3 | s | 94 | 403 | 0
10121 | 1976 | 21 | 21 | 4 | s | 524 | 403 | 0
10122 | 1976 | 21 | 21 | 5 | s | 520 | 403 | 0
10123 | 1976 | 21 | 23 | 1 | s | 534 | 403 | 0
10124 | 1976 | 21 | 23 | 2 | s | 540 | 403 | 0
10125 | 1976 | 21 | 23 | 3 | s | 532 | 403 | 0
10126 | 1976 | 21 | 23 | 4 | s | 542 | 403 | 0
10127 | 1976 | 21 | 23 | 5 | s | 536 | 403 | 0
10128 | 1976 | 21 | 20 | 1 | s | 1864 | 403 | 0
10129 | 1976 | 21 | 20 | 2 | s | 1866 | 403 | 0
Kernel functions
Postgres kernel code to fetch the operator is shown below (it is copied from the function create_indexscan_plan
):
PathKey *pathkey = (PathKey *) lfirst(pathkeyCell);
Node *expr = (Node *) lfirst(exprCell);
Oid exprtype = exprType(expr);
Oid sortop;
/* Get sort operator from opfamily */
sortop = get_opfamily_member(pathkey->pk_opfamily,
exprtype,
exprtype,
pathkey->pk_strategy);
Examples
Let’s now play with the planner.
zlv=# create table tab(a int);
CREATE TABLE
zlv=# create index myidx on tab using btree (a int4_ops);
CREATE INDEX
zlv=# explain (costs off) select * from tab where a > 1;
QUERY PLAN
----------------------------------
Bitmap Heap Scan on tab
Recheck Cond: (a > 1)
-> Bitmap Index Scan on myidx
Index Cond: (a > 1)
(4 rows)
zlv=# explain (costs off) select * from tab where a > 1::int8;
QUERY PLAN
---------------------------------------
Bitmap Heap Scan on tab
Recheck Cond: (a > '1'::bigint)
-> Bitmap Index Scan on myidx
Index Cond: (a > '1'::bigint)
(4 rows)
zlv=# explain (costs off) select * from tab where a > 1::float;
QUERY PLAN
-----------------------------------------------------------
Seq Scan on tab
Filter: ((a)::double precision > '1'::double precision)
(2 rows)
opclass int4_ops
belongs to the opfamily integer_ops
.
zlv=# select * from pg_opclass where opcname = 'int4_ops' and opcmethod = (select oid from pg_am where amname = 'btree');
oid | opcmethod | opcname | opcnamespace | opcowner | opcfamily | opcintype | opcdefault | opckeytype
------+-----------+----------+--------------+----------+-----------+-----------+------------+------------
1978 | 403 | int4_ops | 11 | 10 | 1976 | 23 | t | 0
(1 row)
zlv=# select * from pg_opfamily where oid = 1976;
oid | opfmethod | opfname | opfnamespace | opfowner
------+-----------+-------------+--------------+----------
1976 | 403 | integer_ops | 11 | 10
And in this opfamily, there’s no operator that handles float type argument:
zlv=# select * from pg_amop where amopfamily = 1976 and ((amoplefttype in (select oid from pg_type where typname like 'float%')) or (amoprighttype in (select oid from pg_type where typname like 'float%')));
oid | amopfamily | amoplefttype | amoprighttype | amopstrategy | amoppurpose | amopopr | amopmethod | amopsortfamily
-----+------------+--------------+---------------+--------------+-------------+---------+------------+----------------
(0 rows)
So after casting the integer type to float then, Postgres cannot generate an index scan path. But I think we might do better for such cases, like can we add a rule that a::int > b::float <--> a::int > ceil(b)::int
, and then take index scan into consideration.
Greenplum distkey
Greenplum is an opensource MPP database based on Postgres. Users can choose the table’s distkey in Greenplum to better fit their daily jobs.
To decide the segment id of a tuple of some hash distributed table in Greenplum contains two steps:
- map the tuple to a binary string
- map the binary string to segment id
In Greenplum 6 and later version, Heikki has refactored the distkey implementation thus the first step in the above is quite different than before (the PR is: Use normal hash operator classes for data distribution. #6811). After then, Greenplum’s hash distributed policy can be thought of some kind of hash index. The catalog gp_distribution_policy
now stores the opclass for each distributed column.
References
Postgres documents:
- https://www.postgresql.org/docs/current/xindex.html
- https://www.postgresql.org/docs/current/sql-createoperator.html
- https://www.postgresql.org/docs/current/sql-createopclass.html
- https://www.postgresql.org/docs/current/indexes-opclass.html
SML module tutorial:
https://courses.cs.washington.edu/courses/cse341/04wi/lectures/09-ml-modules.html
EOPL chapter 8: