Distributed Lock in Greenplum

I have been working on locks in Greenplum since the first half of 2018. The related projects are the Global Deadlock Detector, lock mode optimization, and other minor things. This blog reviews the lessons I have learned.

Background of Greenplum

A running Greenplum cluster contains many Postgres instances spread across many machines. Each Postgres instance is called a segment. One of them is the master, which the client communicates with directly; the others are simply called segments.

A client (psql, JDBC, ODBC, …) connects to the master's Postgres process (the postmaster), which forks a new worker (the QD) to serve the client. The client then sends queries to the QD; the QD generates the plan, creates several groups of processes (gangs) across the whole cluster, dispatches the plan to those processes, gathers all the results, and finally sends the result back to the client.

The catalog is stored on the master and on each segment (some catalog tables are not maintained on segments). Data is stored only on segments (not on the master).

Everything else is almost the same as a single-instance Postgres. It is helpful to think of Greenplum as Postgres with a remote executing engine.

Basic knowledge on Locks

There are three different kinds of locks in Greenplum (Postgres):

  • Database object Lock
  • LW Locks
  • SpinLocks

This blog only talks about database object locks, which can be shown in the view pg_locks.
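For example, a minimal sketch of inspecting these locks through pg_locks (the table name t is hypothetical):

-- Session 1: open a transaction and read from a table
begin;
select * from t;

-- Session 2: list the database object locks held on t
select locktype, relation::regclass, mode, granted
from pg_locks
where relation = 't'::regclass;
-- Session 1 should show up with an AccessShareLock entry.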

A lock is a mechanism for synchronizing concurrent operations. Database object locks are the highest-level locks in Greenplum. There are different types of database object locks (for example, relation locks and transaction locks); for details, please refer to the source code in src/include/storage/lock.h.

For relation locks, there are different lock modes (src/include/storage/lockdefs.h):

#define NoLock			 0

#define AccessShareLock		 1	/* SELECT */
#define RowShareLock	         2	/* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock	 3	/* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4	/* VACUUM (non-FULL),ANALYZE, CREATE
					 * INDEX CONCURRENTLY */
#define ShareLock		 5	/* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock	 6	/* like EXCLUSIVE MODE, but allows ROW
					 * SHARE */
#define ExclusiveLock		 7	/* blocks ROW SHARE/SELECT...FOR
					 * UPDATE */
#define AccessExclusiveLock	 8	/* ALTER TABLE, DROP TABLE, VACUUM
					 * FULL, and unqualified LOCK TABLE */

The lock conflicts table can be found in src/backend/storage/lmgr/lock.c.

For example, most alter table statements hold AccessExclusiveLock on the target table, which means any other transaction, even one that only selects from the same table, will be blocked. The reason is that most alter table statements modify the catalog, and the modification is sent to other transactions to invalidate their catalog snapshots; the catalog is very important and must stay consistent during the critical lifetime of a query.
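A minimal two-session sketch of this blocking behavior (the table t is hypothetical):

-- Session 1: take AccessExclusiveLock and keep the transaction open
begin;
alter table t add column note text;

-- Session 2: even a plain select on t is blocked
select count(*) from t;   -- waits until session 1 commits or aborts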

Table lock and transaction lock

Table lock

Queries involving a table will first take a lock on that table. Greenplum (Postgres) first locks the table during semantic analysis (transforming the parse tree into a Query tree), which is when the range table entry is added. Then, after generating the plan and just before executing it, the function InitPlan opens all the relations using exactly the same lock mode. (Recent Postgres has optimized this.)

There are two very important things in the above paragraph:

  • The lock mode should be the same. If the lock mode is upgraded in InitPlan, there is a risk of local deadlock.
  • Any lock that may lead to waiting should be taken before getting the snapshot.

Local deadlock example: suppose transaction 1 and transaction 2 both upgrade their lock from RowExclusive to Exclusive and run concurrently (see the sketch after this list); the timeline is:

  • tx1 and tx2 both successfully take RowExclusiveLock on the same table; RowExclusive mode does not conflict with itself
  • tx1 tries to take ExclusiveLock in InitPlan and hangs, because tx2 is holding RowExclusiveLock
  • tx2 tries to take ExclusiveLock in InitPlan and hangs, because tx1 is holding RowExclusiveLock
  • a local deadlock happens, which Postgres can detect and resolve
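The same timeline can be reproduced by hand with explicit lock table statements (a sketch only; the table t is hypothetical):

-- Session 1
begin;
lock table t in row exclusive mode;

-- Session 2
begin;
lock table t in row exclusive mode;   -- ok, RowExclusive does not conflict with itself

-- Session 1: upgrade, blocks because session 2 holds RowExclusiveLock
lock table t in exclusive mode;

-- Session 2: upgrade, blocks on session 1; the local deadlock detector
-- aborts one of the two transactions
lock table t in exclusive mode;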

Lock before snapshot: this is subtle and closely related to the MVCC implementation of Postgres. Why does a relation lock wait? It waits for another transaction holding a conflicting lock on the relation to either commit or abort, so when it wakes up, something has happened. Suppose the blocking transaction has committed; this must be visible to the transaction that wakes up, and that is why the snapshot has to be taken after the lock is held (so that the in-progress transactions array in the snapshot is up to date).

Transaction lock

MVCC in Greenplum (Postgres) allows reads and writes to execute concurrently, but not two transactions writing the same tuple. Under the read committed isolation level, a transaction will wait for the xmax transaction to commit or abort and then continue. Since data is stored on segments (not the master), this kind of waiting only happens on segments.
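A sketch of this tuple-level waiting under read committed (table and values are hypothetical):

-- Session 1
begin;
update t set c = c + 1 where id = 1;    -- marks the old tuple's xmax on its segment

-- Session 2
begin;
update t set c = c + 10 where id = 1;   -- waits on the segment that stores id = 1
                                        -- until session 1 commits or aborts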

Global Deadlock Detector

Greenplum is a distributed ACID database and supports almost the full SQL that Postgres does. In Postgres, DML statements (insert, update, delete) hold RowExclusiveLock on the table. But in Greenplum 5 (and versions before it), if update or delete only held RowExclusiveLock, those transactions could run concurrently on the master and all be dispatched to the segments. This would lead to locking tuples on the segments concurrently. It is very dangerous to lock tuples concurrently on segments without any global deadlock detector. So Greenplum 5 (and earlier versions) holds ExclusiveLock on the table, so that updates and deletes on the same table can only be executed serially.
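A quick way to see which mode a given build takes is to run an update and look at pg_locks on the master (a sketch; the table t is hypothetical):

-- Session 1
begin;
update t set c = c + 1 where id = 1;

-- Session 2
select relation::regclass, mode, granted
from pg_locks
where relation = 't'::regclass;
-- Without GDD the update shows ExclusiveLock here; with GDD enabled it
-- only needs RowExclusiveLock, as in Postgres.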

Please read the source code of these functions for a better understanding: parse_clause.c:setTargetTable, parse_clause.c:parserOpenTable, heapam.c:CdbTryOpenRelation.

For this part, I cannot write better than the docs in the repo: src/backend/utils/gdd/README.md

And the related PR is: https://github.com/greenplum-db/gpdb/pull/4810

Complex update and delete

Update | Delete may involve several tables:

update t set c = t1.c from t1 where xxxx;
delete from t using t2 where yyyy;
update t set c = t1.c from t1 where t.c > ALL (select c from t2);
...

Greenplum also supports updating distribution key columns. The technique behind this is called split-update.
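A hedged sketch of what split-update looks like from the outside (the table definition is hypothetical, and the exact plan node names differ between versions):

create table t (id int, c int) distributed by (id);

-- Updating the distribution key cannot be done in place: the new row version
-- may belong to a different segment, so the plan deletes the old version and
-- inserts the new one after redistributing it (the split-update).
explain update t set id = id + 1;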

I have discussed this in detail in a previous blog post: Prolog Examples: Serialize DMLs for GreenplumDB. Please refer to that post for details.

Select statement with locking clause

Select statements with a locking clause are common in many OLTP benchmarks. Postgres holds RowShareLock and generates a LockRows plan node for the executor to lock the tuples. Generally, Greenplum cannot adopt the same behavior:

  • without GDD, we cannot lock tuples on segments concurrently
  • even with GDD, tuples are identified by CTID in the executor, and a CTID is only valid on the segment it comes from; but Greenplum is a distributed database, and tuples may be motioned to other segments (when handling a join), so how can we lock tuples that come from other segments? For example: select * from t1, t2 for update of t1, t2;

So in general, Greenplum holds ExclusiveLock on the tables in the locking clause, no matter whether it is for share, for update, or another locking clause.

But for a very simple case where:

  • only one range table is involved
  • there is no sublink or subquery
  • GDD is enabled

Greenplum can behave just like Postgres; see the related PR for details.
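A sketch of the two cases (table names are hypothetical; the lock mode can be confirmed through pg_locks as shown earlier):

-- Simple case: one range table, no sublink or subquery, GDD enabled.
-- Behaves like Postgres: RowShareLock on t plus a LockRows node for tuple locks.
select * from t where id = 1 for update;

-- Complex case: more than one range table involved.
-- Falls back to ExclusiveLock on the locked tables.
select * from t1, t2 where t1.id = t2.id for update of t1, t2;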

One more thing needs to be mentioned: select for update on a replicated table. Greenplum 6 introduces an important feature: replicated tables. For a replicated table, the data is complete on every single segment, so for a simple select statement Greenplum will pick a single segment to dispatch the plan to, which means the tuple lock will only be held on one segment. And this may lead to global deadlock.

Suppose there are two transactions: one (tx1) executes select for update on a replicated table t; the other (tx2) executes an update on the same table. If we treat tx1 as the simple case of select for update:

  • tx1 executes select for update (tx1 only locks the tuple on seg0)
  • tx2 executes update (tx2 locks the tuple on seg1 and seg2, and waits on seg0)
  • tx1 then executes update (tx1 waits on every segment)

A global deadlock happens, though it will be detected by the GDD.
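The scenario above in SQL (a sketch; the replicated table t is hypothetical, and which segment tx1 picks is arbitrary):

create table t (id int, c int) distributed replicated;
insert into t values (1, 1);

-- tx1
begin;
select * from t where id = 1 for update;   -- tuple lock taken on one segment only, say seg0

-- tx2
begin;
update t set c = 2 where id = 1;           -- locks the tuple on seg1 and seg2, waits on seg0

-- tx1
update t set c = 3 where id = 1;           -- now waits on seg1 and seg2: a global deadlock,
                                           -- which the GDD will detect and break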

Cached Plans

Many applications use JDBC to query Greenplum. For such extended-protocol queries, select for update needs more attention.

I have explained the reason in details in the thread: https://groups.google.com/a/greenplum.org/forum/#!msg/gpdb-dev/ugsZca1qLXU/CtUmzEa7CAAJ

Let me simply paste it here:

Using the JDBC driver to connect to Greenplum and execute SQL queries involves the following steps:

  1. send a ‘P’ message, and the QD will invoke exec_parse_message
  2. send a ‘B’ message, and the QD will invoke exec_bind_message, which will create the portal and invoke PortalStart to create gangs and dispatch the plan
  3. send an ‘E’ message, and the QD will invoke exec_execute_message; this message may contain a parameter max_rows, and if max_rows <= 0 it means fetch all; PortalRun will use max_rows to fetch

So only after getting the ‘E’ message can the QD know that the client only needs some of the tuples. And the QD has to make sure each fetch uses the same snapshot.

That’s why in exec_bind_message we set a GPDB-specific flag in the portal and use that flag in PortalRun for the one_select case (making it behave like a cursor).
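In plain SQL, the behavior this enables is roughly that of a cursor: the snapshot is fixed once and rows are fetched in batches (a rough analogy only, not the actual server-side code path; the table t is hypothetical):

begin;
declare cur cursor for select * from t where c > 0;
fetch 10 from cur;   -- corresponds to an ‘E’ message with max_rows = 10
fetch 10 from cur;   -- later fetches must see the same snapshot
close cur;
commit;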

See the PRs for details: Do not optimize select with locking clause in extend queries. #8565

Locks for Upsert

The upsert feature was introduced in Postgres 9.5, and thus Greenplum 7 supports it. For the details of this feature, please refer to the Postgres documentation (Doc: Insert on conflict do) or search the Internet.

This kind of statement:

insert into t values (1, 1) on conflict (id) do update set c = 1;

is transformed into an InsertStmt in Postgres, so its cmdType is INSERT and the parsing stage treats it as a plain insert. INSERT holds RowExclusiveLock in the QD and will be executed concurrently on the segments. However, on the segments this statement can turn into an UPDATE, which means it will lock tuples. Recall that

This will lead to locking tuples on the segments concurrently. It is very dangerous to lock tuples concurrently on segments without any global deadlock detector.

So we have to add logic to handle this: when GDD is disabled, treat upsert as update.
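A sketch of the concurrent case this guards against (a hypothetical table whose unique key matches the distribution key):

create table t (id int unique, c int) distributed by (id);

-- Two sessions run the same upsert at the same time:
insert into t values (1, 1) on conflict (id) do update set c = 1;

-- With GDD disabled, the statement is treated like an update and takes
-- ExclusiveLock on t in the QD, so the two sessions are serialized on the
-- master instead of locking tuples concurrently on the segments.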

The related PR is: Fix potential global deadlock for upsert.
