Prolog Examples: Serialize DMLs for GreenplumDB

Contents hide

BackGround

Greenplum DB is an open-source massively parallel data platform for analytics, machine learning and AI.

Greenplum DB 6.0 is close to release and in GPDB6, Global Deadlock Detector (GDD) is introduced so that DMLS (insert, update, delete) on the same table can be executed concurrently. According to a performance test report by Alibaba:

Alibaba in China has done some initial testing on OSS GP in preperation for Greenplum 6. Here is the test result:
TEST CASE: Each transaction 3 updates, 1 select, 1 insert
* Greenplum 5 , Transactions per Second: 15
* Greenplum 6, Transactions per Second: 2900

However, some DMLs cannot be executed concurrently under current MPP architecture (we might improve this in future). We have to serialize these DMLs execution on QD.

Detailed Discussion

Please refer the gpdb-dev mail-list thread: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/tvI-VQZbt

Lock-Level Configuration Problem

This post focuses on how to config the lock-level to achieve the best and correct concurrence schedule.

Insert Statement is always OK then we can divide the other DMLs into five classes:

c1: normal update (update statement whose plan contains no Motion)
c2: normal delete (delete statement whose plan contains no Motion)
c3: split-update (update on dist-keys of hash-distributed table)
c4: update statement whose plan contains Motions
c5: delete statement whose plan contains Motions

Let’s consider the conflict relations of these 5 classes of DMLS, in the following table C means the two classes of DMLs can be executed concurrently on QD, S means the two classes of DMLs cannot be executed concurrently on QD (suppose they operates on the same table):

	c1	c2	c3	c4	c5
c1	C	C	S	S	S
c2	C	C	S	C	C
c3	S	S	S	S	S
c4	S	C	S	S	S
c5	S	C	S	S	C

And in fact, the above relation table can be deduced by prolog, and the core logic is quite simple, only two rules:

conflict(A, split_update) :-
    hold_xid_lock(A).

conflict(A, B) :-
    mark_tuple_heap_updated(A),
    has_motion(B).

The whole prolog program is:

/*
 * Not all DMLs can execute concurrently in MPP.
 * The problem is
 *   - split-update
 *   - EvalPlanQual in QD
 *
 * DMLs excluding Insert can be divided 5 groups:
 *   1. naive-update (update a table's non-distkey and the plan contains no motion)
 *   2. naive-delete (the plan contains no motion)
 *   3. split-update (update a table's distkey)
 *   4. update-with-motion
 *   5. delete-with-motion
 */

mark_tuple_heap_updated(naive_update).
mark_tuple_heap_updated(update_with_motion).

hold_xid_lock(naive_update).
hold_xid_lock(naive_delete).
hold_xid_lock(split_update).
hold_xid_lock(update_with_motion).
hold_xid_lock(delete_with_motion).

has_motion(split_update).
has_motion(update_with_motion).
has_motion(delete_with_motion).

conflict(A, split_update) :-
    hold_xid_lock(A).

conflict(A, B) :-
    mark_tuple_heap_updated(A),
    has_motion(B).

enlarge_list([], []) :- nl.
enlarge_list([(A, B)|List], R) :-
    enlarge_list(List, R1),
    list_to_set([(A, B), (B, A)|R1], R).

print_list([]) :- nil.
print_list([E]) :- print(E), nl.
print_list([E|Es]) :- print(E), nl, print_list(Es).

Run the above problem, the result is shown as following:

And Postgres’ lock-level conflict relations are:

#define NoLock					0

#define AccessShareLock			1		/* SELECT */
#define RowShareLock			2		/* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock		3		/* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4		/* VACUUM (non-FULL),ANALYZE, CREATE
										 * INDEX CONCURRENTLY */
#define ShareLock				5		/* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock	6		/* like EXCLUSIVE MODE, but allows ROW
										 * SHARE */
#define ExclusiveLock			7		/* blocks ROW SHARE/SELECT...FOR
										 * UPDATE */
#define AccessExclusiveLock		8		/* ALTER TABLE, DROP TABLE, VACUUM
										 * FULL, and unqualified LOCK TABLE */

/*
 * Data structures defining the semantics of the standard lock methods.
 *
 * The conflict table defines the semantics of the various lock modes.
 */
static const LOCKMASK LockConflicts[] = {
	0,

	/* AccessShareLock */
	(1 << AccessExclusiveLock),

	/* RowShareLock */
	(1 << ExclusiveLock) | (1 << AccessExclusiveLock),

	/* RowExclusiveLock */
	(1 << ShareLock) | (1 << ShareRowExclusiveLock) |
	(1 << ExclusiveLock) | (1 << AccessExclusiveLock),

	/* ShareUpdateExclusiveLock */
	(1 << ShareUpdateExclusiveLock) |
	(1 << ShareLock) | (1 << ShareRowExclusiveLock) |
	(1 << ExclusiveLock) | (1 << AccessExclusiveLock),

	/* ShareLock */
	(1 << RowExclusiveLock) | (1 << ShareUpdateExclusiveLock) |
	(1 << ShareRowExclusiveLock) |
	(1 << ExclusiveLock) | (1 << AccessExclusiveLock),

	/* ShareRowExclusiveLock */
	(1 << RowExclusiveLock) | (1 << ShareUpdateExclusiveLock) |
	(1 << ShareLock) | (1 << ShareRowExclusiveLock) |
	(1 << ExclusiveLock) | (1 << AccessExclusiveLock),

	/* ExclusiveLock */
	(1 << RowShareLock) |
	(1 << RowExclusiveLock) | (1 << ShareUpdateExclusiveLock) |
	(1 << ShareLock) | (1 << ShareRowExclusiveLock) |
	(1 << ExclusiveLock) | (1 << AccessExclusiveLock),

	/* AccessExclusiveLock */
	(1 << AccessShareLock) | (1 << RowShareLock) |
	(1 << RowExclusiveLock) | (1 << ShareUpdateExclusiveLock) |
	(1 << ShareLock) | (1 << ShareRowExclusiveLock) |
	(1 << ExclusiveLock) | (1 << AccessExclusiveLock)

};

So the lock-level for each class can be solved easily using Prolog:

conflict(2, 7).
conflict(2, 8).
conflict(7, 2).
conflict(7, 3).
conflict(7, 4).
conflict(7, 5).
conflict(7, 6).
conflict(7, 7).
conflict(7, 8).
conflict(1, 8).
conflict(3, 5).
conflict(3, 6).
conflict(3, 7).
conflict(3, 8).
conflict(6, 3).
conflict(6, 4).
conflict(6, 5).
conflict(6, 6).
conflict(6, 7).
conflict(6, 8).
conflict(4, 4).
conflict(4, 5).
conflict(4, 6).
conflict(4, 7).
conflict(4, 8).
conflict(8, 1).
conflict(8, 2).
conflict(8, 3).
conflict(8, 4).
conflict(8, 5).
conflict(8, 6).
conflict(8, 7).
conflict(8, 8).
conflict(5, 3).
conflict(5, 4).
conflict(5, 6).
conflict(5, 7).
conflict(5, 8).


compatible(2, 2).
compatible(2, 6).
compatible(2, 1).
compatible(2, 3).
compatible(2, 5).
compatible(2, 4).
compatible(7, 1).
compatible(1, 2).
compatible(1, 7).
compatible(1, 1).
compatible(1, 3).
compatible(1, 6).
compatible(1, 5).
compatible(1, 4).
compatible(3, 3).
compatible(3, 2).
compatible(3, 1).
compatible(3, 4).
compatible(6, 2).
compatible(6, 1).
compatible(4, 3).
compatible(4, 2).
compatible(4, 1).
compatible(5, 2).
compatible(5, 1).
compatible(5, 5).

virtualdml_conflict(X1, X2, X3, X4, X5) :-
    member(X1, [1,2,3,4,5,6,7]),
    member(X2, [1,2,3,4,5,6,7]),
    member(X3, [1,2,3,4,5,6,7]),
    member(X4, [1,2,3,4,5,6,7]),
    member(X5, [1,2,3,4,5,6,7]),
    conflict(X1, X3),
    conflict(X1, X4),
    conflict(X1, X5),

    conflict(X2, X3),

    conflict(X3, X1),
    conflict(X3, X2),
    conflict(X3, X3),
    conflict(X3, X4),
    conflict(X3, X5),

    conflict(X4, X1),
    conflict(X4, X3),
    conflict(X4, X4),
    conflict(X4, X5),

    conflict(X5, X1),
    conflict(X5, X3),
    conflict(X5, X4).

virtualdml_compatible(X1, X2, X3, X4, X5) :-
    member(X1, [1,2,3,4,5,6,7]),
    member(X2, [1,2,3,4,5,6,7]),
    member(X3, [1,2,3,4,5,6,7]),
    member(X4, [1,2,3,4,5,6,7]),
    member(X5, [1,2,3,4,5,6,7]),
    compatible(X1, X1),
    compatible(X1, X2),

    compatible(X2, X1),
    compatible(X2, X2),
    compatible(X2, X4),
    compatible(X2, X5),

    compatible(X4, X2),

    compatible(X5, X2),
    compatible(X5, X5).

Run this prolog problem it tells us that we can use the following lock-levels for each class:

c1 —- RowExclusiveLock
c2 —- RowShareLock
c3 —- ExclusiveLock
c4 —- ShareRowExclusiveLock
c5 —- ShareLock

Thoughts on Computing

英雄无岁，江湖无辈

Prolog Examples: Serialize DMLs for GreenplumDB

BackGround

Detailed Discussion

Lock-Level Configuration Problem

Leave a ReplyCancel reply

BackGround

Detailed Discussion

Lock-Level Configuration Problem

Leave a ReplyCancel reply

Discover more from Thoughts on Computing