Some background knowledge
Greenplum is an MPP database. Its SQL query planning and execution model is a distributed computing model. Greenplum introduces the concept of locus to model a path's data distribution.
- The function scan of generate_series has General locus (because the information is complete at every single place; note that in Greenplum 5 this function is not treated as General, probably for historical reasons)
- The scan path of a replicated table has segmentGeneral locus (because the data is stored only on segments, not on the master, and is complete on every single segment)
- ……
For more details, please refer to one of my talks: Greenplum 查询优化器关键技术介绍 (Introduction to Key Techniques of the Greenplum Query Optimizer).
The issue of general and segmentGeneral paths with volatile functions
A general or segmentGeneral locus implies that when the corresponding slice is executed on many different segments, every execution provides the same result data set. Thus, in some cases, general and segmentGeneral paths can be treated like a broadcast.
But what if a segmentGeneral or general locus path contains volatile functions? Volatile functions, by definition, do not guarantee the same result across invocations. In such cases the path loses this property and cannot be treated as a broadcast. Previously, the Greenplum planner (and also ORCA) did not handle these cases correctly.
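The broadcast-like shortcut can be stated as a small predicate. The sketch below is illustrative (hand-written names, not the planner's actual API; the real expression walk is done by PostgreSQL's contain_volatile_functions): a general/segmentGeneral path stays broadcast-like only when it is free of volatile calls.

```python
# Illustrative sketch, not Greenplum source: decide whether a path's
# result may be assumed identical on every segment.
VOLATILE_FUNCS = {"random", "timeofday", "nextval"}  # example volatile calls

def expr_is_volatile(expr: str) -> bool:
    """Crude stand-in for a real expression-tree walk
    (contain_volatile_functions in PostgreSQL/Greenplum)."""
    return any(f + "(" in expr for f in VOLATILE_FUNCS)

def can_treat_as_broadcast(locus: str, exprs: list) -> bool:
    # Only general/segmentGeneral paths are broadcast-like candidates,
    # and a volatile call anywhere in the path disqualifies them.
    if locus not in ("general", "segmentGeneral"):
        return False
    return not any(expr_is_volatile(e) for e in exprs)

# The wrong plans below arise from answering True for the first case:
assert can_treat_as_broadcast("segmentGeneral", ["t_rep.a > random()"]) is False
assert can_treat_as_broadcast("segmentGeneral", ["t_rep.a > 1"]) is True
```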
Let's look at some examples of wrong plans caused by this kind of bug. For the complete list, please refer to the Gist: https://gist.github.com/kainwen/ad43f4ae88dc56f3c6eb73056b195e8b
Case 1: random() appears in a replicated table's restriction clause. The following plan executes the scan of the replicated table everywhere, so it is not equivalent to the Postgres plan.
explain (costs off, verbose) select * from t_rep, t_hashdist where t_rep.a > random();
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: t_rep.a, t_rep.b, t_rep.c, t_hashdist.a, t_hashdist.b, t_hashdist.c
   ->  Nested Loop
         Output: t_rep.a, t_rep.b, t_rep.c, t_hashdist.a, t_hashdist.b, t_hashdist.c
         ->  Seq Scan on public.t_hashdist
               Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
         ->  Materialize
               Output: t_rep.a, t_rep.b, t_rep.c
               ->  Seq Scan on public.t_rep
                     Output: t_rep.a, t_rep.b, t_rep.c
                     Filter: ((t_rep.a)::double precision > random())
Case 2: random() appears in the targetlist of a replicated table's scan, which also executes everywhere and breaks the "broadcast-like" property:
explain (costs off, verbose) select * from t_hashdist where a in (select random()::int from t_rep);
                           QUERY PLAN
----------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
   ->  Hash Join
         Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
         Hash Cond: (t_hashdist.a = ((random())::integer))
         ->  Seq Scan on public.t_hashdist
               Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
         ->  Hash
               Output: ((random())::integer)
               ->  HashAggregate
                     Output: ((random())::integer)
                     Group Key: (random())::integer
                     ->  Seq Scan on public.t_rep
                           Output: (random())::integer
After reading all the wrong plans, the bug should be clear. To emphasize it again: the root cause is that a general or segmentGeneral locus path that contains volatile functions can no longer be treated as general or segmentGeneral.
My idea to fix this issue
The idea is: when we find the pattern (a general or segmentGeneral locus path that contains volatile functions), we create a motion path above it to turn its locus into singleQE, and then create a projection path on top. The core job then becomes choosing the right places to check:
- For a single base rel, we only need to check its restrictions. This happens at the bottom of the planner, in the function set_rel_pathlist.
- When creating a join path, if the join locus is general or segmentGeneral, check its join qual to see whether it contains volatile functions.
- When handling a subquery, we invoke the function set_subquery_pathlist; at the end of this function, check the targetlist and the havingQual.
- When creating a limit path, apply the same check-and-fix algorithm.
- Correctly handle make_subplan.
OrderBy clauses and Group clauses are included in the targetlist, so they are handled by the subquery step above.
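Putting the idea together, the check-and-fix step can be sketched as follows. This is a minimal illustrative model in Python, not the actual C code from the PR; the real planner works on Path nodes and their locus, and the names here are invented.

```python
from dataclasses import dataclass

@dataclass
class Path:
    locus: str       # "general", "segmentGeneral", "singleQE", ...
    volatile: bool   # do the path's targetlist/quals call volatile functions?

def fix_volatile_general_path(path: Path) -> Path:
    """If a general/segmentGeneral path contains volatile functions,
    put a motion on top that gathers it to a single QE, so every
    volatile call is evaluated exactly once; a projection path is then
    created above the motion. Otherwise the path is left unchanged."""
    if path.locus in ("general", "segmentGeneral") and path.volatile:
        # Motion path: the result now lives in one QE only.
        # (In the real planner a projection path is added on top of it.)
        return Path(locus="singleQE", volatile=path.volatile)
    return path

fixed = fix_volatile_general_path(Path("segmentGeneral", volatile=True))
assert fixed.locus == "singleQE"
```

The later Case 1 plan shows exactly this shape: the replicated-table scan is gathered (evaluating random() once per row, in one place) and only then broadcast.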
I have opened a PR (with a detailed commit message, comments, and test cases): Correct plan of general & segmentGeneral path with volatile functions. #10418. Please refer to the code change there.
DMLs on replicated tables
UPDATE and DELETE statements on a replicated table are special: they have to be dispatched to every segment for execution. So if they contain volatile functions in their targetlist or WHERE clause, we should reject such statements:
- For the targetlist, we check it in the function create_motion_path_for_upddel.
- WHERE clauses are handled in the query planner: when we find the pattern and want to fix it, we additionally check whether we are updating or deleting a replicated table, and if so, we reject the statement.
For INSERT statements, once the planner behaves correctly, they work automatically; refer to the cases in the gist links above and below. Note that for INSERT we also have to check the upsert-related clauses of the statement.
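The rejection rule can be summarized in a small sketch (illustrative Python with invented names; the real checks live in the planner's C code as described above):

```python
def check_dml_on_replicated(cmd: str, on_replicated_table: bool,
                            targetlist_volatile: bool,
                            where_volatile: bool) -> None:
    """UPDATE/DELETE on a replicated table runs on every segment, so a
    volatile function in the targetlist or WHERE clause would make the
    replicas diverge -- such statements must be rejected.  INSERT is
    not rejected here: a volatile source is fixed by the planner
    (upsert clauses still need their own check)."""
    if cmd in ("UPDATE", "DELETE") and on_replicated_table:
        if targetlist_volatile or where_volatile:
            raise ValueError(f"volatile function not allowed in {cmd} "
                             "on a replicated table")

check_dml_on_replicated("UPDATE", True, False, False)  # fine: nothing volatile
try:
    check_dml_on_replicated("DELETE", True, False, True)  # volatile WHERE
except ValueError as err:
    print(err)  # statement rejected
```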
The correct plans after fixing
The cases matching the wrong plans above are listed below. For the whole list, please refer to https://gist.github.com/kainwen/cd96c8fc35b68c9d87d73af7f269ba99
Case 1: A motion is added to broadcast the replicated table scan result.
explain (costs off, verbose) select * from t_rep, t_hashdist where t_rep.a > random();
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: t_rep.a, t_rep.b, t_rep.c, t_hashdist.a, t_hashdist.b, t_hashdist.c
   ->  Nested Loop
         Output: t_rep.a, t_rep.b, t_rep.c, t_hashdist.a, t_hashdist.b, t_hashdist.c
         ->  Seq Scan on public.t_hashdist
               Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
         ->  Materialize
               Output: t_rep.a, t_rep.b, t_rep.c
               ->  Broadcast Motion 1:3  (slice2; segments: 1)
                     Output: t_rep.a, t_rep.b, t_rep.c
                     ->  Result
                           Output: t_rep.a, t_rep.b, t_rep.c
                           ->  Seq Scan on public.t_rep
                                 Output: t_rep.a, t_rep.b, t_rep.c
                                 Filter: ((t_rep.a)::double precision > random())
Case 2: A motion is added so that random() is evaluated only once, on a single QE, and the result is then redistributed.
explain (costs off, verbose) select * from t_hashdist where a in (select random()::int from t_rep);
                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
   ->  Hash Semi Join
         Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
         Hash Cond: (t_hashdist.a = ((random())::integer))
         ->  Seq Scan on public.t_hashdist
               Output: t_hashdist.a, t_hashdist.b, t_hashdist.c
         ->  Hash
               Output: ((random())::integer)
               ->  Redistribute Motion 1:3  (slice2; segments: 1)
                     Output: ((random())::integer)
                     Hash Key: ((random())::integer)
                     ->  Result
                           Output: ((random())::integer)
                           ->  Seq Scan on public.t_rep
                                 Output: (random())::integer
Related Resources
- Github Issue: General or SegmentGeneral locus path contain volatile functions cannot be treated as *general. #10419
- Greenplum development mailing list thread: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/354vF_0EPIM
- The diff of the plan before and after the patch can be found here: https://www.diffnow.com/report/jeupd