Worktable as Continuations

Not sure why I came again to this interesting paper One WITH RECURSIVE is Worth Many GOTOs. Daily work on Greenplum makes me think and inspires me to learn knowledge in different areas of computer science. This paper was accepted in the same SIGMOD as Greenplum’s HTAP paper and I have a vague impression of it. Recently three tasks in my work made me peruse the paper:

  1. Fix a customer JIRA on recursive CTE
  2. Answer questions from my colleague on the implementation of KNN in Greenplum (I advise using lateral join with some enhancement of Optimizer)
  3. My own spike on the labels as values skill in Greenplum(Postgres)’s expression valuation

This paper uses recursive CTE, Lateral Join and Union All with filters to implement control structure such as Loop, Condition in PLPGSQL. So now is a good time to start a new tour.

Continue reading

探囊取物——理解,诊断,修复Greenplum中的Opclass和Opfamily相关问题

我已经不止一次有这样的感觉了: 当遇到一个特性类型的bug后,短时间内会频繁遇到本质一样的bug。这次是Greenplum关于OpclassOpfamily的话题。这种模式不是偶然,我有一个猜想来解释:遇到第一个问题的时候,如果能够深入仔细的研究思考,由点及面,然后这部分储备的知识和思维的训练,让你更容易从代码和issue里看出同类问题。

本文介绍Greenplum Database里的OpclassOpfamily相关概念,然后列出Greenplum开源社区仓库 greenplum-db/gpdb 里一些关于这个话题的issue并逐一诊断分析。各小节独立自成一个话题,小节之间可能会显得跳跃。不同的参考资料也将散落在各个小节内部。Issue修复的过程是动态的,我会把在开源社区出现过的东西online session里讨论清楚并公布录屏。

Continue reading

Squelch Logic in Greenplum

Recently I have reviewed a PR from my colleague: Fix crash on QD when squelching CTE producer with a Motion child. This PR from Alexandra Wang (a very smart engineer from Greenplum Team) is well documented in the PR message and has detailed comments, also contains carefully designed test cases. Readers can feel the style of how Greenplum’s open-source community works. Next, let’s come back to the technical details of Greenplum’s Squelch logic (All best ideas are from Alex).

Continue reading

Volatile Functions in Query on Broadcast-like Relation in Greenplum

Some background knowledge

Greenplum is MPP database. The SQL query’s planning and executing model is a distributed computing model. Greenplum introduces the concept of locus to model a path’s data distribution.

  • Function scan of generate_series is General locus (because the information is complete  at every single place, but in Greenplum 5 this function is not treated as General, this may be the historical reason)
  • the scan path of a replicated table is segmenteGeneral locus (because data only stored in segments, not master, and the information is complete at every single segment)
  • ……

For more details, please refer to one of my talk Greenplum 查询优化器关键技术介绍.

Continue reading

CM (count min) is not MC (min count)

Papers and stories

This month, my most spare time has been spent on the problem Cardinality Estimate. When I finished the analysis of hyperloglog (see the blog A Tour of Understanding Hyperloglog), I just remember a recent paper of SIGMOD2019: Approximate Distinct Counts for Billions of Datasets. Its idea is to combine CountMin and Hyperloglog. Before diving into that paper, I decide to study CountMin. Another paper on cardinality estimate survey is VLDB2017’s Cardinality Estimation: An Experimental Survey mentioned a method called MinCount.

CM sketch was introduced by the paper: An Improved Data Stream Summary: The Count-Min Sketch and its Applications. I do not want to dive into that paper, but just reword the proof here (maybe in a better way).

Continue reading

Tour of analyzing of a bug of Greenplum: Locus, Partial Distribution, Motion and Interconnect

Overview

The bad plan and wrong result

When reviewing the code of the PR Improve Append plan when General children exist #9692, I have come to this line of code in the function set_append_path_locus:

/*
 * CDB: If all the scans are distributed alike, set
 * the result locus to match.  Otherwise, if all are partitioned,
 * set it to strewn.  A mixture of partitioned and non-partitioned
 * scans should not occur after above correction;
 *
 * CDB TODO: When the scans are not all partitioned alike, and the
 * result is joined with another rel, consider pushing the join
 * below the Append so that child tables that are properly
 * distributed can be joined in place.
 */
if (isfirst)
{
	targetlocus = projectedlocus;
	isfirst = false;
}
else if (cdbpathlocus_equal(targetlocus, projectedlocus))
{
	/* compatible */
}
else
{
	/*
	 * subpaths have different distributed policy, mark it as random
	 * distributed and set the numsegments to the maximum of all
	 * subpaths to not missing any tuples.
	 */
	CdbPathLocus_MakeStrewn(&targetlocus,
							Max(CdbPathLocus_NumSegments(targetlocus),
								CdbPathLocus_NumSegments(projectedlocus)));
	break;
}
Continue reading

三国赵云传数据+Postgres解析一篇SIGMOD2019论文的agg递归CTE

标题党实锤了。这也是我博客中第一篇用中文写的技术文章。坚持用英文写博客,并不是崇洋媚外,只是为了跟各国同事之间交流技术方便一些。那么这次为啥选择母语,因为原始论文里的例子一点也不好玩,我选择三国赵云传的一个数据来作例子,那不妨就写得幽默一代呢,放飞自我放飞的根本一些。

论文是这篇:  RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-aggregate-SQL on Spark, SIGMOD 19.

Continue reading