不踬于山——追踪内存泄漏

不踬于山,而踬于垤。—— 先秦圣谚

最近在Greenplum 6X上发现了一个内存泄漏,Memory Leak DynamicBitmap Index Scan in Nestloop’s inner plan #14122, 查询和查询计划与和之前修复过的如出一辙,所以肯定是来了新的泄漏。这一次可以复现问题,能复现的bug,修复只是时间问题,我的同事此刻已经搞定了,开出了Pull Request (见Issue页面)。本文是我做事后诸葛亮,看看能否用一些工具快速定位这个问题。

Continue reading

On pg_relation_size in Greenplum

Recently I have been working on several issues involving relation_size in Greenplum, this post is to summarize some interesting parts of the topic. I also post a thread on gpdb-dev mailing list: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/DkTx4O-kuH0.

The problem

Q: there is a table storing some other tables’ oid, how to write a  SQL to compute table size for each oid?
This problem is so simple in Postgres (single instance), but it turns much more complex than the first glance to work this out high efficiently in Greenplum.

Continue reading

Squelch Logic in Greenplum

Recently I have reviewed a PR from my colleague: Fix crash on QD when squelching CTE producer with a Motion child. This PR from Alexandra Wang (a very smart engineer from Greenplum Team) is well documented in the PR message and has detailed comments, also contains carefully designed test cases. Readers can feel the style of how Greenplum’s open-source community works. Next, let’s come back to the technical details of Greenplum’s Squelch logic (All best ideas are from Alex).

Continue reading

How to hack Greenplum & Postgres?

背景

这篇主要写个中文社区里希望上手开发的工程师或者大学生朋友们,期待他们可以快速上手Postgres或者Greenplum的开发和调试。我本人2017年7月入职Pivotal从事Greenplum开发,实话实话,上手过程非常痛苦,可能跟一开始接触的项目有关,大概得有将近一年才初窥门径。回顾起来,怎么看都太慢了。

尔后,疫情期间,我指导曾经助教的学生Hack Postgres | Greenplum,用结对编程的模式,zoom录屏的方式,他极快的上手了开发。并且短时间内就给Greenplum提交了两个Patch,给Postgres提交了一个Patch(PostgreSQL v13 release的致谢名单里就有他的名字)。再后来,我又把资料给一个大龄程序员,他诊断和阅读代码的能力不错,故而也很快上手,给Greenplum提交了若干Patch并且发现了一个比较重要的方向。这些Patch都并不是Naive的Patch,都是很有用并且涉及知识点比较广的问题。还有就是今年带的一个实习生,也是大概2-3周就给Greenplum提交了一个Patch,使得Greenplum 7的OLTP性能飙升(当然不是完全属于他的独立的工作)。

上述经历和尝试,证明了,只要有扎实C语言功底,就可以快速上手。其他的遇到了再学,各个深入即可。

Continue reading

Volatile Functions in Query on Broadcast-like Relation in Greenplum

Some background knowledge

Greenplum is MPP database. The SQL query’s planning and executing model is a distributed computing model. Greenplum introduces the concept of locus to model a path’s data distribution.

  • Function scan of generate_series is General locus (because the information is complete  at every single place, but in Greenplum 5 this function is not treated as General, this may be the historical reason)
  • the scan path of a replicated table is segmenteGeneral locus (because data only stored in segments, not master, and the information is complete at every single segment)
  • ……

For more details, please refer to one of my talk Greenplum 查询优化器关键技术介绍.

Continue reading

Tour of analyzing of a bug of Greenplum: Locus, Partial Distribution, Motion and Interconnect

Overview

The bad plan and wrong result

When reviewing the code of the PR Improve Append plan when General children exist #9692, I have come to this line of code in the function set_append_path_locus:

/*
 * CDB: If all the scans are distributed alike, set
 * the result locus to match.  Otherwise, if all are partitioned,
 * set it to strewn.  A mixture of partitioned and non-partitioned
 * scans should not occur after above correction;
 *
 * CDB TODO: When the scans are not all partitioned alike, and the
 * result is joined with another rel, consider pushing the join
 * below the Append so that child tables that are properly
 * distributed can be joined in place.
 */
if (isfirst)
{
	targetlocus = projectedlocus;
	isfirst = false;
}
else if (cdbpathlocus_equal(targetlocus, projectedlocus))
{
	/* compatible */
}
else
{
	/*
	 * subpaths have different distributed policy, mark it as random
	 * distributed and set the numsegments to the maximum of all
	 * subpaths to not missing any tuples.
	 */
	CdbPathLocus_MakeStrewn(&targetlocus,
							Max(CdbPathLocus_NumSegments(targetlocus),
								CdbPathLocus_NumSegments(projectedlocus)));
	break;
}
Continue reading