快速排序性能分析——Analytical Combinatorics方法

多键值排序是把快速排序(Quick Sort)和基数排序(Radix Sort)结合起来的算法。论文为 Fast Algorithms for Sorting and Searching Strings (J. Bentley, R. Sedgewick) 。Greenplum 6里面有实现,本文是我自己按论文搜索资料,一步一步复习、学习、整理的记录。

教授的讲座资料:

Continue reading

离散无噪声信道信道容量——Analytical Combinatorics方法

背景

前两天有同事分享看到信息熵的公式觉得非常巧妙,一下子把我的回忆带回到了15年的前本科时代,那时候学习过一门很有意思的课程信息论。信息论基本上就是祖师爷Claude Shannon一篇无比经典的论文A Mathematical Theory of Communication 安排的明明白白。本科时候我准备尝试去看这个论文,结果卡在第一处后就不了了之了,至今记忆犹新。

Continue reading

The HashSubplan implementation for NOT-IN Sublink in Postgres

NOT-IN expressions with subquery in SQL are notorious but very common. Not only do programmers write such kinds of SQLs but also many BI applications generate the kinds of SQLs. With NULL values, things get much more complex and even much much more complex for MPP databases. In this blog, we focus on single-node Postgres to understand the data structures, semantics, and algorithms of this topic.

I will talk more about GPDB’s LASJ implementation of NOT-IN later.

Continue reading

往事可追:从约瑟夫环问题到FFT算法重排

过河拜码头,上岸数人马。能走(苟)多远走(苟)多远,送死的人来了。

《好家伙》主题曲

从2023的国庆节到现在都龙年春节了,过去几个月,一言难尽。偶尔翻出来了上学期间写的一个小blog,看着那时候中二的文字,也是难以启齿。但也算乐于思考。往事可追,这里回顾这两个很有趣的算法,作为2024年思考的开始。

Continue reading

Worktable as Continuations

Not sure why I came again to this interesting paper One WITH RECURSIVE is Worth Many GOTOs. Daily work on Greenplum makes me think and inspires me to learn knowledge in different areas of computer science. This paper was accepted in the same SIGMOD as Greenplum’s HTAP paper and I have a vague impression of it. Recently three tasks in my work made me peruse the paper:

  1. Fix a customer JIRA on recursive CTE
  2. Answer questions from my colleague on the implementation of KNN in Greenplum (I advise using lateral join with some enhancement of Optimizer)
  3. My own spike on the labels as values skill in Greenplum(Postgres)’s expression valuation

This paper uses recursive CTE, Lateral Join and Union All with filters to implement control structure such as Loop, Condition in PLPGSQL. So now is a good time to start a new tour.

Continue reading

探囊取物——理解,诊断,修复Greenplum中的Opclass和Opfamily相关问题

我已经不止一次有这样的感觉了: 当遇到一个特性类型的bug后,短时间内会频繁遇到本质一样的bug。这次是Greenplum关于OpclassOpfamily的话题。这种模式不是偶然,我有一个猜想来解释:遇到第一个问题的时候,如果能够深入仔细的研究思考,由点及面,然后这部分储备的知识和思维的训练,让你更容易从代码和issue里看出同类问题。

本文介绍Greenplum Database里的OpclassOpfamily相关概念,然后列出Greenplum开源社区仓库 greenplum-db/gpdb 里一些关于这个话题的issue并逐一诊断分析。各小节独立自成一个话题,小节之间可能会显得跳跃。不同的参考资料也将散落在各个小节内部。Issue修复的过程是动态的,我会把在开源社区出现过的东西online session里讨论清楚并公布录屏。

Continue reading

👻幽灵捕手——Dancing Links

缘由

之前一阵子,慕慕妈妈给慕慕买了一个益智小玩具——👻幽灵捕手。

刚刚通了一关的吕小慕

这个游戏的介绍视频可以参考: 幽灵捕手玩法视频。整整一年前我恰好看了Knuth的Dancling Links(参考之前的blog: https://kainwen.com/2022/01/03/dancing-links/ ) ,最近看着吕小慕玩儿这个游戏,顿时觉得,这就是一个可以用Dancling Links解决的Exact Cover Problem。那就让我们来试试看吧。

Continue reading