我时不时会浏览一些大佬的个人主页(如Knuth和Sedgewick),好久好久之前我就看到了Sedgewick关于HyperBitBit的研究了,一直就等他论文发表了。后来才想,不会这大爷忘了更新主页吧,去搜其他合作者,果然,论文刚刚都发布了。本文先罗列资料,然后再贴我的分析笔记。
- 论文代码仓库:https://github.com/robert-sedgewick/hyperbitbit
- Sedgewick教授个人主页关于基数估计的部分:https://sedgewick.io/research/#cardinality
- 2022年Sedgewick关于基数估计的一个talk: https://www.birs.ca/events/2022/5-day-workshops/22w5004/videos/watch/202211140856-Sedgewick.html
- 上面talk的slide:https://www.birs.ca/workshops/2022/22w5004/files/Bob%20Sedgewick/HyperBit.pdf
- Paper: Janson, Svante, Jérémie Lumbroso, and Robert Sedgewick. “Bit-Array-Based Alternatives to HyperLogLog.” 35th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2024). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2024.
- Paper Link: https://www2.math.uu.se/~svantejs/papers/sj383-aofa.pdf
- Sedgewick教授个人主页里的talk的 https://sedgewick.io/talks/
- LogLog Counting of Large Cardinalities Marianne Durand and Philippe Flajolet
- Cardinality Estimation
这次的分析清参考下面的pdf (也可以从这里下载: https://github.com/kainwen/misc/blob/main/A_Tour_of_Understanding_HyperBit_.pdf):