A Brief Look at PostgreSQL Buffers and the OS Cache

Posted 2021-08-29 PostgreSQLChina

Author: Wu Cong (吴聪)

0. Overview

Caching is one of the most important parts of a database, and many performance problems are closely tied to it. Today we look at caching in PostgreSQL.

1. Why do we need a cache?

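In a database, disk I/O tends to be what we worry about most, and we often hear that a database has hit an I/O bottleneck. That is why a database needs a cache.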

As an analogy: one CPU cycle takes about 0.3 ns. If we stretch that to 1 second of everyday time, then a single SSD I/O, which takes roughly 50-150 μs, would take about 2 to 6 days.
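
The analogy can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (0.3 ns per cycle, 50-150 μs per SSD I/O); the function names are made up for illustration.

```python
CPU_CYCLE_NS = 0.3        # one CPU cycle, figure taken from the text
SSD_IO_US = (50, 150)     # one SSD I/O, range taken from the text


def scaled_seconds(duration_ns: float) -> float:
    """Rescale a duration so that one CPU cycle lasts one second."""
    return duration_ns / CPU_CYCLE_NS


def to_days(seconds: float) -> float:
    """Convert seconds to days."""
    return seconds / 86_400


# Convert each end of the SSD range from microseconds to nanoseconds first.
low_days, high_days = (to_days(scaled_seconds(us * 1_000)) for us in SSD_IO_US)
print(f"{low_days:.1f} to {high_days:.1f} days")  # prints "1.9 to 5.8 days"
```

Rounded, that is the "2 to 6 days" quoted above.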

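The comparison makes it clear: however fast your disk is, it is worlds apart from memory access. Moreover, most OLTP systems mainly issue random disk I/O, which performs worse still. That is why we cache data in memory.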

2. What gets cached?

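In PostgreSQL, the cached content falls into the following categories:

Tables: the data actually stored in our tables.
Indexes: like table data, indexes are made up of pages (8 KB by default) and are cached in the same place.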
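Execution plans: SQL execution plans can also be cached. Prepared statements, whose plans may be cached within a session, can be inspected through the pg_prepared_statements system view (it is a view, not an extension).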

3. Structure of PostgreSQL buffer management

In PostgreSQL, buffer management is organized in roughly three layers, outlined below
(for details see https://www.interdb.jp/pg/pgsql08.html):

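buffer table:
Maps the page tag of the requested data (file ID, block number, fork number) to its buffer_id (similar to an array index, used to address the buffer descriptors and the buffer pool).
The buffer table may have fewer slots than there are buffer_ids, and hash collisions occur, so a single buffer table slot may hold several tags.
Modifying the contents of a buffer table slot requires BufMappingLock in exclusive mode. As a single lock this would cover the whole buffer table, so to improve throughput PG splits the buffer table into several partitions, and modifying a slot locks only the partition that slot belongs to. Slots within the same partition can still only be modified serially, but contention on buffer table updates drops sharply.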
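
The partitioning idea can be sketched as a hash table with one lock per partition. This is a toy model, not PostgreSQL's implementation: the class name, the partition count of 16, and the use of Python's exclusive-only `threading.Lock` (PostgreSQL's lightweight locks also have a shared mode for lookups) are all assumptions made for illustration. Python's `dict` handles hash collisions internally, which stands in for a slot holding several tags.

```python
import threading
from typing import NamedTuple, Optional


class BufferTag(NamedTuple):
    """Identifies a page, in the order given in the text."""
    file_id: int
    block_number: int
    fork_number: int


class PartitionedBufferTable:
    """Toy tag -> buffer_id map with one lock per partition (hypothetical)."""

    def __init__(self, num_partitions: int = 16) -> None:
        # 16 is an illustrative value, not PostgreSQL's setting.
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._slots = [{} for _ in range(num_partitions)]

    def _partition_of(self, tag: BufferTag) -> int:
        # The tag's hash decides which partition (and lock) it belongs to.
        return hash(tag) % len(self._locks)

    def lookup(self, tag: BufferTag) -> Optional[int]:
        p = self._partition_of(tag)
        with self._locks[p]:
            return self._slots[p].get(tag)

    def insert(self, tag: BufferTag, buffer_id: int) -> None:
        p = self._partition_of(tag)
        with self._locks[p]:  # only this partition is blocked
            self._slots[p][tag] = buffer_id

    def delete(self, tag: BufferTag) -> None:
        p = self._partition_of(tag)
        with self._locks[p]:
            self._slots[p].pop(tag, None)


table = PartitionedBufferTable()
tag = BufferTag(file_id=16384, block_number=7, fork_number=0)
table.insert(tag, 42)
print(table.lookup(tag))  # prints 42
```

Two writers whose tags hash to different partitions never wait on each other, which is the throughput gain described above.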

buffer descriptors:
Each descriptor holds a buffer_id and the state of the corresponding buffer.
Every block in shared buffers has a buffer_id, which is used for addressing.
The descriptor also records the state of the corresponding buffer page. Its fields, as described in the reference linked above, are:
tag holds the buffer_tag of the stored page in the corresponding buffer pool slot (buffer tag is defined in Section 8.1.2 of the reference).
buffer_id identifies the descriptor (equivalent to the buffer_id of the corresponding buffer pool slot).
refcount holds the number of PostgreSQL processes currently accessing the associated stored page. It is also referred to as pin count. When a PostgreSQL process accesses the stored page, its refcount must be incremented by 1 (refcount++). After accessing the page, its refcount must be decreased by 1 (refcount--). When the refcount is zero, i.e. the associated stored page is not currently being accessed, the page is unpinned; otherwise it is pinned.
usage_count holds the number of times the associated stored page has been accessed since it was loaded into the corresponding buffer pool slot. Note that usage_count is used in the page replacement algorithm (Section 8.4.4 of the reference).
content_lock and io_in_progress_lock are light-weight locks that are used to control access to the associated stored page. These fields are described in Section 8.3.2 of the reference.
flags can hold several states of the associated stored page. The main states are as follows:
dirty bit indicates whether the stored page is dirty.
valid bit indicates whether the stored page can be read or written (valid). For example, if this bit is valid, then the corresponding buffer pool slot stores a page and this descriptor (valid bit) holds the page metadata; thus, the stored page can be read or written. If this bit is invalid, then this descriptor does not hold any metadata; this means that the stored page cannot be read or written or the buffer manager is replacing the stored page.
io_in_progress bit indicates whether the buffer manager is reading/writing the associated page from/to storage. In other words, this bit indicates whether a single process holds the io_in_progress_lock of this descriptor.
freeNext is a pointer to the next descriptor, used to build a freelist, which is described in the following subsection of the reference.

Buffer descriptor states:
Empty: When the corresponding buffer pool slot does not store a page (i.e. refcount and usage_count are 0), the state of this descriptor is empty.
Pinned: When the corresponding buffer pool slot stores a page and any PostgreSQL processes are accessing the page (i.e. refcount and usage_count are greater than or equal to 1), the state of this buffer descriptor is pinned.
Unpinned: When the corresponding buffer pool slot stores a page but no PostgreSQL processes are accessing the page (i.e. usage_count is greater than or equal to 1, but refcount is 0), the state of this buffer descriptor is unpinned.
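
The three states follow directly from refcount and usage_count. Here is a minimal sketch, assuming a simplified descriptor that keeps only the fields needed for the state logic; the class and method names are hypothetical, and locks, the io_in_progress bit, and freeNext are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferDescriptor:
    """Simplified descriptor; real descriptors carry more fields."""
    buffer_id: int
    tag: Optional[tuple] = None
    refcount: int = 0      # processes currently accessing the page (pin count)
    usage_count: int = 0   # accesses since the page was loaded
    dirty: bool = False

    def pin(self) -> None:
        """A process starts accessing the page."""
        self.refcount += 1
        self.usage_count += 1

    def unpin(self) -> None:
        """A process finishes accessing the page."""
        if self.refcount == 0:
            raise RuntimeError("buffer is not pinned")
        self.refcount -= 1

    def state(self) -> str:
        if self.refcount == 0 and self.usage_count == 0:
            return "empty"
        if self.refcount >= 1:
            return "pinned"
        return "unpinned"  # usage_count >= 1 but refcount == 0


desc = BufferDescriptor(buffer_id=0)
print(desc.state())  # prints "empty"
desc.tag = (16384, 7, 0)
desc.pin()
print(desc.state())  # prints "pinned"
desc.unpin()
print(desc.state())  # prints "unpinned"
```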

buffer pool:
Stores the contents of the data files.
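
Put together, a cache hit walks through all three layers: the buffer table yields a buffer_id, which indexes both the descriptor array and the buffer pool. The sketch below is a single-process toy with no locking or page replacement; `load_page`, `read_page`, and the pool size of 4 are hypothetical names and values chosen for illustration.

```python
from typing import Optional

PAGE_SIZE = 8192   # default page size mentioned in the text
NBUFFERS = 4       # tiny pool, for illustration only

buffer_table = {}  # layer 1: tag -> buffer_id
descriptors = [{"tag": None, "refcount": 0, "usage_count": 0}
               for _ in range(NBUFFERS)]                        # layer 2
buffer_pool = [bytearray(PAGE_SIZE) for _ in range(NBUFFERS)]   # layer 3


def load_page(tag: tuple, page: bytes, buffer_id: int) -> None:
    """Place a page into a chosen slot and register it in all three layers."""
    if len(page) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")
    buffer_pool[buffer_id][:] = page
    descriptors[buffer_id].update(tag=tag, refcount=0, usage_count=0)
    buffer_table[tag] = buffer_id


def read_page(tag: tuple) -> Optional[bytes]:
    """Return a copy of the cached page, or None on a cache miss."""
    buffer_id = buffer_table.get(tag)
    if buffer_id is None:
        return None
    desc = descriptors[buffer_id]
    desc["refcount"] += 1      # pin while the page is being read
    desc["usage_count"] += 1
    try:
        return bytes(buffer_pool[buffer_id])
    finally:
        desc["refcount"] -= 1  # unpin


tag = (16384, 0, 0)            # (file ID, block number, fork number)
print(read_page(tag))          # prints None (miss)
load_page(tag, b"\x01" * PAGE_SIZE, buffer_id=2)
print(len(read_page(tag)))     # prints 8192
```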

4. Database buffer and OS cache

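PostgreSQL relies heavily on the operating system. Many people have probably seen the recommendation to set shared_buffers to 1/4 of memory without quite knowing why. The way PG interacts with OS memory explains it: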

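In PG, data in shared_buffers moves through buffered I/O: it first passes through the OS buffer, which in turn talks to the disk. This is what is commonly called double buffering. I plan to cover that topic separately later, so this article does not go into it.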
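
To see why this is called double buffering, consider a toy model in which every read goes through two caches in sequence. Everything here is an assumption made for illustration: the dictionaries standing in for the disk, the OS cache, and shared_buffers, and the function names. Eviction is not modeled.

```python
PAGE_SIZE = 8192

disk = {n: bytes([n]) * PAGE_SIZE for n in range(4)}  # pretend data file
os_cache = {}         # stands in for the OS buffer
shared_buffers = {}   # stands in for PG's shared buffers
stats = {"disk_reads": 0}


def read_via_os(page_no: int) -> bytes:
    """Buffered I/O: the OS caches whatever it reads from disk."""
    if page_no not in os_cache:
        stats["disk_reads"] += 1
        os_cache[page_no] = disk[page_no]
    return os_cache[page_no]


def read_via_pg(page_no: int) -> bytes:
    """PG asks the OS for a page on a miss and keeps its own copy."""
    if page_no not in shared_buffers:
        shared_buffers[page_no] = read_via_os(page_no)
    return shared_buffers[page_no]


read_via_pg(1)
read_via_pg(1)  # second read is served from shared_buffers
duplicated = set(shared_buffers) & set(os_cache)
print(sorted(duplicated), stats["disk_reads"])  # prints "[1] 1"
```

After a single read, page 1 sits in both caches. In this simplified model, memory given to one layer can end up holding pages the other layer already has.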

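For this reason we generally do not recommend setting PG's shared_buffers very large. At the same time, do not treat "use only 1/4 of physical memory for shared buffers" as a fixed rule; the double buffering issue will be covered in detail later.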

References:
https://www.interdb.jp/pg/pgsql08.html
https://github.com/digoal/blog/blob/master/202104/20210421_01.md
