PostgreSQL从库只读的实现原理
Posted PostgreSQLChina
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了PostgreSQL从库只读的实现原理相关的知识,希望对你有一定的参考价值。
作者:杨向博
我们接触到的数据库,基本都提供了只读实例的功能。在业务允许的情况下可以将一些读取数据的请求下发至只读实例,减缓primary的压力,以获得更佳的性能。
物理从库只读已经是一个常识问题,在PostgreSQL中具体是怎么实现的呢,一起来看看。
一、报错分析
从报错入手分析,在只读库里执行写操作,报错如下:
postgres=# select * into tbl_t from tbl_test;
ERROR: cannot execute SELECT INTO in a read-only transaction
postgres=#
报错的函数为:
/*
* PreventCommandIfReadOnly: throw error if XactReadOnly
*
* This is useful partly to ensure consistency of the error message wording;
* some callers have checked XactReadOnly for themselves.
*/
void
PreventCommandIfReadOnly(const char *cmdname)
{
if (XactReadOnly)
ereport(ERROR,
(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
/* translator: %s is name of a SQL command, eg CREATE */
errmsg('cannot execute %s in a read-only transaction',
cmdname)));
}
当XactReadOnly为true时,就抛出报错,XactReadOnly是一个Bool类型的全局变量。那这里大概明白设计思路了,当执行一些操作时调用PreventCommandIfReadOnly函数,如果库是只读状态进行写操作就抛出报错。
报错堆栈:
(gdb) bt
#0 PreventCommandIfReadOnly (cmdname=0xbab6f8 'SELECT INTO') at utility.c:409
#1 0x00000000008c24a5 in standard_ProcessUtility (pstmt=0x1faef00, queryString=0x1fae098 'select * into tbl_t from tbl_test;', context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1fafa18, qc=0x7ffc8ac8a180)
at utility.c:566
#2 0x00000000008c23cc in ProcessUtility (pstmt=0x1faef00, queryString=0x1fae098 'select * into tbl_t from tbl_test;', context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1fafa18, qc=0x7ffc8ac8a180)
at utility.c:524
#3 0x00000000008c14e2 in PortalRunUtility (portal=0x1fffa58, pstmt=0x1faef00, isTopLevel=true, setHoldSnapshot=false, dest=0x1fafa18, qc=0x7ffc8ac8a180) at pquery.c:1157
#4 0x00000000008c16ce in PortalRunMulti (portal=0x1fffa58, isTopLevel=true, setHoldSnapshot=false, dest=0x1fafa18, altdest=0x1fafa18, qc=0x7ffc8ac8a180) at pquery.c:1303
#5 0x00000000008c0c6c in PortalRun (portal=0x1fffa58, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1fafa18, altdest=0x1fafa18, qc=0x7ffc8ac8a180) at pquery.c:779
#6 0x00000000008bb204 in exec_simple_query (query_string=0x1fae098 'select * into tbl_t from tbl_test;') at postgres.c:1239
#7 0x00000000008bf053 in PostgresMain (argc=1, argv=0x1fd7af0, dbname=0x1fd7a28 'postgres', username=0x1fd7a08 'postgres') at postgres.c:4315
#8 0x0000000000820e1f in BackendRun (port=0x1fd13f0) at postmaster.c:4536
#9 0x0000000000820630 in BackendStartup (port=0x1fd13f0) at postmaster.c:4220
#10 0x000000000081cf62 in ServerLoop () at postmaster.c:1739
#11 0x000000000081c843 in PostmasterMain (argc=3, argv=0x1fa9d70) at postmaster.c:1412
#12 0x0000000000733286 in main (argc=3, argv=0x1fa9d70) at main.c:210
(gdb) p XactReadOnly
$1 = true
接下来追踪全局变量XactReadOnly在什么时候被赋值为true。
二、变量追踪
走读代码:postmaster守护进程fork出postgres后,在InitPostgres阶段StartTransaction()函数根据是否RecoveryInProgress来赋值,如果RecoveryInProgress()为true就设置XactReadOnly=true
/*
* Make sure we've reset xact state variables
*
* If recovery is still in progress, mark this transaction as read-only.
* We have lower level defences in XLogInsert and elsewhere to stop us
* from modifying data during recovery, but this gives the normal
* indication to the user that the transaction is read-only.
*/
if (RecoveryInProgress())
{
s->startedInRecovery = true;
XactReadOnly = true;
}
使用gdb attach postmasterPID后,抓到了这个过程
Breakpoint 3, StartTransaction () at xact.c:1895
1895 s = &TopTransactionStateData;
(gdb) bt
#0 StartTransaction () at xact.c:1895
#1 0x0000000000536318 in StartTransactionCommand () at xact.c:2827
#2 0x0000000000a40778 in InitPostgres (in_dbname=0x1fd7a28 'postgres', dboid=0, username=0x1fd7a08 'postgres', useroid=0, out_dbname=0x0, overrid
e_allow_connections=false) at postinit.c:739
#3 0x00000000008beb60 in PostgresMain (argc=1, argv=0x1fd7af0, dbname=0x1fd7a28 'postgres', username=0x1fd7a08 'postgres') at postgres.c:3933
#4 0x0000000000820e1f in BackendRun (port=0x1fd13f0) at postmaster.c:4536
#5 0x0000000000820630 in BackendStartup (port=0x1fd13f0) at postmaster.c:4220
#6 0x000000000081cf62 in ServerLoop () at postmaster.c:1739
#7 0x000000000081c843 in PostmasterMain (argc=3, argv=0x1fa9d70) at postmaster.c:1412
#8 0x0000000000733286 in main (argc=3, argv=0x1fa9d70) at main.c:210
(gdb) p XactReadOnly
$1 = false
(gdb) list
1890 VirtualTransactionId vxid;
1891
1892 /*
1893 * Let's just make sure the state stack is empty
1894 */
1895 s = &TopTransactionStateData;
1896 CurrentTransactionState = s;
1897
1898 Assert(!FullTransactionIdIsValid(XactTopFullTransactionId));
1899
(gdb) n
1896 CurrentTransactionState = s;
(gdb)
1909 s->state = TRANS_START;
(gdb)
1910 s->fullTransactionId = InvalidFullTransactionId; /* until assigned */
(gdb) n
1913 xact_is_sampled = log_xact_sample_rate != 0 &&
(gdb)
1922 s->nestingLevel = 1;
(gdb)
1923 s->gucNestLevel = 1;
(gdb)
1924 s->childXids = NULL;
(gdb)
1925 s->nChildXids = 0;
(gdb)
1926 s->maxChildXids = 0;
(gdb)
1932 GetUserIdAndSecContext(&s->prevUser, &s->prevSecContext);
(gdb)
1945 if (RecoveryInProgress())
(gdb)
1947 s->startedInRecovery = true;
(gdb) p RecoveryInProgress()
$2 = true
(gdb) n
1948 XactReadOnly = true;
(gdb) p XactReadOnly
$3 = true
XactReadOnly赋值是在StartTransaction ()函数中完成的,首次赋值是在InitPostgres环节,除此之外,后续每次启动事务后会重新赋值。
XactReadOnly是否为ture取决于RecoveryInProgress()的返回值是否为true。对于Primary实例startup进程不是常驻内存的,因此只有在启动过程中以及crash后进行recovery,完成后startup进程退出;对于standby实例来说startup进程是常驻内存的,一直通过redo来回放primary发送的日志,因此RecoveryInProgress()一直为ture。
RecoveryInProgress()函数逻辑很简单,返回值取决于共享内存xlogctl->SharedRecoveryState的值,只有在StartupXlog函数退出时会修改为RECOVERY_STATE_DONE;因此在standby实例中RecoveryInProgress()函数一直返回true。
/*
* Is the system still in recovery
*
* Unlike testing InRecovery, this works in any process that's connected to
* shared memory.
*
* As a side-effect, we initialize the local TimeLineID and RedoRecPtr
* variables the first time we see that recovery is finished.
*/
bool
RecoveryInProgress(void)
{
/*
* We check shared state each time only until we leave recovery mode. We
* can't re-enter recovery, so there's no need to keep checking after the
* shared variable has once been seen false.
*/
if (!LocalRecoveryInProgress)
return false;
else
{
/*
* use volatile pointer to make sure we make a fresh read of the
* shared variable.
*/
volatile XLogCtlData *xlogctl = XLogCtl;
LocalRecoveryInProgress = (xlogctl->SharedRecoveryState != RECOVERY_STATE_DONE);
/*
* Initialize TimeLineID and RedoRecPtr when we discover that recovery
* is finished. InitPostgres() relies upon this behaviour to ensure
* that InitXLOGAccess() is called at backend startup. (If you change
* this, see also LocalSetXLogInsertAllowed.)
*/
if (!LocalRecoveryInProgress)
{
/*
* If we just exited recovery, make sure we read TimeLineID and
* RedoRecPtr after SharedRecoveryState (for machines with weak
* memory ordering).
*/
pg_memory_barrier();
InitXLOGAccess();
}
/*
* Note: We don't need a memory barrier when we're still in recovery.
* We might exit recovery immediately after return, so the caller
* can't rely on 'true' meaning that we're still in recovery anyway.
*/
return LocalRecoveryInProgress;
}
}
三、小结
通过前边的分析,将设定只读,以及只读处理的过程通过框图来表示。
- 左边灰色框表示只读的设定,在fork出postgres之后,根据是否RecoveryInProgress来设定是否只读,实例为standby角色则设置为只读;
- 右边框图表示只读处理的过程,执行器进行判断,如果实例为standby角色,并且执行写操作,则调用PreventCommandIfReadOnly函数抛出报错终止写操作。
整体来看pg通过一个bool全局变量XactReadOnly,设定相关的逻辑,来操作实例是否只读,以及只读处理。
四、延伸
1、XactReadOnly变量对应guc参数transaction_read_only,可以使用show命令查看;但是这个参数是GUC_NO_RESET_ALL类,也就是说不能通过alter system方式修改;
2、如何设置primary实例为只读状态?
Alter system set default_transaction_read_only to on;
select pg_reload_conf();
即可
{
{'default_transaction_read_only', PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop('Sets the default read-only status of new transactions.'),
NULL
},
&DefaultXactReadOnly,
false,
NULL, NULL, NULL
},
{
{'transaction_read_only', PGC_USERSET, CLIENT_CONN_STATEMENT,
gettext_noop('Sets the current transaction's read-only status.'),
NULL,
GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
},
&XactReadOnly,
false,
check_transaction_read_only, NULL, NULL
},
以上是关于PostgreSQL从库只读的实现原理的主要内容,如果未能解决你的问题,请参考以下文章