PostgreSQL crashed with error: 'server process (PID XXXX) was terminated by exception 0xC0000142'
【Posted】: 2015-07-31 00:49:02
【Question】: I have PostgreSQL 9.2 running on a Windows POS READY embedded system (XP-like) machine with 4 GB of RAM and an Atom N2800 CPU. It has run fine in production for years, but it crashed frequently (the service stopped) during recent performance (not stress) tests.
I don't think the test puts much pressure on the server. With log_min_duration_statement = 0 enabled, the simplified overall statistics of the test are listed below.
Taking 20 minutes as one unit of measure, during one unit:
UPDATE 5000 times, each query carrying about 20 KB of data (it includes a Text field).
SELECT 35000 times, each query returning about 20 KB of data (to fetch the Text field).
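To put the workload figures above in perspective, here is a rough back-of-the-envelope sketch. It only restates the numbers from the question as per-second rates; the 20 KB payload is the poster's approximation, and the combined throughput assumes every query moves the full payload.

```python
# Per-second rates for the workload described in the question.
UNIT_SECONDS = 20 * 60      # one measurement unit = 20 minutes
UPDATES_PER_UNIT = 5000
SELECTS_PER_UNIT = 35000
PAYLOAD_KB = 20             # approximate data size per query (from the question)


def per_second(count, seconds=UNIT_SECONDS):
    """Average number of operations per second over one unit."""
    return count / seconds


update_rate = per_second(UPDATES_PER_UNIT)
select_rate = per_second(SELECTS_PER_UNIT)
# Assumption: every query transfers the full payload.
throughput_kb_s = (update_rate + select_rate) * PAYLOAD_KB

print(f"UPDATE: {update_rate:.2f}/s")
print(f"SELECT: {select_rate:.2f}/s")
print(f"data moved: {throughput_kb_s:.0f} KB/s")
```

That is roughly 4 updates and 29 selects per second, which supports the poster's view that the test is not a heavy load by itself.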
The log shows nothing unusual before the crash, and then leaves this:
2015-07-29 16:41:53.500 SGT,,,5512,,55b87f74.1588,2,,2015-07-29 15:23:32 SGT,,0,LOG,00000,"server process (PID 4416) was terminated by exception 0xC0000142",,"See C include file ""ntstatus.h"" for a description of the hexadecimal value.",,,,,,,""
2015-07-29 16:41:53.500 SGT,,,5512,,55b87f74.1588,3,,2015-07-29 15:23:32 SGT,,0,LOG,00000,"terminating any other active server processes",,,,,,,,,""
2015-07-29 16:41:53.500 SGT,"eps","transactiondatabase",6960,"127.0.0.1:9162",55b891cf.1b30,9,"idle",2015-07-29 16:41:51 SGT,146/0,0,WARNING,57P00,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,,""
2015-07-29 16:41:53.515 SGT,"eps","transactiondatabase",5828,"127.0.0.1:9150",55b891c2.16c4,155,"idle",2015-07-29 16:41:38 SGT,145/0,0,WARNING,57P00,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,,""
2015-07-29 16:41:53.515 SGT,"eps","transactiondatabase",6448,"127.0.0.1:9148",55b891c2.1930,5,"idle",2015-07-29 16:41:38 SGT,93/0,0,WARNING,57P00,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,,""
....
....
2015-07-29 16:41:54.500 SGT,,,8004,,55b87f76.1f44,2,,2015-07-29 15:23:34 SGT,1/0,0,WARNING,57P00,"terminating connection because of crash of another server process","The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.","In a moment you should be able to reconnect to the database and repeat your command.",,,,,,,""
2015-07-29 16:41:54.515 SGT,,,5512,,55b87f74.1588,4,,2015-07-29 15:23:32 SGT,,0,LOG,00000,"all server processes terminated; reinitializing",,,,,,,,,""
2015-07-29 16:42:04.515 SGT,,,5512,,55b87f74.1588,5,,2015-07-29 15:23:32 SGT,,0,FATAL,XX000,"pre-existing shared memory block is still in use",,"Check if there are any old server processes still running, and terminate them.",,,,,,,""
2015-07-29 16:51:02.078 SGT,,,5828,,55b893f6.16c4,1,,2015-07-29 16:51:02 SGT,,0,LOG,00000,"database system was interrupted; last known up at 2015-07-29 16:40:36 SGT",,,,,,,,,""
2015-07-29 16:51:02.093 SGT,,,5828,,55b893f6.16c4,2,,2015-07-29 16:51:02 SGT,,0,LOG,00000,"database system was not properly shut down; automatic recovery in progress",,,,,,,,,""
2015-07-29 16:51:02.109 SGT,,,5828,,55b893f6.16c4,3,,2015-07-29 16:51:02 SGT,,0,LOG,00000,"redo starts at 0/12C79578",,,,,,,,,""
2015-07-29 16:51:02.421 SGT,,,5828,,55b893f6.16c4,4,,2015-07-29 16:51:02 SGT,,0,LOG,00000,"unexpected pageaddr 0/1046A000 in log file 0, segment 19, offset 4628480",,,,,,,,,""
2015-07-29 16:51:02.421 SGT,,,5828,,55b893f6.16c4,5,,2015-07-29 16:51:02 SGT,,0,LOG,00000,"redo done at 0/13469FC8",,,,,,,,,""
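When scanning a long csvlog for this kind of event, a small script can pull out the crash lines. The following is a minimal sketch, not a full log analyzer: it assumes the standard csvlog column order (process ID in column 4, message in column 14), and the helper name find_crashes is made up for illustration. The sample line is the first crash entry from the log above.

```python
import csv
import io
import re

# 0-based positions in PostgreSQL's csvlog column order.
PID, MESSAGE = 3, 13

CRASH_RE = re.compile(
    r"server process \(PID (\d+)\) was terminated by exception (0x[0-9A-Fa-f]+)"
)


def find_crashes(csv_text):
    """Yield (log_time, postmaster_pid, crashed_pid, exception_code) tuples."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= MESSAGE:
            continue  # skip short or malformed rows
        match = CRASH_RE.search(row[MESSAGE])
        if match:
            yield row[0], int(row[PID]), int(match.group(1)), match.group(2)


SAMPLE = (
    '2015-07-29 16:41:53.500 SGT,,,5512,,55b87f74.1588,2,,'
    '2015-07-29 15:23:32 SGT,,0,LOG,00000,'
    '"server process (PID 4416) was terminated by exception 0xC0000142",,'
    '"See C include file ""ntstatus.h"" for a description of the hexadecimal value."'
    ',,,,,,,""\n'
)

crashes = list(find_crashes(SAMPLE))
for crash in crashes:
    print(crash)
```

Here PID 5512 is the process that wrote the log line (the postmaster), and PID 4416 is the backend that was terminated.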
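One thing I can point out is the shared_buffers setting, which is currently 256MB. That value was chosen for no particular reason. Would increasing it help?
Other main settings: max_connections = 200, temp_buffers = 16MB, work_mem = 8MB
Can anyone help work out how the crash happens, or how to narrow down the cause?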
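【Answer 1】: MSDN says:
0xC0000142
STATUS_DLL_INIT_FAILED
{DLL Initialization Failed} Initialization of the dynamic link library %hs failed. The process is terminating abnormally.
So this is a problem loading a DLL and/or launching a new process. If I had to guess, I'd say you are probably hitting a limit on the XP Embedded system, such as the number of open files or the number of running processes. You may want to lower max_connections.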
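The exception value in the log is a Windows NTSTATUS code, not a PostgreSQL error code. As a small illustration, the sketch below splits such a value into its standard bit fields (severity in bits 30-31, customer flag in bit 29, facility in bits 16-27, code in bits 0-15). The lookup table is deliberately tiny and only contains names I am sure of; decode_ntstatus is a hypothetical helper name, and ntstatus.h remains the authoritative list.

```python
SEVERITIES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# Deliberately small table; see ntstatus.h for the full list.
KNOWN_NAMES = {
    0xC0000142: "STATUS_DLL_INIT_FAILED",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITIES[(value >> 30) & 0x3],
        "customer_defined": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
        "name": KNOWN_NAMES.get(value, "unknown"),
    }


decoded = decode_ntstatus(0xC0000142)
print(decoded)
```

For 0xC0000142 this gives an error-severity, Microsoft-defined status with facility 0 and code 0x142, which matches the STATUS_DLL_INIT_FAILED entry quoted above.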
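【Comments】:
Oh! I thought the error code belonged to PostgreSQL. Following your suggestion, I looked at the Performance Monitor data captured along with this exception. I noticed that the system handle count climbed steadily from 3K to 10K over 3 hours, until the service stopped. But when I looked at the handles of each process, they all looked normal. Do you have any suggestions?
Use Process Explorer and/or Process Monitor from Microsoft Sysinternals to get more detailed information and try to identify what is leaking handles. That assumes it isn't simply that you have more and more processes, so that the total handle count reaches a limit even though the number of handles held by each process doesn't change.
I can lower max_connections from the current 200 and run another round of tests to see whether anything improves. However, we already use the PostgreSQL ODBC driver with connection pooling enabled. So as I understand it, if lowering max_connections leads to database connection timeouts on the client side, then the root cause of the current situation is a large number of connection attempts from the clients, right?
@Shawn It's hard to tell from the available information. Possibly. I don't know XP Embedded well. It could also be a handle leak in a long-running process somewhere.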
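The two explanations raised in the comments (more processes versus a leak inside one process) can be told apart by comparing handle snapshots taken at different times. The sketch below assumes you have already exported per-process handle counts, for example from Process Explorer or Performance Monitor, into a mapping of PID to handle count. The function names and the sample numbers are invented for illustration and are not measurements from this system.

```python
def summarize(snapshot):
    """snapshot maps PID -> handle count for one point in time."""
    return {
        "processes": len(snapshot),
        "total_handles": sum(snapshot.values()),
        "max_per_process": max(snapshot.values(), default=0),
    }


def compare(before, after):
    """Classify how the system-wide handle total changed between snapshots."""
    b, a = summarize(before), summarize(after)
    if a["total_handles"] <= b["total_handles"]:
        return "no growth"
    if a["max_per_process"] > b["max_per_process"]:
        return "per-process growth (possible leak)"
    if a["processes"] > b["processes"]:
        return "more processes"
    return "unclear"


# Invented data: every process holds 150 handles, but the process count grows.
early = {pid: 150 for pid in range(1000, 1020)}   # 20 processes
late = {pid: 150 for pid in range(1000, 1066)}    # 66 processes

print(summarize(early))
print(summarize(late))
print(compare(early, late))
```

In this made-up case the total rises from 3000 to 9900 handles while no single process changes, which is the "more and more processes" scenario rather than a leak in one process.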