17. ClustrixDB Log Management

Posted 2020-11-23 yuxiaohao


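ClustrixDB logs detailed information about significant and problematic queries. These logs help identify the following:

Slow queries
Resource contention
SQL errors
Queries that read an unexpected number of rows
Schema changes
Modifications to global variables
Changes to the cluster

By default, query logging is enabled and the logs are stored in /data/clustrix/log/.

Each node logs information about the queries it runs while acting as the Global Transaction Manager (GTM). To evaluate cluster-wide issues, it is often necessary to merge the logs from all nodes. Use clx logdump to merge and evaluate the logs.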
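
To illustrate why merged logs are easier to read, here is a minimal Python sketch that interleaves per-node log lines in time order. It is not how clx logdump is implemented. The timestamp format, the `merge_node_logs` and `entry_time` names, and the sample lines are all assumptions made for illustration; the real query.log prefix may differ.

```python
import heapq
from datetime import datetime

# Assumption: every line starts with a 26-character timestamp such as
# "2020-11-23 10:15:02.123456". The real query.log prefix may differ.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 26


def entry_time(line):
    """Parse the leading timestamp of one log line."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def merge_node_logs(*node_logs):
    """Merge per-node lists of log lines (each already in time order)
    into a single chronological list."""
    return list(heapq.merge(*node_logs, key=entry_time))


if __name__ == "__main__":
    # Placeholder lines, not real ClustrixDB output.
    node1 = [
        "2020-11-23 10:00:00.000001 node1 first",
        "2020-11-23 10:00:02.000000 node1 third",
    ]
    node2 = ["2020-11-23 10:00:01.000000 node2 second"]
    for line in merge_node_logs(node1, node2):
        print(line)
```

Because each node writes its own log in time order, a streaming merge is enough and no full sort is needed. This only works if the node clocks are synchronized.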

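Managing the query log

Query types

Each entry in query.log is classified as one of these types. Logging for each query type is controlled by the specified global or session variable.

Query Type	Description
SQLERR	Database errors, including syntax errors, timeout notifications, and permission problems. By default, all SQLERR queries are logged (session_log_error_queries).
SLOW	Queries whose execution time exceeds the threshold set by session_log_slow_threshold_ms (slow SQL).
DDL	DDL statements (CREATE, DROP, ALTER) and SET GLOBAL or SET SESSION commands. By default, all DDL is logged (session_log_ddl).
BAD	Queries that read more rows than are needed to return the expected result. This can indicate a poor plan or a missing index. Disabled by default (session_log_bad_queries).
ALTER CLUSTER	Changes made to the cluster with the ALTER CLUSTER command are always logged to query.log automatically. This logging is not controlled by a global variable.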

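Global and session variables

Some of the variables that control query logging can be set per session, while others are available only system-wide. To set the value of any logging-related variable, use the following syntax: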

SET [GLOBAL | SESSION] variable_name = desired_value;

These variables control query and user logging. The default settings shown here are acceptable for most installations.

Name	Description	Default Value	Session Variable
session_log_bad_queries	Log BAD queries to the query.log	false
session_log_ddl	Log DDL statements to query.log	true
session_log_error_queries	Log ERROR statements to query.log	true
session_log_slow_queries	Log SLOW statements to query.log	true
session_log_slow_threshold_ms	Query duration threshold in milliseconds before logging this query	10000
session_log_users	Log LOGIN/LOGOUT to user.log	false
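
As a concrete illustration of the SET syntax, the following Python sketch renders statements for the variables in the table. The `set_statement` helper is hypothetical, the 2000 ms value is an arbitrary example, and which variables actually accept SESSION scope is not confirmed here, so check that before relying on a session-level change.

```python
# Variable names and defaults copied from the table in this article.
LOGGING_DEFAULTS = {
    "session_log_bad_queries": False,
    "session_log_ddl": True,
    "session_log_error_queries": True,
    "session_log_slow_queries": True,
    "session_log_slow_threshold_ms": 10000,
    "session_log_users": False,
}


def set_statement(name, value, scope="GLOBAL"):
    """Render one SET statement for a logging variable (hypothetical helper)."""
    if name not in LOGGING_DEFAULTS:
        raise ValueError(f"unknown logging variable: {name}")
    if scope not in ("GLOBAL", "SESSION"):
        raise ValueError(f"scope must be GLOBAL or SESSION, got {scope!r}")
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    else:
        rendered = str(int(value))
    return f"SET {scope} {name} = {rendered};"


if __name__ == "__main__":
    # Turn on BAD query logging cluster-wide.
    print(set_statement("session_log_bad_queries", True))
    # Lower the slow-query threshold (illustrative value) for one session.
    print(set_statement("session_log_slow_threshold_ms", 2000, scope="SESSION"))
```

Validating the name against the table catches typos before a statement is ever sent to the database.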

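Reading query.log

Components

Each log entry begins with identifying data and contains important information to help troubleshoot problems in the cluster. The layout of a log entry is shown below.

[timestamp] [hostname] clxnode INSTR [query type] [sid] [db] [user] [ac] [xid] [sql] [status] [time and breakdowns] [internal counters]

Label	Description
timestamp	Date and time, including the time zone. Keeping time synchronized across all nodes is very important.
hostname	The node ID and the name of the host that logged the entry. This node acted as the GTM for the transaction.
process name	The ClustrixDB process name (clxnode).
INSTR	A fixed, redundant token that appears before the query type on every line.
query type	The type of log entry: SLOW, DDL, BAD, SQLERR, or ALTER CLUSTER.
SID	Session ID, used to group the activity of a given session.
db	Database name.
user	The user who executed the query. With statement-based replication, search for the replication account when troubleshooting statements that came from the master.
ac	Autocommit (Y/N) indicator. Useful for determining whether the query ran inside a user-defined explicit transaction. DDL uses an internally generated explicit transaction and is always N.
xid	Transaction ID. Linking a session to an XID is useful when troubleshooting locking problems.
sql	The full text of the query. An ellipsis indicates that the text was truncated to fit the 4KB limit.
status	The query result, enclosed in square brackets. For example, this may be the number of rows affected or an error message.
time	The total time taken to receive, compile, and process the query until output is returned or an error occurs. This is especially useful when analyzing slow queries.
For any query that runs longer than one millisecond, the run time is broken down further: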
translate	Time spent in translate_dml().
prefetch	Time spent building the Sierra stub.
plan	Time spent to plan and normalize the query.
compile	Time spent in compiling Sierra.
execute	Time spent in invocation.
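
When triaging a SLOW entry, it helps to see which phase of the breakdown dominates. The sketch below is a minimal example that assumes the phase timings have already been extracted from a log entry into a dictionary of milliseconds. The `dominant_phase` name and the sample numbers are illustrative, and the share is computed against the sum of the five listed phases only, not the entry's total time.

```python
PHASES = ("translate", "prefetch", "plan", "compile", "execute")


def dominant_phase(breakdown_ms):
    """Return (phase, share) for the phase that took the most time.

    breakdown_ms maps phase names to milliseconds; missing phases count as 0.
    share is that phase's fraction of the summed phase times.
    """
    known = {phase: float(breakdown_ms.get(phase, 0.0)) for phase in PHASES}
    total = sum(known.values())
    if total <= 0:
        return None, 0.0
    phase = max(known, key=known.get)
    return phase, known[phase] / total


if __name__ == "__main__":
    # Made-up timings for illustration only.
    phase, share = dominant_phase({"translate": 1, "plan": 3, "execute": 16})
    print(f"{phase} accounts for {share:.0%} of the phase time")
```

A query dominated by plan or compile time points toward planning overhead, while one dominated by execute time points toward the data access itself.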

Internal counters

Label	Description
reads	The number of times the database reads from a container. This may differ from the number of rows_read.
inserts	The number of times the database inserts into a container. This includes both the number of calls and the number of rows written.
deletes	The number of times the database deletes from a container. This includes both the number of calls and the number of rows deleted.
updates	The number of times the database updates a container. This includes both the number of calls and the number of rows updated.
counts	Number of calls by the query execution engine to operators BARRIER_ADD and BARRIER_FETCHADD.
rows_read	Total number of rows read to get all needed data for the query, including reads from indices. Essentially, the total number of rows processed by the last query. This may differ from the number of rows_output by the query.
forwards	Number of rows forwarded to specific nodes.
broadcasts	Number of rows that were broadcast to all nodes.
rows_output	Total number of rows returned or output by the last query. This is usually the same as the number of rows returned from a query but may occasionally contain counts from internal processes.
semaphore_matches	Number of calls by the query execution engine to operator SEM_ACQUIRE.
fragment_executions	Number of query fragments executed for the query.
cpu_runtime_ns	This represents the aggregate total CPU time spent by all nodes to run the query.
cpu_waits	The number of times the query waited for another query to finish due to the Fair Scheduler.
cpu_waittime_ns	The amount of time spent waiting for CPU due to the Fair Scheduler.
barriers	Number of barriers created for the query. This is used to synchronize message communication between nodes.
barrier_forwards	Number of barriers created to synchronize messaging for forwarded rows.
barrier_flushes	Number of flush operations performed on barriers.
bm_fixes	Number of attempted page fixes by the Buffer Manager.
bm_loads	Number of pages loaded from disk by the Buffer Manager.
bm_waittime_ns	Nanoseconds spent blocked on Buffer Manager page fixes.
lockman_waits	Count of the number of times that the query had to wait for a lock to be released by another query.
lockman_waittime_ms	The total time spent waiting for other queries to release locks on needed rows.
trxstate_waits	Number of calls to trxstate_check that had to block.
trxstate_waittime_ms	Milliseconds spent blocked in trxstate_check.
wal_perm_waittime_ms	Milliseconds spent waiting because the WAL is more than 75% full.
bm_perm_waittime_ms	Milliseconds spent waiting for the Buffer Manager to grant write permission for pages.
sigmas	The number of sigma containers used by the query.
sigma_fallbacks	The number of sigma containers that ran out of memory and had to fall back to disk.
row_count	The total number of rows updated, inserted or deleted by the last query.
found_rows	The number of rows affected by the last statement, but not necessarily output by that statement. A value of 0 or -1 means no rows were found.
insert_id	Not currently being used, always displayed as 0.
fanout	Y/N indicator that tells if fanout was used for this query.
attempts	Number of attempts to automatically retry the query execution after it failed.
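
The rows_read and rows_output counters can be combined into a rough indicator of the kind of waste that the BAD query type is meant to catch. The sketch below assumes the counters of one entry are already available as a dictionary. The function names and the threshold of 100 are arbitrary illustrative choices, not ClustrixDB settings, and this is not the rule ClustrixDB itself uses to classify BAD queries.

```python
def read_amplification(counters):
    """Rows read per row output; large values suggest wasted reads.

    A query that outputs no rows is treated as outputting one row
    to avoid division by zero.
    """
    rows_read = counters.get("rows_read", 0)
    rows_output = counters.get("rows_output", 0)
    return rows_read / max(rows_output, 1)


def looks_wasteful(counters, threshold=100.0):
    """Flag entries whose read amplification reaches the threshold.

    The default threshold is an arbitrary cut-off for illustration.
    """
    return read_amplification(counters) >= threshold


if __name__ == "__main__":
    # Made-up counter values for illustration only.
    entry = {"rows_read": 50000, "rows_output": 10}
    print(read_amplification(entry), looks_wasteful(entry))
```

A high ratio is a prompt to inspect the query plan and available indexes, not proof of a problem, since aggregates legitimately read many rows to output few.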
