Linux Real-Time Performance Testing Tool: Using and Analyzing Cyclictest

The cyclictest tool is documented on the Linux Foundation wiki: https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest


  Cyclictest is a high resolution test program, written by Thomas Gleixner (tglx) and maintained by Clark Williams and John Kacur.

Documentation

Installation

  To get the latest sources, either clone the git repository (git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git) or fetch a released tarball from the archive and untar it into a directory of your choice. Then run make in the source directory. To cross compile, run make CROSS_COMPILE=<toolchain prefix> (for example make CROSS_COMPILE=arm-v4t-linux-gnueabi-).
  You can run the resulting binary from there or install it.

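# numactl-devel (the libnuma development package) must be installed before running make
yum install numactl-devel
git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
cd rt-tests
git checkout stable/v1.0
make all
make install
# optional: build only the cyclictest binary
make cyclictest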

Run it

Make sure to be root or use sudo to run cyclictest.
Without parameters cyclictest creates one thread with a 1ms interval timer.
cyclictest --help provides help text for the various options (in V 1.00, -h is the histogram option).
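
To make the measurement concrete, the following is a minimal sketch of the idea behind a cyclic latency test: a thread sleeps until a fixed deadline, and the latency of a cycle is how long after that deadline it actually woke up. This is not cyclictest's implementation. It assumes Python's time.sleep as a stand-in for clock_nanosleep or a POSIX interval timer, and the function name measure_wakeup_latencies is made up for illustration.

```python
import time


def measure_wakeup_latencies(interval_us=1000, loops=100):
    """Return the wake-up latency of each cycle, in microseconds.

    Simplified model of a cyclic test: each cycle has an absolute deadline
    on the monotonic clock, and the latency is wake-up time minus deadline.
    """
    interval_ns = interval_us * 1000
    next_wakeup = time.monotonic_ns() + interval_ns
    latencies = []
    for _ in range(loops):
        remaining_ns = next_wakeup - time.monotonic_ns()
        if remaining_ns > 0:
            # Relative sleep; an assumption standing in for an absolute-time sleep.
            time.sleep(remaining_ns / 1e9)
        now = time.monotonic_ns()
        latencies.append((now - next_wakeup) / 1000)
        # Advance the deadline by one interval so errors do not accumulate.
        next_wakeup += interval_ns
    return latencies


if __name__ == "__main__":
    lat = measure_wakeup_latencies()
    print(f"Min: {min(lat):.0f} Avg: {sum(lat) / len(lat):.0f} Max: {max(lat):.0f} (us)")
```

Numbers from an interpreted, non-real-time process like this will be far worse than those reported by cyclictest; the sketch only shows what the Min, Avg, and Max columns are computed from.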

# ./cyclictest --help
cyclictest V 1.00
Usage:
cyclictest <options>

-a [CPUSET] --affinity     Run thread #N on processor #N, if possible, or if CPUSET
                           given, pin threads to that set of processors in round-
                           robin order.  E.g. -a 2 pins all threads to CPU 2,
                           but -a 3-5,0 -t 5 will run the first and fifth
                           threads on CPU (0),thread #2 on CPU 3, thread #3
                           on CPU 4, and thread #5 on CPU 5.
-A USEC  --aligned=USEC    align thread wakeups to a specific offset
-b USEC  --breaktrace=USEC send break trace command when latency > USEC
                           (USEC is in microseconds)
-B       --preemptirqs     both preempt and irqsoff tracing (used with -b)
-c CLOCK --clock=CLOCK     select clock, e.g. cyclictest -c 1
                           0 = CLOCK_MONOTONIC (default)
                           1 = CLOCK_REALTIME
-C       --context         context switch tracing (used with -b)
-d DIST  --distance=DIST   distance of thread intervals in us, default=500
-D       --duration=TIME   specify a length for the test run.
                           Append 'm', 'h', or 'd' to specify minutes, hours or days.
         --latency=PM_QOS  write PM_QOS to /dev/cpu_dma_latency
-E       --event           event tracing (used with -b)
-f       --ftrace          ftrace function tracing (normally paired with -b;
                           in practice -b alone is usually enough, without -f)
-F       --fifo=<path>     create a named pipe at path and write stats to it
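-h       --histogram=US    dump a latency histogram to stdout after the run
                           (all threads run at the same priority); US is the
                           maximum latency to track, in microseconds. Used in
                           the example below; the output can be plotted with
                           gnuplot.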
-H       --histofall=US    same as -h except with an additional summary column
         --histfile=<path> dump the latency histogram to <path> instead of stdout
-i INTV  --interval=INTV   base interval of thread in us, default=1000
-I       --irqsoff         Irqsoff tracing (used with -b)
-l LOOPS --loops=LOOPS     number of loops, default=0 (endless). Together with
                           -i this gives the approximate run time: for example,
                           -i 1000 -l 1000000 runs for 1000 us * 1000000 =
                           1000000000 us = 1000 s, a little over 16 minutes.
         --laptop          Save battery when running cyclictest
                           This will give you poorer realtime results
                           but will not drain your battery so quickly
-m       --mlockall        lock current and future memory allocations
-M       --refresh_on_max  delay updating the screen until a new max
                           latency is hit. Userful for low bandwidth.
-n       --nanosleep       use clock_nanosleep
         --notrace         suppress tracing
-N       --nsecs           print results in ns instead of us (default us)
-o RED   --oscope=RED      oscilloscope mode, reduce verbose output by RED
-O TOPT  --traceopt=TOPT   trace option
-p PRIO  --priority=PRIO   priority of highest prio thread, e.g. -p 90 or
                           --priority=90
-P       --preemptoff      Preempt off tracing (used with -b)
         --policy=NAME     policy of measurement thread, where NAME may be one
                           of: other, normal, batch, idle, fifo or rr.
         --priospread      spread priority levels starting at specified value
-q       --quiet           print nothing while running, only a summary on exit;
                           with -h HISTNUM, exit prints HISTNUM histogram lines
                           followed by an overall summary
-r       --relative        use relative timer instead of absolute
-R       --resolution      check clock resolution, calling clock_gettime() many
                           times.  List of clock_gettime() values will be
                           reported with -X
         --secaligned [USEC] align thread wakeups to the next full second
                           and apply the optional offset
-s       --system          use sys_nanosleep and sys_setitimer
-S       --smp             Standard SMP testing: options -a -t -n and
                           same priority of all threads
        --spike=<trigger>  record all spikes > trigger
        --spike-nodes=[num of nodes]
                           These are the maximum number of spikes we can record.
                           The default is 1024 if not specified
         --smi             Enable SMI counting
-t       --threads         one thread per available processor
-t [NUM] --threads=NUM     number of threads:
                           without NUM, threads = max_cpus
                           without -t default = 1
         --tracemark       write a trace mark when -b latency is exceeded
-T TRACE --tracer=TRACER   set tracing function
    configured tracers: unavailable (debugfs not mounted)
-u       --unbuffered      force unbuffered output for live processing
-U       --numa            Standard NUMA testing (similar to SMP option)
                           thread data structures allocated from local node
-v       --verbose         output values on stdout for statistics
                           format: n:c:v n=tasknum c=count v=value in us
-w       --wakeup          task wakeup tracing (used with -b)
-W       --wakeuprt        rt task wakeup tracing (used with -b)
         --dbg_cyclictest  print info useful for debugging cyclictest

Recommended parameters and sample results

# sudo ./cyclictest -p 90 -m -c 0 -i 200 -n -h 100 -q -l 1000000

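Here -p 90 gives cyclictest priority 90, -m locks memory allocations, and -c 0 selects the default CLOCK_MONOTONIC clock. -i 200 sets the interval of one cycle to 200 us, and -l 1000000 sets the total to 1000000 cycles. In addition, -n makes the thread use clock_nanosleep instead of the default POSIX interval timer, -q suppresses the live output during the run, and -h 100 records a histogram with 100 buckets in the final result.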
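
As a rough check of how long such a run takes, the interval and loop count can simply be multiplied, as the -l description above does. A small sketch, where the function name estimated_runtime_seconds is made up for illustration and the estimate ignores start-up time and any latency added per cycle:

```python
def estimated_runtime_seconds(interval_us, loops):
    """Approximate run time of a single-thread run: interval * loops."""
    return interval_us * loops / 1_000_000


# The command above: -i 200 -l 1000000
print(estimated_runtime_seconds(200, 1_000_000))   # 200.0
# The example from the -l description: -i 1000 -l 1000000
print(estimated_runtime_seconds(1000, 1_000_000))  # 1000.0
```

So the recommended command should finish after roughly 200 seconds, a little over 3 minutes.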

-----
#/dev/cpu_dma_latency set to 0us
------------- (everything below is printed only after the test finishes or is interrupted; this is the effect of -q)
#Histogram
000000 000000
000001 000000
000002 000000
000003 000000
000004 000000
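000005 000002       -- a latency of 5 us occurred 2 times in the 1000000 loops (each line below reads the same way)
000006 000009
.......... (lines omitted)
000099 000005      -- because of -h 100, the result records the counts for latencies of 0 us to 99 us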
#Total: 000999914
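#Min Latencies: 00005       -- minimum latency: 5 us
#Avg Latencies: 00012       -- average latency: 12 us
#Max Latencies: 19920      -- maximum latency: 19920 us. With histogram = 100 only latencies of 0 us to 99 us get a bucket, so every sample above 99 us falls outside the histogram. Their individual values are not recorded; only how many there were (Histogram Overflows) and the cycle numbers at which they occurred (Thread 0) are kept.
#Histogram Overflows: 00086      -- number of samples with a latency above 99 us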
#Histogram Overflow at cycle number:
#Thread 0: 65668 162024 164458 166533 171828 174546 179471 182538 188257 198415 202689 209055
211934 224529 227292 239809 267144 311992 312072 335066 341986 353395 355217 355295 355297 385017
411492 417012 443642 453450 453463 453478 453492 453504 453505 453522 453540 482063 482116 482797
483077 486153 515557 517062 517066 522812 538214 560636 574301 574500 598338 602175 610697 620924
678231 692237 692242 692247 713557 779826 797948 851442 860635 860642 860654 860661 861147 875755
880618 883622 884128 884238 885915 887215 887457 896442 925069 928998 942590 947161 947871 955507
955508 982245 982250 992192  // the cycle numbers at which the latency exceeded 99 us
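
The histogram output above is plain text and easy to post-process. The following is a sketch of a parser for the single-thread layout shown here; the function name parse_histogram is made up for illustration, and the parser assumes the bucket lines sit between the #Histogram and #Total lines. The sample text is an excerpt of the output above with the omitted buckets left out.

```python
def parse_histogram(text):
    """Parse single-thread `cyclictest -q -h N` output.

    Returns (buckets, summary): buckets maps latency in us to sample count,
    summary maps names such as "Total" or "Max Latencies" to integers.
    """
    buckets, summary = {}, {}
    in_buckets = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            key = key.strip()
            if key == "Histogram":
                in_buckets = True       # bucket lines start after this marker
            elif key == "Total":
                in_buckets = False      # bucket lines end here
            fields = value.split()
            if len(fields) == 1 and fields[0].isdigit():
                summary[key] = int(fields[0])
        elif in_buckets:
            fields = line.split()
            if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
                buckets[int(fields[0])] = int(fields[1])
    return buckets, summary


SAMPLE = """\
#Histogram
000005 000002
000006 000009
000099 000005
#Total: 000999914
#Min Latencies: 00005
#Avg Latencies: 00012
#Max Latencies: 19920
#Histogram Overflows: 00086
"""

buckets, summary = parse_histogram(SAMPLE)
# Samples inside the histogram plus overflows add up to the loop count (-l 1000000).
print(summary["Total"] + summary["Histogram Overflows"])  # 1000000
```

In this run the 999914 samples counted under Total plus the 86 overflows account for all 1000000 cycles, which is a quick sanity check on any histogram.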

$ sudo cyclictest -t 2  # use two test threads
policy: other/other: loadavg: 0.00 0.01 0.05 1/346 2595
T: 0 ( 2594) P: 0 I:1000 C:  14090 Min:     32 Act:  200 Avg:  177 Max:    2855 
T: 1 ( 2595) P: 0 I:1500 C:   9397 Min:     23 Act:  202 Avg:  170 Max:    2863

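Meaning of the output fields:
T: 0 thread with index 0; the number in parentheses is the thread ID
P: 0 thread priority 0
I: 1000 interval of 1000 microseconds (us); thread 1 shows 1500 because each further thread adds the -d distance, 500 us by default
C: 9397 cycle counter, incremented each time the thread's interval elapses
Min: minimum latency (us)
Act: latency of the most recent cycle (us)
Avg: average latency (us)
Max: maximum latency (us)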
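
When the live output is collected in a log, the per-thread status lines can be turned into numbers with a regular expression. This is a sketch written against the two sample lines above; the names STATUS_RE and parse_status_line are made up for illustration, and other cyclictest versions may print additional fields.

```python
import re

# Matches lines such as:
# T: 0 ( 2594) P: 0 I:1000 C:  14090 Min:     32 Act:  200 Avg:  177 Max:    2855
STATUS_RE = re.compile(
    r"T:\s*(?P<thread>\d+)\s*\(\s*(?P<tid>\d+)\)\s*"
    r"P:\s*(?P<prio>\d+)\s*I:\s*(?P<interval>\d+)\s*C:\s*(?P<count>\d+)\s*"
    r"Min:\s*(?P<min>-?\d+)\s*Act:\s*(?P<act>-?\d+)\s*"
    r"Avg:\s*(?P<avg>-?\d+)\s*Max:\s*(?P<max>-?\d+)"
)


def parse_status_line(line):
    """Return the fields of one status line as integers, or None if no match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = "T: 1 ( 2595) P: 0 I:1500 C:   9397 Min:     23 Act:  202 Avg:  170 Max:    2863"
print(parse_status_line(line)["max"])  # 2863
```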

Expected Results

tglx’s reference machine

  All tests have been run on a Pentium III 400MHz based PC.
  The tables show comparisons of vanilla Linux 2.6.16, Linux-2.6.16-hrt5 and Linux-2.6.16-rt12. The tests for intervals less than the jiffy resolution have not been run on vanilla Linux 2.6.16. The test thread runs in all cases with SCHED_FIFO and priority 80. All numbers are in microseconds.

Test case: clock_nanosleep(TIMER_ABSTIME), interval 10000 microseconds, 10000 loops, no load.

Commandline: cyclictest -t1 -p 80 -n -i 10000 -l 10000

Kernel        min   max   avg
2.6.16         24  4043  1989
2.6.16-hrt5    12    94    20
2.6.16-rt12     6    40    10

Test case: clock_nanosleep(TIMER_ABSTIME), interval 10000 microseconds, 10000 loops, 100% load.

Commandline: cyclictest -t1 -p 80 -n -i 10000 -l 10000

Kernel        min   max   avg
2.6.16         55  4280  2198
2.6.16-hrt5    11   458    55
2.6.16-rt12     6    67    29

Test case: POSIX interval timer, interval 10000 microseconds, 10000 loops, no load.

Commandline: cyclictest -t1 -p 80 -i 10000 -l 10000

Kernel        min   max   avg
2.6.16         21  4073  2098
2.6.16-hrt5    22   120    35
2.6.16-rt12    20    60    31

Test case: POSIX interval timer, interval 10000 microseconds, 10000 loops, 100% load.

Commandline: cyclictest -t1 -p 80 -i 10000 -l 10000

Kernel        min   max   avg
2.6.16         82  4271  2089
2.6.16-hrt5    31   458    53
2.6.16-rt12    21    70    35

Test case: clock_nanosleep(TIMER_ABSTIME), interval 500 microseconds, 100000 loops, no load.

Commandline: cyclictest -t1 -p 80 -i 500 -n -l 100000

Kernel        min   max   avg
2.6.16-hrt5     5   108    24
2.6.16-rt12     5    48     7

Test case: clock_nanosleep(TIMER_ABSTIME), interval 500 microseconds, 100000 loops, 100% load.

Commandline: cyclictest -t1 -p 80 -i 500 -n -l 100000

Kernel        min   max   avg
2.6.16-hrt5     9   684    56
2.6.16-rt12    10    60    22

Test case: POSIX interval timer, interval 500 microseconds, 100000 loops, no load.

Commandline: cyclictest -t1 -p 80 -i 500 -l 100000

Kernel        min   max   avg
2.6.16-hrt5     8   119    22
2.6.16-rt12    12    78    16

Test case: POSIX interval timer, interval 500 microseconds, 100000 loops, 100% load.

Commandline: cyclictest -t1 -p 80 -i 500 -l 100000

Kernel        min   max   avg
2.6.16-hrt5    16   489    58
2.6.16-rt12    12    95    29

FAQ

ps shows the wrong scheduling class SCHED_OTHER

  Each cyclictest task consists of one or more threads. ps -ce shows only the main process, not its threads. ps -eLc | grep cyclic shows the main process and its threads, with the correct scheduler class SCHED_FIFO for the threads.

#>./cyclictest -t5 -p 80 -n -i 10000

#> ps -cLe | grep cyclic
 4764  4764 TS   19 pts/1    00:00:01 cyclictest
 4764  4765 FF  120 pts/1    00:00:00 cyclictest
 4764  4766 FF  119 pts/1    00:00:00 cyclictest
 4764  4767 FF  118 pts/1    00:00:00 cyclictest
 4764  4768 FF  117 pts/1    00:00:00 cyclictest
 4764  4769 FF  116 pts/1    00:00:00 cyclictest

chrt shows the wrong scheduling class SCHED_OTHER

  Don't use the PID of the main process; use the PID of one of its threads. The threads are listed by ps -cLe | grep cyclic.

#> chrt -p 4766
pid 4766's current scheduling policy: SCHED_FIFO
pid 4766's current scheduling priority: 79
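
Both FAQ answers come down to looking at the thread IDs rather than the process ID. As a sketch, the ps -cLe lines shown above can be parsed to pick out the thread IDs that run under SCHED_FIFO, which are the ones to pass to chrt -p. The helper names are made up for illustration, the column order is assumed to match the sample (PID, thread ID, class, priority, TTY, time, command), and only the class codes TS, FF and RR are mapped.

```python
# Class codes as printed by ps in the CLS column.
POLICY_NAMES = {"TS": "SCHED_OTHER", "FF": "SCHED_FIFO", "RR": "SCHED_RR"}


def parse_ps_threads(text):
    """Parse `ps -cLe` lines into dicts with pid, tid, policy and pri."""
    threads = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        threads.append({
            "pid": int(fields[0]),
            "tid": int(fields[1]),
            "policy": POLICY_NAMES.get(fields[2], fields[2]),
            "pri": int(fields[3]),
        })
    return threads


def fifo_thread_ids(text):
    """Thread IDs that run under SCHED_FIFO."""
    return [t["tid"] for t in parse_ps_threads(text) if t["policy"] == "SCHED_FIFO"]


PS_OUTPUT = """\
 4764  4764 TS   19 pts/1    00:00:01 cyclictest
 4764  4765 FF  120 pts/1    00:00:00 cyclictest
 4764  4766 FF  119 pts/1    00:00:00 cyclictest
 4764  4767 FF  118 pts/1    00:00:00 cyclictest
 4764  4768 FF  117 pts/1    00:00:00 cyclictest
 4764  4769 FF  116 pts/1    00:00:00 cyclictest
"""

print(fifo_thread_ids(PS_OUTPUT))  # [4765, 4766, 4767, 4768, 4769]
```

The main thread 4764 is reported as SCHED_OTHER, which is why ps -ce and chrt -p on the main PID appear to show the wrong class.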
