The CINN Deep Learning Compiler: Framework Overview, Build and Installation
Posted by 沉迷单车的追风少年
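Framework Overview
CINN is a deep learning compiler that speeds up PaddlePaddle models without requiring any change to the model code. It aims to offer features such as auto-tuning shared between training and inference and distributed compilation acceleration, to give deep learning models fully automatic, aggressive performance optimization, and to build influence in both research and industry.
Compared with the operators of a deep learning framework, the operators of a deep learning compiler are finer-grained and fewer in number, which gives the compiler an advantage in operator fusion and auto-tuning. When it connects to an upper-level framework, the compiler decomposes each framework operator into several basic operators. This has three benefits. First, it reduces operator development work: a limited set of basic operators can be composed into a large number of framework operators. Second, it lets the compiler fuse automatically across operator boundaries, which reduces the number of kernels that finally run and the memory-access overhead. Third, combined with auto-tuning, the compiler can automatically optimize the fused kernels and improve their performance.
Take batch_norm + elementwise_add as an example. The batch_norm operator is first decomposed into 32 basic operators. Those basic operators are then fused together with elementwise_add, ending up as two kernels. Test results show that the fused kernels outperform native Paddle in most configurations.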
Source: community/CINN_base_operator.md at master · PaddlePaddle/community · GitHub
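The decompose-then-fuse idea can be sketched in plain Python. This is only an illustration under simplifying assumptions: it handles a single channel stored as a flat list, the helper names are hypothetical, and the handful of primitives here is not the actual set of 32 basic operators CINN uses. Each primitive call stands in for one kernel launch, and the fused version does the same math in one reduction pass plus one elementwise pass.

```python
import math

# Hypothetical primitive ops; each call stands in for one kernel launch.
def reduce_mean(x):
    return sum(x) / len(x)

def sub_scalar(x, s):
    return [v - s for v in x]

def mul(x, y):
    return [a * b for a, b in zip(x, y)]

def scale(x, s):
    return [v * s for v in x]

def add_scalar(x, s):
    return [v + s for v in x]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def batch_norm_add_unfused(x, residual, gamma, beta, eps=1e-5):
    """batch_norm + elementwise_add as a chain of primitive ops."""
    mean = reduce_mean(x)
    centered = sub_scalar(x, mean)
    var = reduce_mean(mul(centered, centered))
    inv_std = 1.0 / math.sqrt(var + eps)
    normalized = scale(centered, inv_std)
    affine = add_scalar(scale(normalized, gamma), beta)
    return add(affine, residual)

def batch_norm_add_fused(x, residual, gamma, beta, eps=1e-5):
    """Same result in two passes over the data."""
    n = len(x)
    total = 0.0
    total_sq = 0.0
    for v in x:  # pass 1: a single reduction collects sum and sum of squares
        total += v
        total_sq += v * v
    mean = total / n
    var = total_sq / n - mean * mean
    inv_std = 1.0 / math.sqrt(var + eps)
    # pass 2: normalize, apply scale and shift, add the residual in one loop
    return [(v - mean) * inv_std * gamma + beta + r
            for v, r in zip(x, residual)]
```

The unfused version materializes an intermediate list after every primitive, while the fused version reads the input twice and writes the output once. That saving in launches and memory traffic is what the paragraph above attributes to fusion.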
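Build and Installation
The build process has quite a few pitfalls.
Start with the official installation guide: CINN/install.md at develop · PaddlePaddle/CINN · GitHub
Follow that guide to set up the Docker image, but change its docker run command to the following: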
$ docker run --gpus=all -it -v $PWD/CINN:/CINN registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.2-cudnn8-gcc82 /bin/bash
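Anyone unfamiliar with Docker can trip up here: the command above creates a new container. To get back into that same container next time, start it and attach to it instead of running docker run again: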
sudo docker start great_elgamal
sudo docker attach great_elgamal
cd /CINN
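great_elgamal is the name Docker generated for my container, so replace it with the name of your own. If you don't know the name, list all containers with this command: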
docker ps -a
Then build by following this tutorial: Install CINN using docker (cinn release/v0.1-rc documentation)
Go straight to step 3:
Now just make a cup of tea and slack off while the build runs:
======================================================================
FAIL: test_check_results (__main__.TestReciprocalCase1)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/CINN/CINN/python/tests/ops/test_reciprocal_op.py", line 48, in test_check_results
self.check_outputs_and_grads()
File "/CINN/CINN/python/tests/ops/op_test.py", line 104, in check_outputs_and_grads
max_relative_error, all_equal, equal_nan, "Outputs")
File "/CINN/CINN/python/tests/ops/op_test.py", line 211, in check_results
self.assertTrue(is_allclose, msg=error_message)
AssertionError: False is not true : [Check Outputs] [CPU] The 0-th output: total 32 different results, offset=0, shape=(32,), 0.000000e+00 vs 7.132941e+01, maximum_relative_diff=inf (absolute_diff=7.132941e+01).
======================================================================
FAIL: test_check_results (__main__.TestReciprocalCase2)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/CINN/CINN/python/tests/ops/test_reciprocal_op.py", line 48, in test_check_results
self.check_outputs_and_grads()
File "/CINN/CINN/python/tests/ops/op_test.py", line 104, in check_outputs_and_grads
max_relative_error, all_equal, equal_nan, "Outputs")
File "/CINN/CINN/python/tests/ops/op_test.py", line 211, in check_results
self.assertTrue(is_allclose, msg=error_message)
AssertionError: False is not true : [Check Outputs] [CPU] The 0-th output: total 10 different results, offset=0, shape=(10,), 0.000000e+00 vs 1.075237e+00, maximum_relative_diff=inf (absolute_diff=1.075237e+00).
======================================================================
FAIL: test_check_results (__main__.TestReciprocalCase3)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/CINN/CINN/python/tests/ops/test_reciprocal_op.py", line 48, in test_check_results
self.check_outputs_and_grads()
File "/CINN/CINN/python/tests/ops/op_test.py", line 104, in check_outputs_and_grads
max_relative_error, all_equal, equal_nan, "Outputs")
File "/CINN/CINN/python/tests/ops/op_test.py", line 211, in check_results
self.assertTrue(is_allclose, msg=error_message)
AssertionError: False is not true : [Check Outputs] [CPU] The 0-th output: total 10 different results, offset=0, shape=(1, 10), 0.000000e+00 vs 1.194706e+00, maximum_relative_diff=inf (absolute_diff=1.194706e+00).
======================================================================
FAIL: test_check_results (__main__.TestReciprocalOp)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/CINN/CINN/python/tests/ops/test_reciprocal_op.py", line 48, in test_check_results
self.check_outputs_and_grads()
File "/CINN/CINN/python/tests/ops/op_test.py", line 104, in check_outputs_and_grads
max_relative_error, all_equal, equal_nan, "Outputs")
File "/CINN/CINN/python/tests/ops/op_test.py", line 211, in check_results
self.assertTrue(is_allclose, msg=error_message)
AssertionError: False is not true : [Check Outputs] [CPU] The 0-th output: total 32 different results, offset=0, shape=(32,), 0.000000e+00 vs 1.340948e+00, maximum_relative_diff=inf (absolute_diff=1.340948e+00).
----------------------------------------------------------------------
Ran 4 tests in 4.816s
FAILED (failures=4)
174/181 Test #177: tests/ops/test_subtract_op .............. Passed 2.95 sec
175/181 Test #178: tests/ops/test_sum_op ................... Passed 2.67 sec
176/181 Test #179: tests/ops/test_transpose_op ............. Passed 2.52 sec
177/181 Test #180: tests/ops/test_unary_elementwise_op ..... Passed 2.56 sec
178/181 Test #181: tests/ops/test_uniform_random_op ........ Passed 2.58 sec
179/181 Test #176: tests/ops/test_squeeze_op ............... Passed 7.53 sec
180/181 Test #1: test_auto_tuner ......................... Passed 67.97 sec
181/181 Test #120: test02_matmul_case ...................... Passed 113.88 sec
99% tests passed, 2 tests failed out of 181
Total Test time (real) = 123.51 sec
The following tests FAILED:
134 - test_broadcast_to_op (Failed)
164 - tests/ops/test_reciprocal_op (Failed)
Errors while running CTest
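The result is that two tests fail (test_broadcast_to_op and tests/ops/test_reciprocal_op), which is not a big deal. Next, let's see whether the provided demo builds:
It builds, but running it fails: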
./demo: error while loading shared libraries: libcinnapi.so: cannot open shared object file: No such file or directory
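This error means the dynamic loader could not find libcinnapi.so in any directory it searches. One common way to check is to locate the file in the build tree and prepend its directory to LD_LIBRARY_PATH before running the demo. The sketch below does that lookup. The helper names are hypothetical, /CINN is simply the mount point from the docker run command above, and where the library actually ends up in the build tree is not something the original post states.

```python
from pathlib import Path

def find_shared_library(root, name="libcinnapi.so"):
    """Return the sorted directories under `root` that contain `name`."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted({str(p.parent) for p in root_path.rglob(name)})

def ld_library_path_with(dirs, current=""):
    """Prepend `dirs` to a colon-separated search path, skipping duplicates."""
    existing = [d for d in current.split(":") if d]
    new_dirs = [d for d in dirs if d not in existing]
    return ":".join(new_dirs + existing)

if __name__ == "__main__":
    import os
    found = find_shared_library("/CINN")  # assumed mount point
    if found:
        value = ld_library_path_with(found, os.environ.get("LD_LIBRARY_PATH", ""))
        print("export LD_LIBRARY_PATH=" + value)
    else:
        print("libcinnapi.so not found; the library may not have been built")
```

If the script prints an export line, running it in the shell and then launching ./demo again shows whether the search path was the only problem. If nothing is found, the library was never produced, which points to the build rather than the loader.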
I then tried building with GPU enabled:
bash ./build.sh gpu_on ci
But a lot of tests still fail:
77% tests passed, 57 tests failed out of 251
Total Test time (real) = 249.36 sec
The following tests FAILED:
167 - test_cinn_fake_resnet (Failed)
175 - test_broadcast_to_op (Failed)
177 - tests/ops/test_add_op (Failed)
178 - tests/ops/test_batch_norm_op (Failed)
179 - tests/ops/test_binary_elementwise_op (Failed)
180 - tests/ops/test_cast_op (Failed)
182 - tests/ops/test_ceil_op (Failed)
185 - tests/ops/test_concat_op (Failed)
187 - tests/ops/test_divide_op (Failed)
189 - tests/ops/test_fill_constant_op (Failed)
190 - tests/ops/test_floor_divide_op (Failed)
191 - tests/ops/test_gather_nd_op (Failed)
192 - tests/ops/test_gather_op (Failed)
194 - tests/ops/test_gelu_op (Failed)
195 - tests/ops/test_isclose_op (Failed)
197 - tests/ops/test_lookup_table_op (Failed)
198 - tests/ops/test_matmul_op (Failed)
199 - tests/ops/test_max_op (Failed)
200 - tests/ops/test_mod_op (Failed)
201 - tests/ops/test_multiply_op (Failed)
202 - tests/ops/test_norm_op (Failed)
203 - tests/ops/test_one_hot_op (Failed)
205 - tests/ops/test_pow_op (Failed)
206 - tests/ops/test_reciprocal_op (Failed)
207 - tests/ops/test_reduce_op (Failed)
208 - tests/ops/test_relu_op (Failed)
210 - tests/ops/test_scatter_add (Failed)
212 - tests/ops/test_select_op (Failed)
213 - tests/ops/test_sigmoid_op (Failed)
214 - tests/ops/test_sign_op (Failed)
216 - tests/ops/test_slice_op (Failed)
217 - tests/ops/test_split_op (Failed)
219 - tests/ops/test_subtract_op (Failed)
220 - tests/ops/test_sum_op (Failed)
221 - tests/ops/test_transpose_op (Failed)
222 - tests/ops/test_unary_elementwise_op (Failed)
225 - tests/op_mappers/test_atan2_op (Failed)
226 - tests/op_mappers/test_bitwise_op (Failed)
227 - tests/op_mappers/test_compare_op (Failed)
228 - tests/op_mappers/test_cumsum_op (Failed)
229 - tests/op_mappers/test_elementwise_op (Failed)
230 - tests/op_mappers/test_expand_op (Failed)
231 - tests/op_mappers/test_expand_v2_op (Failed)
232 - tests/op_mappers/test_fill_constant_op (Failed)
233 - tests/op_mappers/test_gather_nd_op (Failed)
234 - tests/op_mappers/test_gather_op (Failed)
237 - tests/op_mappers/test_log1p_op (Failed)
238 - tests/op_mappers/test_logical_op (Failed)
240 - tests/op_mappers/test_pow_op (Failed)
241 - tests/op_mappers/test_reduce_op (Failed)
242 - tests/op_mappers/test_scale_op (Failed)
243 - tests/op_mappers/test_sign_op (Failed)
244 - tests/op_mappers/test_split_op (Failed)
246 - tests/op_mappers/test_stack_op (Failed)
247 - tests/op_mappers/test_transpose2_op (Failed)
248 - tests/op_mappers/test_unary_op (Failed)
250 - tests/op_mappers/test_where_op (Failed)
Errors while running CTest
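With this many failures it helps to see where they cluster before reading individual tracebacks. The sketch below groups the "(Failed)" lines of a CTest summary by the directory prefix of the test name. The regular expression is written against the line format shown in the log above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches summary lines such as "177 - tests/ops/test_add_op (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(Failed\)\s*$")

def group_failed_tests(ctest_output):
    """Count failed tests per directory prefix of the test name."""
    groups = Counter()
    for line in ctest_output.splitlines():
        match = FAILED_LINE.match(line)
        if match is None:
            continue  # skip headers and any other non-summary lines
        name = match.group(2)
        prefix = name.rsplit("/", 1)[0] if "/" in name else "(top level)"
        groups[prefix] += 1
    return dict(groups)

if __name__ == "__main__":
    sample = """The following tests FAILED:
167 - test_cinn_fake_resnet (Failed)
177 - tests/ops/test_add_op (Failed)
225 - tests/op_mappers/test_atan2_op (Failed)
Errors while running CTest"""
    for prefix, count in sorted(group_failed_tests(sample).items()):
        print(prefix, count)
```

Applied to the full list above, the 57 failures split into 2 top-level tests, 34 under tests/ops and 21 under tests/op_mappers, so nearly all of them are Python operator tests rather than C++ unit tests.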
A quick look at what these failures report:
======================================================================
FAIL: test_check_results (__main__.TestSqrtOp)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/CINN/CINN/python/tests/op_mappers/test_unary_op.py", line 55, in test_check_results
self.check_outputs_and_grads()
File "/CINN/CINN/build/python/tests/ops/op_test.py", line 104, in check_outputs_and_grads
max_relative_error, all_equal, equal_nan, "Outputs")
File "/CINN/CINN/build/python/tests/ops/op_test.py", line 211, in check_results
self.assertTrue(is_allclose, msg=error_message)
AssertionError: False is not true : [Check Outputs] [NVGPU] The 0-th output: total 2048 different results, offset=641, shape=(32, 64), 5.755169e-04 vs 4.795379e-01, maximum_relative_diff=8.322300e+02 (absolute_diff=4.789624e-01).
======================================================================
FAIL: test_check_results (__main__.TestTanOp)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/CINN/CINN/python/tests/op_mappers/test_unary_op.py", line 55, in test_check_results
self.check_outputs_and_grads()
File "/CINN/CINN/build/python/tests/ops/op_test.py", line 104, in check_outputs_and_grads
max_relative_error, all_equal, equal_nan, "Outputs")
File "/CINN/CINN/build/python/tests/ops/op_test.py", line 211, in check_results
self.assertTrue(is_allclose, msg=error_message)
AssertionError: False is not true : [Check Outputs] [NVGPU] The 0-th output: total 2048 different results, offset=315, shape=(32, 64), 1.842093e-04 vs 4.241378e-01, maximum_relative_diff=2.301478e+03 (absolute_diff=4.239536e-01).
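The maximum_relative_diff values in these messages are consistent with dividing the absolute difference by the first of the two printed values. For TestSqrtOp, 4.789624e-01 / 5.755169e-04 is about 832.23, which matches the log. That also explains the inf in the earlier CPU reciprocal failures, where the first value is 0. The sketch below reproduces that arithmetic. It is a guess at the formula based on the logged numbers, not the actual code in op_test.py, and the function name is hypothetical.

```python
import math

def max_relative_diff(first, second):
    """Largest |first - second| / |first| over all element pairs.

    Assumed formula, inferred from the logged numbers. A mismatch against
    a zero reference value is reported as infinity.
    """
    worst = 0.0
    for a, b in zip(first, second):
        abs_diff = abs(a - b)
        if abs_diff == 0.0:
            continue  # identical elements contribute nothing
        rel = abs_diff / abs(a) if a != 0.0 else math.inf
        worst = max(worst, rel)
    return worst

if __name__ == "__main__":
    # Value pairs copied from the assertion messages in the logs.
    print(max_relative_diff([0.0], [7.132941e+01]))           # inf
    print(max_relative_diff([5.755169e-04], [4.795379e-01]))  # about 832.23
```

A relative difference in the hundreds or thousands on every one of the 2048 elements suggests the kernel output is wrong across the board, not that a tolerance is slightly too tight.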
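I filed an issue and went back to studying...
Issue link: 跟着教程编译报错 libcinnapi.so ("Build error when following the tutorial: libcinnapi.so") · Issue #1214 · PaddlePaddle/CINN · GitHub
The GPU model may be the cause: on a machine with a 3090 everything works, but building on a P4 card runs into all sorts of strange bugs. Quite puzzling.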