Neon code is not optimized
I wrote some simple Neon intrinsics code for the Android NDK. Here is the code:
float32x4_t vec1;
float32x4_t vec2;
float32x4_t mulneon;
vec1 = vld1q_f32(&a1[0]);
vec2 = vld1q_f32(&a2[0]);
mulneon = vmulq_f32(vec1, vec2);
I expected to see something like
vld1.32 {v0} ...
vld1.32 {v1} ...
vmul.f32 v0, v1, v0
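But what I actually see is a lot of ldr and str instructions followed by the multiply; see below. My question: does this Android version not support vld1? Or do I need to enable some other optimization?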
0x7f6ae33a20 <+792>: ldr x8, [sp, #0x198]
0x7f6ae33a24 <+796>: ldr q0, [x8]
0x7f6ae33a28 <+800>: str q0, [sp, #0x120]
0x7f6ae33a2c <+804>: ldr q0, [sp, #0x120]
0x7f6ae33a30 <+808>: str q0, [sp, #0x110]
0x7f6ae33a34 <+812>: ldr q0, [sp, #0x110]
0x7f6ae33a38 <+816>: str q0, [sp, #0x180]
0x7f6ae33a3c <+820>: ldr x8, [sp, #0x1a0]
0x7f6ae33a40 <+824>: ldr q0, [x8]
0x7f6ae33a44 <+828>: str q0, [sp, #0x100]
0x7f6ae33a48 <+832>: ldr q0, [sp, #0x100]
0x7f6ae33a4c <+836>: str q0, [sp, #0xf0]
0x7f6ae33a50 <+840>: ldr q0, [sp, #0xf0]
0x7f6ae33a54 <+844>: str q0, [sp, #0x170]
0x7f6ae33a58 <+848>: ldr x8, [sp, #0x228]
0x7f6ae33a5c <+852>: ldr x10, [sp, #0x198]
0x7f6ae33a60 <+856>: add x8, x10, x8, lsl #2
0x7f6ae33a64 <+860>: str x8, [sp, #0x198]
0x7f6ae33a68 <+864>: ldr x8, [sp, #0x250]
0x7f6ae33a6c <+868>: ldr x10, [sp, #0x1a0]
0x7f6ae33a70 <+872>: add x8, x10, x8, lsl #2
0x7f6ae33a74 <+876>: str x8, [sp, #0x1a0]
0x7f6ae33a78 <+880>: ldr q0, [sp, #0x170]
0x7f6ae33a7c <+884>: str q0, [sp, #0xe0]
0x7f6ae33a80 <+888>: ldr x8, [sp, #0x1a0]
0x7f6ae33a84 <+892>: ldr q0, [sp, #0xe0]
0x7f6ae33a88 <+896>: ldr s1, [x8]
0x7f6ae33a8c <+900>: mov v2.16b, v1.16b
0x7f6ae33a90 <+904>: ins v0.s[3], v2.s[0]
0x7f6ae33a94 <+908>: str q0, [sp, #0xd0]
0x7f6ae33a98 <+912>: ldr q0, [sp, #0xd0]
0x7f6ae33a9c <+916>: str q0, [sp, #0xc0]
0x7f6ae33aa0 <+920>: ldr q0, [sp, #0xc0]
0x7f6ae33aa4 <+924>: str q0, [sp, #0x170]
0x7f6ae33aa8 <+928>: ldr q0, [sp, #0x180]
0x7f6ae33aac <+932>: ldr q2, [sp, #0x170]
0x7f6ae33ab0 <+936>: stur q0, [x29, #-0xa0]
0x7f6ae33ab4 <+940>: stur q2, [x29, #-0xb0]
0x7f6ae33ab8 <+944>: ldur q0, [x29, #-0xa0]
0x7f6ae33abc <+948>: ldur q2, [x29, #-0xb0]
0x7f6ae33ac0 <+952>: fmul v0.4s, v0.4s, v2.4s
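Answer
Problems:
- It looks like you compiled in debug mode.
- The arrays appear to be global variables or non-static local constants.
- The Clang bundled with Android Studio (v4.9) is very poor at generating efficient machine code from intrinsics in the first place.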
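Solutions:
- Change the build type to Release.
- Use only local variables, especially inside inner loops; if constant arrays are local, declare them static.
- Don't use Clang for intrinsics, or better yet, don't use intrinsics at all.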
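
To illustrate the first two points, here is a minimal sketch of the multiply written so that everything in the inner loop lives in local variables. The function name `mul_f32` and its signature are hypothetical and not taken from the question, and the scalar fallback is only there so the file also builds on non-ARM hosts. Note that the disassembly in the question is AArch64 (`x8`, `q0`, `fmul v0.4s`), where `vld1.32` and `vmul.f32` mnemonics do not appear; those are 32-bit ARM syntax. In an optimized AArch64 build (for example with `-O2`), the loads would normally show up as `ldr q` or `ld1` instructions feeding `fmul` directly, without the stack round trips seen above.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0 /* non-ARM host: only the scalar loop below is compiled */
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] for i in [0, n). */
void mul_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    size_t i = 0;
#if HAVE_NEON
    /* Process four floats per iteration; va and vb are locals, so an
       optimizing build can keep them in vector registers. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmulq_f32(va, vb));
    }
#endif
    /* Scalar loop handles the remaining elements (or all of them
       when Neon is unavailable). */
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
}
```

Inspecting the release build's disassembly of this function, rather than the debug build's, is the fairer way to judge what the compiler does with the intrinsics.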