[20190806] Atlas300 performance experiments

| Module (single process) | Time, batch=4 | Time, batch=8 | Time, batch=16 |
| -------------------------------------- | --------------------- | ----------- | ------------ |
| DvppJpegDecode | 2.878*4 | 2.873*8 | 2.859 |
| ObjectDetectStage1_v3_Input | 0.028 | 0.036 | 0.036 |
| ObjectDetectStage1_v3_PreProcess | 6.018 | 11.945 | 11.985 |
| ObjectDetectStage1_v3_Predict | 24.847 | 48.648 | 97.002 |
| ObjectDetectStage1_v3_GetLayer | 0.743 | 0.77 | 0.771 |
| ObjectDetectStage1_v3_PostProcess | 0.664*4 | 0.661*8 | 0.661 |
| ObjectDetectStage1_v3_Output | 0.011*4 | 0.011*8 | 0.01 |
| VehicleDetectStage2_Input | 0.028 | 0.034 | 0.034 |
| VehicleDetectStage2_Index | 0.005 | 0.007 | 0.009 |
| VehicleDetectStage2_PreProcess | 0.842 | 1.563 | 2.643 |
| VehicleDetectStage2_Predict | 4.516 | 7.334 | 13.754 |
| VehicleDetectStage2_GetLayer | 0.206 | 0.201 | 0.188 |
| VehicleDetectStage2_PostProcess | 0.172*4 | 0.163*8 | 0.159 |
| VehicleDetectStage2_Output | 0.003*4 | 0.003*8 | 0.021 |
| VehicleDetectStage2_Segment_Index | 0.005 | 0.009 | 0.01 |
| VehicleDetectStage2_Segment_PreProcess | 0.354 | 0.578 | 0.544 |
| VehicleDetectStage2_Segment_Predict | 0.784 | 0.947 | 1.02 |
| VehicleDetectStage2_Segment_Output | 0.002 | 0.002 | 0.002 |
| Vehicle_Input | 0.023 | 0.027 | 0.027 |
| Vehicle_Index | 0.002 | 0.005 | 0.006 |
| Vehicle_PreProcess | 0.759 | 1.526 | 2.741 |
| Vehicle_Predict | 1.555 | 2.247 | 3.327 |
| Vehicle_Classification | 0.011*4 | 0.02*8 | 0.021 |
| Vehicle_ExtractFeature | 0.008*4 | 0.016*8 | 0.018 |
| Vehicle_FeatureCode | 0.487 | 0.996 | 1.084 |
| VehicleDriver_Input | 0.025 | 0.031 | 0.028 |
| VehicleDriver_Index | 0.006 | 0.01 | 0.012 |
| VehicleDriver_PreProcess | 0.289 | 0.354 | 0.323 |
| VehicleDriver_Predict | 4.38 | 6.736 | 11.514 |
| VehicleDriver_ArgMax | 0.001*4 | 0.001*8 | 0.001 |
| VehicleDriver_Output | 0.001*4 | 0.001*8 | 0.001 |
| VehiclePlate_Input | 0.024 | 0.03 | 0.027 |
| VehiclePlateAlign_Index | 0.063 | 0.066 | 0.07 |
| VehiclePlateAlign_PreProcess | 0.644 | 0.646 | 0.934 |
| VehiclePlateAlign_Predict | 1.614 | 2.774 | 6.691 |
| VehiclePlateAlign_Output | 0.001*4 | 0.001*8 | 0.001 |
| VehiclePlate_ProcessPlate | 9.818 | 16.827 | 17.364 |
| VehiclePlate_Index | 5.442 | 10.142 | 10.578 |
| VehiclePlate_CropToMat | 0.186 | 0.152 | 0.154 |
| VehiclePlate_ImageRotate | 1.21 | 1.178 | 1.251 |
| VehiclePlate_PreProcess | 0.05 | 0.098 | 0.163 |
| VehiclePlate_Predict | 1.461 | 2.154 | 3.594 |
| VehiclePlate_Output | 0.094 | 0.172 | 0.279 |
| VehiclePlate_PostProcess | 0.054 | 0.098 | 0.1 |
| VehicleSpecial_Input | 0.019 | 0.019 | 0.018 |
| VehicleSpecial_Index | 0.018 | 0.026 | 0.027 |
| VehicleSpecial_PreProcess | 1.127 | 2.012 | 2.71 |
| VehicleSpecial_Predict | 2.053 | 3.078 | 5.366 |
| VehicleSpecial_GetLayer | 0.022 | 0.02 | 0.019 |
| VehicleSpecial_PostProcess | 0.039*4 | 0.035*8 | 0.035 |
| VehicleSpecial_Output | 0.001*4 | 0.001*8 | 0.001 |
| Other (framework overhead?) | 111.888 - 76.774 = 35.114 | | |

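
The table lists most stages as a single figure per batch, so comparing batch sizes is easier after dividing by the batch size. The sketch below does that for three of the Predict rows. It assumes the unstarred values are milliseconds per batch (the table states no unit), and the names `PREDICT_TIMES`, `per_image_time` and `amortized` are illustrative, not part of the original code.

```python
# Per-batch Predict timings copied from the module table.
# Assumption: values are milliseconds per batch (the table gives no unit).
PREDICT_TIMES = {
    "ObjectDetectStage1_v3_Predict": {4: 24.847, 8: 48.648, 16: 97.002},
    "VehicleDetectStage2_Predict": {4: 4.516, 8: 7.334, 16: 13.754},
    "VehicleDriver_Predict": {4: 4.38, 8: 6.736, 16: 11.514},
}


def per_image_time(batch_time, batch_size):
    """Amortized cost of one image when batch_size images share one call."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return batch_time / batch_size


def amortized(times_by_batch):
    """Map each batch size (ascending) to its amortized per-image time."""
    return {b: per_image_time(t, b) for b, t in sorted(times_by_batch.items())}


if __name__ == "__main__":
    for name, times in PREDICT_TIMES.items():
        row = ", ".join(f"batch={b}: {t:.3f}" for b, t in amortized(times).items())
        print(f"{name}: {row}")
```

Under that assumption the first-stage detector stays close to 6 per image at every batch size, while the two smaller networks get cheaper per image as the batch grows.
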
| Resolution | Type | NPU / notes | Processes (threads) | Batch | ARM usage (%) | NPU usage (%) | NPU memory (MB) | Algorithm time (ms) | Decode time (ms) | Images processed per day | CPU usage (%) | Memory usage (MB) |
| ----------------------------- | ----- | ------------------------------- | :------------------: | :---: | ------------ | ------------ | ---------------------- | ------------ | ------------- | --------------------- | -------------- | ------------- |
| 2 MP | 1-3 vehicles | ascend310 | 1 (host: 18, device: 46) | 4 | 22 | 75 (±20) | 1656 | 20.388*2 | 3.649 | 4237787/2 | 250 | 1585 |
| | | | 2 | | 22*2 | 80 | 1761*2 | 23.026*2 | 4.384 | 7504560/2 | 225*2 | 1602+1341 |
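| | | crashed after just over 1000 images | 3 | | 22*3 | 115 | 1545*3 | 29.831*2 | 6.336 | 8688947/2 | 220*3 | 1484*3 |
| The statistics above are wrong: every time must be multiplied by 2 | | | | | | | | | | | | |
| 20190816 | | sendData in main changed to loop once every 200 images | 1 | 4 | 35-45 | 130 | 1000 (reference only; memory keeps rising) | 23.736 | 3.649 | 3640040 | 60 | 517 |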
| | | | 2 | 4 | 33*2 | | 1000*2 | 35.256 | 5.512 | 4901293 | | 522 |
| | | | 3 | 4 | 28*3 | | 1000 | 52.755 | 6.411 | 4913278 | | |
| 20190817 | | send 4 images, return 4 results | 8 | | (5-10)*8 | 40-120 | 246*8 | 135.867 | 3.6 | 5087328 | (5-10)*8 | 513*8 |
| 20190820 | | decodeBuf not returned | 8 | | | | | 103.608 | | 6671299 | | |
| 20190823 | | 8k memory pool | 5 | 4 | | (5-10)*8 | 639 | 64.446 | | 6703286 | | |
| | | | 1 | 4 | | | | 27.972 | | 3088803 | | |
| | | | 8 | 4 | | | 246 | 93.261 | | 7411458 | | |
| | | | 1 | 8 | | | | 26.217 | | 3295571 | | |
| | | 4k memory pool, memory management | 7 | 8 | | | 433 | 78.433 | | 7711040 | | 531 |
| | | fp16 | 5 | 16 | | | | 44.115 | | 9792657 | | 519 |
| | | | | | | | | | | | | |
| | | int8 | 6 | 16 | | | 574 | 43.941 | | 11797571 | | 519 |
| | | int8 | 6 | 16 | | | 559.8 | 42.667 | | 12149816 | | 527 |
| 5 MP | | | | | | | | | | | | |
| 7 MP | 5 vehicles | int8 | 6 | 16 | | | 569 | 47.719 | | 10863617 | | 541 |

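
The notes do not say how the images-per-day column was derived, but the rows dated 20190816 and later are consistent with dividing the milliseconds in a day by the algorithm time and multiplying by the number of processes. The sketch below encodes that reading. The formula is inferred from the numbers rather than stated by the author, and `images_per_day` is a hypothetical helper name.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms


def images_per_day(algorithm_time_ms, processes=1):
    """Estimated daily throughput, assuming each process spends
    algorithm_time_ms per image and all processes run in parallel all day.
    """
    if algorithm_time_ms <= 0:
        raise ValueError("algorithm_time_ms must be positive")
    return processes * MS_PER_DAY / algorithm_time_ms


if __name__ == "__main__":
    # (algorithm time in ms, processes, value recorded in the table)
    rows = [
        (23.736, 1, 3640040),
        (35.256, 2, 4901293),
        (135.867, 8, 5087328),
    ]
    for time_ms, procs, recorded in rows:
        estimate = images_per_day(time_ms, procs)
        print(f"{procs} proc @ {time_ms} ms: {estimate:.0f} (table: {recorded})")
```

Read this way, adding processes only pays off while the per-image algorithm time grows more slowly than the process count, which is why 3 processes at 52.755 ms barely beat 2 processes at 35.256 ms.
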
| Network | Time (batch=1) (3.4GB) | Time (batch=4) |
| ----------------------------- | -------------------- | ------------- |
| ObjectDetectStageCentG320x384 | 18.99000 | |
| ObjectDetectStageV160x128 | 4.001000 | |
| VehiclePlateSegmentCH | 0.879000 | |
| VehicleDriverGeneral | 3.446000 | |
| VehiclePlateNo | 2.013000 | |
| VehiclePlateAlignCH | 1.655000 | |
| VehiclePlateNameCH | 1.576000 | |
| VehiclePlateExceptionPlate | 1.012000 | |
| VehiclePlateExceptionHead | 0.895000 | |
| VehicleLabel | 2.778000 | |
| VehicleColor | 1.679000 | |
| VehicleType | 1.083000 | |
| ObjectDetectStageT160x96 | 1.498000 | |
| Total Graph run time | 60ms | |
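
The "other" row of the module table estimates overhead as total time minus the sum of the measured parts, and the same subtraction can be applied to this network table. The sketch below does so, assuming the per-network values are in milliseconds like the 60ms total and that each network runs once per image. Neither assumption is stated in the notes, and `unaccounted_time` is an illustrative name.

```python
# batch=1 timings copied from the network table (assumed to be ms).
NETWORK_TIMES_MS = {
    "ObjectDetectStageCentG320x384": 18.99,
    "ObjectDetectStageV160x128": 4.001,
    "VehiclePlateSegmentCH": 0.879,
    "VehicleDriverGeneral": 3.446,
    "VehiclePlateNo": 2.013,
    "VehiclePlateAlignCH": 1.655,
    "VehiclePlateNameCH": 1.576,
    "VehiclePlateExceptionPlate": 1.012,
    "VehiclePlateExceptionHead": 0.895,
    "VehicleLabel": 2.778,
    "VehicleColor": 1.679,
    "VehicleType": 1.083,
    "ObjectDetectStageT160x96": 1.498,
}
GRAPH_TOTAL_MS = 60.0  # "Total Graph run time" row


def unaccounted_time(total, parts):
    """Time left over after subtracting the measured parts from the total."""
    return total - sum(parts)


if __name__ == "__main__":
    inference = sum(NETWORK_TIMES_MS.values())
    rest = unaccounted_time(GRAPH_TOTAL_MS, NETWORK_TIMES_MS.values())
    print(f"sum of networks: {inference:.3f} ms, unaccounted: {rest:.3f} ms")
```

If a network actually runs several times per image (for example once per detected vehicle), the unaccounted share would be smaller than this estimate.
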