markdown [20190806] Atlas300 Performance Experiments


| Module (single process)                | Batch=4 time          | Batch=8 time | Batch=16 time |
| -------------------------------------- | --------------------- | ----------- | ------------ |
| DvppJpegDecode                         | 2.878*4               | 2.873*8     | 2.859        |
| ObjectDetectStage1_v3_Input            | 0.028                 | 0.036       | 0.036        |
| ObjectDetectStage1_v3_PreProcess       | 6.018                 | 11.945      | 11.985       |
| ObjectDetectStage1_v3_Predict          | 24.847                | 48.648      | 97.002       |
| ObjectDetectStage1_v3_GetLayer         | 0.743                 | 0.77        | 0.771        |
| ObjectDetectStage1_v3_PostProcess      | 0.664*4               | 0.661*8     | 0.661        |
| ObjectDetectStage1_v3_Output           | 0.011*4               | 0.011*8     | 0.01         |
| VehicleDetectStage2_Input              | 0.028                 | 0.034       | 0.034        |
| VehicleDetectStage2_Index              | 0.005                 | 0.007       | 0.009        |
| VehicleDetectStage2_PreProcess         | 0.842                 | 1.563       | 2.643        |
| VehicleDetectStage2_Predict            | 4.516                 | 7.334       | 13.754       |
| VehicleDetectStage2_GetLayer           | 0.206                 | 0.201       | 0.188        |
| VehicleDetectStage2_PostProcess        | 0.172*4               | 0.163*8     | 0.159        |
| VehicleDetectStage2_Output             | 0.003*4               | 0.003*8     | 0.021        |
| VehicleDetectStage2_Segment_Index      | 0.005                 | 0.009       | 0.01         |
| VehicleDetectStage2_Segment_PreProcess | 0.354                 | 0.578       | 0.544        |
| VehicleDetectStage2_Segment_Predict    | 0.784                 | 0.947       | 1.02         |
| VehicleDetectStage2_Segment_Output     | 0.002                 | 0.002       | 0.002        |
| Vehicle_Input                          | 0.023                 | 0.027       | 0.027        |
| Vehicle_Index                          | 0.002                 | 0.005       | 0.006        |
| Vehicle_PreProcess                     | 0.759                 | 1.526       | 2.741        |
| Vehicle_Predict                        | 1.555                 | 2.247       | 3.327        |
| Vehicle_Classification                 | 0.011*4               | 0.02*8      | 0.021        |
| Vehicle_ExtractFeature                 | 0.008*4               | 0.016*8     | 0.018        |
| Vehicle_FeatureCode                    | 0.487                 | 0.996       | 1.084        |
| VehicleDriver_Input                    | 0.025                 | 0.031       | 0.028        |
| VehicleDriver_Index                    | 0.006                 | 0.01        | 0.012        |
| VehicleDriver_PreProcess               | 0.289                 | 0.354       | 0.323        |
| VehicleDriver_Predict                  | 4.38                  | 6.736       | 11.514       |
| VehicleDriver_ArgMax                   | 0.001*4               | 0.001*8     | 0.001        |
| VehicleDriver_Output                   | 0.001*4               | 0.001*8     | 0.001        |
| VehiclePlate_Input                     | 0.024                 | 0.03        | 0.027        |
| VehiclePlateAlign_Index                | 0.063                 | 0.066       | 0.07         |
| VehiclePlateAlign_PreProcess           | 0.644                 | 0.646       | 0.934        |
| VehiclePlateAlign_Predict              | 1.614                 | 2.774       | 6.691        |
| VehiclePlateAlign_Output               | 0.001*4               | 0.001*8     | 0.001        |
| VehiclePlate_ProcessPlate              | 9.818                 | 16.827      | 17.364       |
| VehiclePlate_Index                     | 5.442                 | 10.142      | 10.578       |
| VehiclePlate_CropToMat                 | 0.186                 | 0.152       | 0.154        |
| VehiclePlate_ImageRotate               | 1.21                  | 1.178       | 1.251        |
| VehiclePlate_PreProcess                | 0.05                  | 0.098       | 0.163        |
| VehiclePlate_Predict                   | 1.461                 | 2.154       | 3.594        |
| VehiclePlate_Output                    | 0.094                 | 0.172       | 0.279        |
| VehiclePlate_PostProcess               | 0.054                 | 0.098       | 0.1          |
| VehicleSpecial_Input                   | 0.019                 | 0.019       | 0.018        |
| VehicleSpecial_Index                   | 0.018                 | 0.026       | 0.027        |
| VehicleSpecial_PreProcess              | 1.127                 | 2.012       | 2.71         |
| VehicleSpecial_Predict                 | 2.053                 | 3.078       | 5.366        |
| VehicleSpecial_GetLayer                | 0.022                 | 0.02        | 0.019        |
| VehicleSpecial_PostProcess             | 0.039*4               | 0.035*8     | 0.035        |
| VehicleSpecial_Output                  | 0.001*4               | 0.001*8     | 0.001        |
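| Other (framework overhead?)            | 111.888-76.774=35.114 |             |              |

| Resolution / date | Type | NPU / notes | Processes (threads) | Batch | ARM usage (%) | NPU usage (%) | NPU memory (MB) | Algorithm time (ms) | Decode time (ms) | Images processed per day | CPU usage (%) | Memory usage (MB) |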
| ----------------------------- | ----- | ------------------------------- | :------------------: | :---: | ------------ | ------------ | ---------------------- | ------------ | ------------- | --------------------- | -------------- | ------------- |
| 2 MP | 1-3 vehicles | ascend310 | 1 (host: 18, device: 46) | 4 | 22 | 75 (±20) | 1656 | 20.388*2 | 3.649 | 4237787/2 | 250 | 1585 |
| | | | 2 | | 22*2 | 80 | 1761*2 | 23.026*2 | 4.384 | 7504560/2 | 225*2 | 1602+1341 |
| | | Crashed after a little over 1000 images | 3 | | 22*3 | 115 | 1545*3 | 29.831*2 | 6.336 | 8688947/2 | 220*3 | 1484*3 |
| Correction: the statistics above were wrong; all times must be multiplied by 2 | | | | | | | | | | | | |
| 20190816 | | sendData in main changed to loop once per 200 images | 1 | 4 | 35-45 | 130 | 1000 (for reference; memory keeps rising) | 23.736 | 3.649 | 3640040 | 60 | 517 |
| | | | 2 | 4 | 33*2 | | 1000*2 | 35.256 | 5.512 | 4901293 | | 522 |
| | | | 3 | 4 | 28*3 | | 1000 | 52.755 | 6.411 | 4913278 | | |
| 20190817 | | Send 4 images, return 4 results | 8 | | (5-10)*8 | 40-120 | 246*8 | 135.867 | 3.6 | 5087328 | (5-10)*8 | 513*8 |
| 20190820 | | decodeBuf not returned | 8 | | | | | 103.608 | | 6671299 | | |
| 20190823 | | 8k memory pool | 5 | 4 | | (5-10)*8 | 639 | 64.446 | | 6703286 | | |
| | | | 1 | 4 | | | | 27.972 | | 3088803 | | |
| | | | 8 | 4 | | | 246 | 93.261 | | 7411458 | | |
| | | | 1 | 8 | | | | 26.217 | | 3295571 | | |
| | | 4k memory pool, memory management | 7 | 8 | | | 433 | 78.433 | | 7711040 | | 531 |
| | | fp16 | 5 | 16 | | | | 44.115 | | 9792657 | | 519 |
| | | int8 | 6 | 16 | | | 574 | 43.941 | | 11797571 | | 519 |
| | | int8 | 6 | 16 | | | 559.8 | 42.667 | | 12149816 | | 527 |
| 5 MP | | | | | | | | | | | | |
| 7 MP | 5 vehicles | int8 | 6 | 16 | | | 569 | 47.719 | | 10863617 | | 541 |

| Network                       | Time (batch=1) (3.4GB) | Time (Batch=4) |
| ----------------------------- | -------------------- | ------------- |
| ObjectDetectStageCentG320x384 | 18.99000             |               |
| ObjectDetectStageV160x128     | 4.001000             |               |
| VehiclePlateSegmentCH         | 0.879000             |               |
| VehicleDriverGeneral          | 3.446000             |               |
| VehiclePlateNo                | 2.013000             |               |
| VehiclePlateAlignCH           | 1.655000             |               |
| VehiclePlateNameCH            | 1.576000             |               |
| VehiclePlateExceptionPlate    | 1.012000             |               |
| VehiclePlateExceptionHead     | 0.895000             |               |
| VehicleLabel                  | 2.778000             |               |
| VehicleColor                  | 1.679000             |               |
| VehicleType                   | 1.083000             |               |
| ObjectDetectStageT160x96      | 1.498000             |               |
| Total Graph run time          | 60ms                 |               |
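As a rough cross-check of the per-network table, the batch=1 latencies can be summed and compared with the 60 ms Graph total in its last row. The sketch below assumes that every network runs exactly once per image and that all values are in milliseconds. The table states neither, and the variable names are purely illustrative.

```python
# Per-network latency at batch=1, copied from the table (assumed to be ms).
NETWORK_MS = {
    "ObjectDetectStageCentG320x384": 18.99,
    "ObjectDetectStageV160x128": 4.001,
    "VehiclePlateSegmentCH": 0.879,
    "VehicleDriverGeneral": 3.446,
    "VehiclePlateNo": 2.013,
    "VehiclePlateAlignCH": 1.655,
    "VehiclePlateNameCH": 1.576,
    "VehiclePlateExceptionPlate": 1.012,
    "VehiclePlateExceptionHead": 0.895,
    "VehicleLabel": 2.778,
    "VehicleColor": 1.679,
    "VehicleType": 1.083,
    "ObjectDetectStageT160x96": 1.498,
}
GRAPH_TOTAL_MS = 60.0  # last row of the table

# Assumption: each network is invoked once per image.
inference_ms = sum(NETWORK_MS.values())
remainder_ms = GRAPH_TOTAL_MS - inference_ms

print(f"sum of networks: {inference_ms:.3f} ms")
print(f"not explained by inference: {remainder_ms:.3f} ms "
      f"({remainder_ms / GRAPH_TOTAL_MS:.1%} of the Graph time)")
```

Under these assumptions the networks account for about 41.5 ms, leaving roughly 18.5 ms of the Graph time for everything else (pre/post-processing, data movement, scheduling).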

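One way to read the per-module table at the top is to divide each Predict time by its nominal batch size and see how much batching helps. The sketch below assumes that cells without a `*N` suffix are whole-batch times in milliseconds, which the table does not say explicitly. It also reproduces the "framework overhead" subtraction from that table's last row, reading 111.888 as the measured total and 76.774 as the module sum (again an assumption, since the source gives only the two numbers).

```python
# Whole-batch Predict times from the per-module table (assumed ms per batch).
PREDICT_MS = {
    "ObjectDetectStage1_v3_Predict": {4: 24.847, 8: 48.648, 16: 97.002},
    "VehicleDetectStage2_Predict": {4: 4.516, 8: 7.334, 16: 13.754},
    "VehicleDriver_Predict": {4: 4.38, 8: 6.736, 16: 11.514},
}


def per_image_ms(batch_times):
    """Divide each whole-batch time by its nominal batch size."""
    return {batch: t / batch for batch, t in batch_times.items()}


for name, times in PREDICT_MS.items():
    cells = ", ".join(
        f"batch={b}: {ms:.3f}" for b, ms in per_image_ms(times).items()
    )
    print(f"{name}: {cells}")

# Overhead as written in the table: total minus the accounted-for modules.
TOTAL_MS, MODULES_MS = 111.888, 76.774
overhead_ms = TOTAL_MS - MODULES_MS
print(f"overhead: {overhead_ms:.3f} ms ({overhead_ms / TOTAL_MS:.1%} of total)")
```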
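The images-per-day column in the multi-process table appears to be derived from the algorithm time and the process count. For the 20190816 and 20190817 rows, `processes * 86,400,000 ms / algorithm time` reproduces the reported figures to within one image. The source does not state this relationship; it is inferred from the numbers, and the function name below is hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in a day


def images_per_day(algorithm_ms, processes=1):
    """Daily throughput if each process spends `algorithm_ms` per image."""
    return processes * MS_PER_DAY / algorithm_ms


# (processes, algorithm time in ms, images/day reported in the table)
ROWS = [
    (1, 23.736, 3640040),
    (2, 35.256, 4901293),
    (3, 52.755, 4913278),
    (8, 135.867, 5087328),
]

for procs, ms, reported in ROWS:
    estimate = images_per_day(ms, procs)
    print(f"{procs} proc, {ms} ms -> {estimate:,.0f} (table: {reported:,})")
```

Read this way, going from 2 to 3 processes on 20190816 barely changes daily throughput, because the per-image time grows almost as fast as the process count.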