Pitfalls when packaging machine learning libraries with pyinstaller
Posted by Q博士
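Background
The pyinstaller approach to building a binary that I investigated earlier has now reached the rollout stage; the earlier write-up is "利用pyinstaller打包python项目发布到线上" (using pyinstaller to package a Python project and release it to production). That experiment targeted a very simple web service with few dependencies on other packages. The project being rolled out this time uses many machine learning libraries, so the rollout turned out to be somewhat more troublesome.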
Problems
- pyd file import problem
- .so file import problem
- conflict between multiprocessing and pyinstaller
Let's go through them one by one.
pyd file import problem
pipenv run pyinstaller -F main.py -n scscore
Packaging succeeds and generates a spec file, but running the program fails:
[doctorq@gz-inf-development01 scscore]$ ./dist/scscore
/tmp/_MEINtWbir/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
Traceback (most recent call last):
File "main.py", line 8, in <module>
from src.route import load_route
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "src/route.py", line 6, in <module>
from src.view.forecast_view.feature_importance_view import FeatureImportanceView
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "src/view/forecast_view/feature_importance_view.py", line 7, in <module>
from src.importance.feature_importance import FeatureImportance
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "src/importance/feature_importance.py", line 7, in <module>
from src.forecasting.trainer import Trainer
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "src/forecasting/trainer.py", line 13, in <module>
from src.Models.collect_models import ModelCollector
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "src/Models/collect_models.py", line 7, in <module>
from src.Models.statistic_model.ARIMA import ARIMA
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "src/Models/statistic_model/ARIMA.py", line 2, in <module>
from pmdarima.arima import auto_arima
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/pmdarima/__init__.py", line 29, in <module>
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/pmdarima/arima/__init__.py", line 6, in <module>
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/pmdarima/arima/arima.py", line 10, in <module>
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/sklearn/metrics/__init__.py", line 36, in <module>
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/sklearn/metrics/cluster/__init__.py", line 20, in <module>
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/sklearn/metrics/cluster/unsupervised.py", line 16, in <module>
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/sklearn/metrics/pairwise.py", line 32, in <module>
File "sklearn/metrics/pairwise_fast.pyx", line 1, in init sklearn.metrics.pairwise_fast
ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'
[6625] Failed to execute script main
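The missing modules are Python libraries compiled from C/C++ for Python to call, and they need extra handling: each one has to be added to the hiddenimports attribute in scscore.spec. I added every file with "cpython" in its name from the directories of each library involved.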
[doctorq@gz-inf-development01 utils]$ ll|grep cpython
-rwxrwxr-x 1 doctorq doctorq 221256 7月 16 15:41 arrayfuncs.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 426280 7月 16 15:41 _cython_blas.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 238344 7月 16 15:41 fast_dict.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 95824 7月 16 15:41 graph_shortest_path.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 28512 7月 16 15:41 lgamma.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 179592 7月 16 15:41 _logistic_sigmoid.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 88856 7月 16 15:41 murmurhash.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 99864 7月 16 15:41 _random.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 140992 7月 16 15:41 seq_dataset.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 643648 7月 16 15:41 sparsefuncs_fast.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 62880 7月 16 15:41 weight_vector.cpython-37m-x86_64-linux-gnu.so
After the additions, the spec file looks like this:
hiddenimports=[
    # Compiled extension modules that PyInstaller does not detect on its own
    'cython', 'sklearn', 'sklearn.utils._cython_blas', 'statsmodels', 'statsmodels.tsa',
    'statsmodels.tsa.statespace._kalman_smoother',
    'statsmodels.tsa.statespace._representation',
    'statsmodels.tsa.statespace._simulation_smoother',
    'statsmodels.tsa.statespace._statespace',
    'statsmodels.tsa.statespace._tools',
    'statsmodels.tsa.statespace._filters._conventional',
    'statsmodels.tsa.statespace._filters._inversions',
    'statsmodels.tsa.statespace._filters._univariate',
    'statsmodels.tsa.statespace._smoothers._alternative',
    'statsmodels.tsa.statespace._smoothers._classical',
    'statsmodels.tsa.statespace._smoothers._conventional',
    'statsmodels.tsa.statespace._smoothers._univariate',
    'sklearn.neighbors.typedefs',
    'sklearn.neighbors.quad_tree',
    'sklearn.neighbors.ball_tree',
    'sklearn.neighbors.dist_metrics',
    'sklearn.neighbors.kd_tree',
    'sklearn.tree._utils',
    'sklearn.tree._criterion',
    'sklearn.tree._splitter',
],
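Typing this list by hand is error-prone, so the same rule (pick every file with "cpython" in its name) could be automated. The following is only a sketch under assumptions: the function name `cpython_hidden_imports` and the script name in the usage comment are hypothetical, and it treats the text before the first dot of each file name as the module name.

```python
import sys
from pathlib import Path


def cpython_hidden_imports(package_dir):
    """Return dotted module names for compiled extensions under package_dir.

    Mirrors the manual rule from the text: keep .so files whose name
    contains "cpython", e.g. _cython_blas.cpython-37m-x86_64-linux-gnu.so.
    """
    package_dir = Path(package_dir)
    root = package_dir.parent  # the directory holding the package, e.g. site-packages
    names = set()
    for path in package_dir.rglob("*.so"):
        if "cpython" not in path.name:
            continue  # plain shared libraries are not importable modules
        module = path.name.split(".", 1)[0]  # text before the first dot
        package_parts = path.relative_to(root).parent.parts
        names.add(".".join(package_parts + (module,)))
    return sorted(names)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python list_hidden_imports.py <site-packages>/sklearn
    for name in cpython_hidden_imports(sys.argv[1]):
        print(repr(name) + ",")
```

The printed lines can be pasted into hiddenimports. The result is a superset of what is strictly needed, which matches the add-everything approach described above.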
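Then we build again. The dependent libraries of this type are now all bundled, but the run fails with a different error:
> pipenv run pyinstaller scscore.spec # build from the spec file
> dist/scscore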
File "site-packages/xgboost/__init__.py", line 11, in <module>
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
exec(bytecode, module.__dict__)
File "site-packages/xgboost/core.py", line 161, in <module>
File "site-packages/xgboost/core.py", line 123, in _load_lib
File "site-packages/xgboost/libpath.py", line 48, in find_lib_path
xgboost.libpath.XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
List of candidates:
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/../../lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/./lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
[32970] Failed to execute script main
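.so file import problem
The error above comes from xgboost's shared library not being bundled. The fix is to add a new file named hook-xgboost.py under $(pipenv --venv)/lib/python3.7/site-packages/PyInstaller/hooks. The file name must match exactly, and the contents are as follows: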
from PyInstaller.utils.hooks import collect_all
datas, binaries, hiddenimports = collect_all("xgboost")
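Editing files inside site-packages is lost whenever the virtualenv is rebuilt. As an alternative sketch, the same hook could live in a project-local directory that PyInstaller is pointed at through the `hookspath` list of the `Analysis(...)` call in the spec file. The helper name `write_hook` and the directory name `hooks` below are assumptions for illustration, not part of the original setup.

```python
from pathlib import Path

# Same body as the hook shown above, parameterized by package name.
HOOK_TEMPLATE = (
    "from PyInstaller.utils.hooks import collect_all\n"
    "\n"
    "datas, binaries, hiddenimports = collect_all({package!r})\n"
)


def write_hook(package, hooks_dir="hooks"):
    """Create hooks_dir/hook-<package>.py and return its path."""
    hooks_dir = Path(hooks_dir)
    hooks_dir.mkdir(parents=True, exist_ok=True)
    hook_path = hooks_dir / f"hook-{package}.py"
    hook_path.write_text(HOOK_TEMPLATE.format(package=package), encoding="utf-8")
    return hook_path


if __name__ == "__main__":
    print(write_hook("xgboost"))
```

With the file in place, setting `hookspath=['hooks']` in scscore.spec should let PyInstaller find it; this variant was not part of the original experiment.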
Then run the packaging again:
> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore
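Running the executable now hits the third problem: the program keeps launching itself over and over and never stops.
This is caused by a bug in the joblib library; see the article "Pyinstaller exe keeps opening itself". Downgrading joblib to 0.11 is enough to fix it.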
> pipenv install joblib==0.11
> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore
Done.
The following setting stores the program's temporary files somewhere else, so that /tmp does not fill up:
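A commonly recommended guard for frozen programs that use multiprocessing is to call `multiprocessing.freeze_support()` at the very start of the entry point. The sketch below shows the pattern only; whether it would have avoided the joblib issue here without the downgrade is an assumption that was not tested, and the `main` body is a placeholder.

```python
import multiprocessing
import sys


def is_frozen():
    """True when running from a PyInstaller bundle, which sets sys.frozen."""
    return bool(getattr(sys, "frozen", False))


def main():
    # Placeholder for the real service start-up code.
    print("frozen bundle:", is_frozen())
    return 0


if __name__ == "__main__":
    # Must run before anything spawns worker processes, so that a child
    # process started from the bundle does not re-run the whole program.
    multiprocessing.freeze_support()
    main()
```

Outside a bundle the call is a no-op, so the guard can stay in the source permanently.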
runtime_tmpdir='/home/doctorq/python-dev/scscore/tmp',
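To check that the setting took effect, the program can report where it was unpacked. A one-file bundle exposes its extraction directory as `sys._MEIPASS` (the `/tmp/_MEI...` paths in the tracebacks above are that directory). This is a minimal sketch; the function name `extraction_dir` is hypothetical.

```python
import sys
import tempfile


def extraction_dir():
    """Return the PyInstaller unpack directory, or None outside a bundle."""
    return getattr(sys, "_MEIPASS", None)


if __name__ == "__main__":
    print("extracted to:", extraction_dir())
    print("system temp dir:", tempfile.gettempdir())
```

After rebuilding with runtime_tmpdir set, the first line should show a `_MEI...` directory under the configured path rather than under /tmp.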