Doubling TensorFlow Performance on a MacBook M1

Posted 2021-09-26 nxlhero

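I have an M1 MacBook on hand, and most applications are not compatible with it: VMware Fusion cannot run Linux virtual machines on it, and Parallels reportedly supports the ARM builds of Windows and Linux but does not seem to work well either. The one thing the machine is still good for is machine learning. TensorFlow 2.5 now supports the M1 natively and is considerably faster than 2.4, but it requires macOS 12, which is still in beta. This post records how I set up a TensorFlow environment on the M1 and ran a simple test; judging by the results, the performance gain is quite noticeable.

Upgrading to macOS 12

The TensorFlow fork that Apple built for the M1 is no longer in use; TensorFlow 2.5 supports the M1 natively. The first step is therefore to upgrade to macOS 12, for which you can follow the tutorial below.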

https://zhuanlan.zhihu.com/p/378946858

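Setting up the Conda environment

Anaconda does not support the M1 processor yet, and the Python it bundles is 3.8, which does not run natively on ARM, so I use the open-source Miniforge instead, which ships with Python 3.9.

The following is excerpted from the Miniforge home page on GitHub.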

Miniforge3

Latest installers with Python 3.9 (*) in the base environment:

OS	Architecture	Download
Linux	x86_64 (amd64)	Miniforge3-Linux-x86_64
Linux	aarch64 (arm64) `(**)`	Miniforge3-Linux-aarch64
Linux	ppc64le (POWER8/9)	Miniforge3-Linux-ppc64le
OS X	x86_64	Miniforge3-MacOSX-x86_64
OS X	arm64 (Apple Silicon) `(***)`	Miniforge3-MacOSX-arm64
Windows	x86_64	Miniforge3-Windows-x86_64

(*) The Python version is specific only to the base environment. Conda can create new environments with different Python versions and implementations.

(**) While the Raspberry PI includes a 64 bit processor, the RasbianOS is built on a 32 bit kernel and is not a supported configuration for these installers. We recommend using a 64 bit linux distribution such as Ubuntu for Raspberry PI.

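(***) Apple silicon builds are experimental and haven't had testing like the other platforms.

Although conda's M1 support is still experimental, Python 3.9 itself supports the M1 natively, and conda is only used here to manage Python packages.

During installation, possibly because I had installed Anaconda earlier, conda kept getting killed by zsh. I tried many fixes, including installing the full Xcode, and none of them worked; in the end, choosing a different install path solved it. In theory Xcode is not needed and installing Miniforge alone is enough.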

https://github.com/conda-forge/miniforge/issues/190

Installation is simple: download the installer and run it.

 ./Miniforge3-MacOSX-arm64.sh

Answer yes or accept the defaults throughout. When it finishes, restart the terminal and check that conda and python run; in my case python reports 3.9.6.

(base)  ~ % python
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:35:11) 
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
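
The version banner alone does not show whether the interpreter is an arm64 build or an x86_64 build running under Rosetta. A minimal sketch of a check using only the standard library follows; the helper name describe_interpreter is made up for illustration. On a native Apple Silicon build, platform.machine() is expected to report arm64.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the running Python interpreter."""
    return {
        "version": platform.python_version(),  # for example "3.9.6"
        "machine": platform.machine(),         # architecture the process runs as
        "system": platform.system(),           # "Darwin" on macOS
        "executable": sys.executable,          # path of the interpreter in use
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If machine shows x86_64 on an M1, the interpreter is most likely an Intel build, and the executable path usually tells you which installation it came from.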

To switch to mirrors hosted in China, open or create ~/.condarc and add the following:

channels:
  - https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
  - https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - defaults
show_channel_urls: true
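
conda gives priority to channels in the order they are listed, so it can help to double-check what the file actually contains. The sketch below is a simplified line scanner for a file shaped like the one above, not a general YAML parser; list_channels and SAMPLE are names invented for this example.

```python
def list_channels(condarc_text):
    """Collect the entries listed under the top-level `channels:` key.

    Simplified scanner for files shaped like the sample below;
    it does not handle general YAML.
    """
    channels = []
    in_channels = False
    for raw_line in condarc_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if not raw_line.startswith((" ", "\t", "-")):
            # An unindented line is a top-level key: it opens or closes the section.
            in_channels = line.startswith("channels:")
            continue
        if in_channels and line.startswith("- "):
            channels.append(line[2:].strip())
    return channels


SAMPLE = """\
channels:
  - https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
  - https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge/
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
  - defaults
show_channel_urls: true
"""

if __name__ == "__main__":
    for position, channel in enumerate(list_channels(SAMPLE), start=1):
        print(position, channel)
```

To inspect the real file, pass the text of ~/.condarc instead of SAMPLE.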

Install a package to check whether the mirror is being used. As the output shows, it is.

(base) niuxinli@niuxinlideMacBook-Pro ~ % conda install pandas
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /Users/niuxinli/miniforge3

  added / updated specs:
    - pandas

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    bottleneck-1.3.2           |   py39heec5a64_1          96 KB  https://mirrors.ustc.edu.cn/anaconda/pkgs/main
    ca-certificates-2021.7.5   |       hca03da5_1         113 KB

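Installing PyCharm

PyCharm supports the M1 processor; downloading PyCharm Community Edition is enough.

Create a conda environment for PyCharm (it is named pycharm in the commands below).

Installing TensorFlow

Install the dependencies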

conda activate pycharm
conda install -c apple tensorflow-deps

Install TensorFlow with pip

The default pip index is too slow, so use the Aliyun mirror for this command.

python -m pip install tensorflow-macos -i https://mirrors.aliyun.com/pypi/simple/

Install the Metal plugin

python -m pip install tensorflow-metal -i https://mirrors.aliyun.com/pypi/simple/
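
Once both packages are installed, it is worth confirming that TensorFlow can actually see the GPU. A minimal sketch, assuming the Metal plugin registers the GPU as an ordinary physical device; the helper name gpu_device_names is hypothetical, and it returns None when TensorFlow cannot be imported so that it can run anywhere.

```python
def gpu_device_names():
    """Return the names of GPUs visible to TensorFlow, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    names = gpu_device_names()
    if names is None:
        print("TensorFlow is not importable in this environment")
    elif names:
        print("GPU devices:", names)
    else:
        print("TensorFlow is installed but reports no GPU")
```

An empty list would suggest that training falls back to the CPU.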

Install some other dependencies

brew install libjpeg
pip install tensorflow-datasets -i https://mirrors.aliyun.com/pypi/simple/ 
conda install -y pandas matplotlib scikit-learn jupyterlab
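
A quick import check after installing catches problems such as a native library that fails to load. The sketch below tries each import and records the error text instead of stopping at the first failure; try_imports is a name made up for this example, and the module list is an assumption based on the packages installed above.

```python
import importlib


def try_imports(module_names):
    """Map each module name to None on success or to the error text on failure."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # a broken native library also surfaces here
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Import names can differ from package names (scikit-learn is imported as sklearn).
    names = ["numpy", "pandas", "matplotlib", "sklearn", "tensorflow_datasets"]
    for name, error in try_imports(names).items():
        print(f"{name}: {'ok' if error is None else error}")
```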

After installation, import numpy fails with:

Original error was: dlopen(/Users/niuxinli/miniforge3/envs/pycharm/lib/python3.9/site-packages/numpy/core/_multiarray_umath.cpython-39-darwin.so, 0x0002): Library not loaded: @rpath/libcblas.3.dylib

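After some searching, I installed opencv on a hunch to see whether that would help. It did fix the import error, although pip then reported that tensorflow 2.5 is incompatible with numpy 1.21.2. I ignored that for now.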

pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/

This is the error printed during installation:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-macos 2.5.0 requires numpy~=1.19.2, but you have numpy 1.21.2 which is incompatible.
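
The ~= in that message is pip's "compatible release" specifier: numpy~=1.19.2 means at least 1.19.2 while staying within the 1.19 series. The sketch below illustrates the rule for plain dotted numeric versions only; the full rules in PEP 440 cover more cases, and both function names are invented for this example.

```python
def parse_version(text):
    """Turn a plain dotted version such as "1.19.2" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_compatible_release(installed, required):
    """Check `installed` against the specifier `~=required`.

    For X.Y.Z this means installed >= X.Y.Z and installed starts with X.Y.
    Only plain numeric versions are handled in this sketch.
    """
    installed_parts = parse_version(installed)
    required_parts = parse_version(required)
    prefix = required_parts[:-1]
    same_series = installed_parts[:len(prefix)] == prefix
    return same_series and installed_parts >= required_parts


if __name__ == "__main__":
    print(satisfies_compatible_release("1.21.2", "1.19.2"))  # False
    print(satisfies_compatible_release("1.19.5", "1.19.2"))  # True
```

numpy 1.21.2 fails the check because it is outside the 1.19 series, which is why pip flags the conflict.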

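Judging by the run below, this error does not stop TensorFlow from working.

Testing TensorFlow

To compare TensorFlow performance on the M1, I found a blogger's benchmark results and code online:

https://zhuanlan.zhihu.com/p/350955566

That setup was installed on macOS 11, so in theory it should be slower than the installation above. I made small compatibility tweaks to the code and left everything else unchanged.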

import tensorflow as tf
import tensorflow_datasets as tfds
import time
from datetime import timedelta
from tensorflow.python.framework.ops import disable_eager_execution

# Run in graph mode rather than eager mode.
disable_eager_execution()

# Load MNIST as (image, label) pairs together with the dataset metadata.
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
    # Scale uint8 pixels to float32 values in [0, 1].
    return tf.cast(image, tf.float32) / 255., label

batch_size = 128

# Training pipeline: normalize, cache, shuffle, batch, prefetch.
ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(batch_size)
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

# Test pipeline: normalize, batch, cache, prefetch (no shuffling).
ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(batch_size)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

# Small convolutional network: two conv layers, max pooling, two dense layers.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy'],
)

# Time the whole training run.
start = time.time()

model.fit(
    ds_train,
    epochs=10,
    # validation_steps=1,
    # steps_per_epoch=469,
    # validation_data=ds_test  # adding this line, as in the original script, makes the script fail to run; no fix found yet
)

delta = (time.time() - start)
elapsed = str(timedelta(seconds=delta))
print('Elapsed Time: {}'.format(elapsed))
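
The 469 in the commented-out steps_per_epoch line is simply the number of batches in one pass over the training set. A small sketch of the arithmetic, assuming the standard MNIST split of 60,000 training and 10,000 test images and that the final partial batch is kept (the default for Dataset.batch); batches_per_epoch is a name made up here.

```python
import math


def batches_per_epoch(num_examples, batch_size):
    """Number of batches needed to cover the dataset once, keeping the last partial batch."""
    return math.ceil(num_examples / batch_size)


if __name__ == "__main__":
    print(batches_per_epoch(60000, 128))  # 469 training batches
    print(batches_per_epoch(10000, 128))  # 79 test batches
```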

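While it runs, GPU utilization is close to 100%.

The run time is almost always about 1 minute 32 seconds. Compared with the blogger's 3 minutes 20 seconds, that cuts the time by more than half and comes close to a Colab GPU.

So on the M1, moving to macOS 12 and TensorFlow 2.5 roughly doubles performance compared with the earlier setup.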
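
To put a number on the comparison, the two timings quoted above can be turned into a speedup factor. This is only the arithmetic on the figures in this post, not a new measurement.

```python
from datetime import timedelta

baseline = timedelta(minutes=3, seconds=20)  # the blogger's result on macOS 11
measured = timedelta(minutes=1, seconds=32)  # this post's result on macOS 12

# Dividing one timedelta by another gives a plain float ratio.
speedup = baseline / measured
time_saved = 1 - measured / baseline

if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")        # speedup: 2.17x
    print(f"time saved: {time_saved:.0%}")   # time saved: 54%
```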
