TensorFlow is not using my M1 MacBook GPU during training

Posted 2023-02-16

【Title】: TensorFlow is not using my M1 MacBook GPU during training
【Posted】: 2021-07-24 21:58:17
【Question】:

I have installed tensorflow-macos, and this is my CPU usage and GPU usage while training.

Can I make TensorFlow run on the GPU?

【Comments】:

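Here is a useful thread: github.com/pytorch/pytorch/issues/47702#issuecomment-948858262 It is about PyTorch rather than TensorFlow, but it still helps to understand what to expect from the M1 GPU for deep learning at this stage.

This might help: ***.com/questions/70354859/…

【Answer 1】: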

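You can, but for now it is a bit of a pain. One solution is to use Miniforge. If you already use conda, you need to uninstall it first.

1. Install Xcode and the Command Line Tools package.
2. Install Miniforge to get conda.
3. In a conda environment, install Apple's fork of TensorFlow from conda-forge, along with the other required packages.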

My answer is based on this helpful guide: https://medium.com/gft-engineering/macbook-m1-tensorflow-on-jupyter-notebooks-6171e1f48060

This issue on Apple's GitHub has more discussion: https://github.com/apple/tensorflow_macos/issues/153

【Comments】:

I did install miniforge and Apple's version of TensorFlow; the version is '2.4.0-rc0'. But TensorFlow is still running on the CPU :(

【Answer 2】:

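I am currently facing the same issue. I did try following this YouTube link. Even when I follow the steps, my build still fails at make -j8, which is frustrating. I hope there is a solution.

Update, June 16, '21

I was able to get my test environment up and running with opencv2 and tensorflow2.4 by following the steps by Prabhat on Medium.

Note: be careful not to mix up your conda and pip environments, and change the default paths where the opencv and tensorflow virtual environments are added/downloaded.

Hope this helps with the installation.

For test runs, I also used the github tf test-code.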
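
The note above about mixing conda and pip environments can be checked from inside Python. The following is a minimal sketch, not part of the original answer: it assumes conda sets the `CONDA_PREFIX` variable when an environment is activated, and `same_environment` is a hypothetical helper name introduced only for illustration.

```python
import os
import sys


def same_environment(interpreter_prefix, conda_prefix):
    """Return True if the interpreter prefix is the active conda env's prefix."""
    if not conda_prefix:
        return False  # no conda environment is active
    return os.path.realpath(interpreter_prefix) == os.path.realpath(conda_prefix)


# CONDA_PREFIX is set by `conda activate` (assumption: a standard conda setup)
conda_prefix = os.environ.get("CONDA_PREFIX")
print("interpreter :", sys.executable)
print("sys.prefix  :", sys.prefix)
print("CONDA_PREFIX:", conda_prefix)
print("python belongs to active conda env:",
      same_environment(sys.prefix, conda_prefix))
```

If the last line prints False, the `python` (and therefore the `pip`) on your PATH probably belongs to a different environment than the one you activated, so packages may be landing somewhere unexpected.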

【Answer 3】:

You can try running the sample code below with Activity Monitor open, to check whether the GPU is working and whether TensorFlow is installed correctly.

# Optional debugging switches (uncomment to use):
#import os
#os.environ["TF_DISABLE_MLC"] = "1"
#os.environ["TF_MLC_LOGGING"] = "1"
import tensorflow as tf
# mlcompute is part of Apple's tensorflow_macos fork
from tensorflow.python.compiler.mlcompute import mlcompute

tf.compat.v1.disable_eager_execution()
# Ask ML Compute to run on the GPU
mlcompute.set_mlc_device(device_name='gpu')
print("is_apple_mlc_enabled %s" % mlcompute.is_apple_mlc_enabled())
print("is_tf_compiled_with_apple_mlc %s" % mlcompute.is_tf_compiled_with_apple_mlc())
print(f"eagerly? {tf.executing_eagerly()}")
print(tf.config.list_logical_devices())

from tensorflow.keras import datasets, layers, models

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))

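【Answer 4】:

I have been setting up my new M1 machine today and was looking for a test like the one Aman already provided here. After following the standard instructions in #153, it runs successfully on the GPU, using the miniforge package manager installed with Homebrew and an environment cloned from the YAML file in the #153 guide.

I also ran a smaller, simpler snippet, which runs only on the CPU ('% GPU' == 0%):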

import numpy as np
import tensorflow as tf

### Aman's code to enable the GPU
#from tensorflow.python.compiler.mlcompute import mlcompute
#tf.compat.v1.disable_eager_execution()
#mlcompute.set_mlc_device(device_name='gpu')
#print("is_apple_mlc_enabled %s" % mlcompute.is_apple_mlc_enabled())
#print("is_tf_compiled_with_apple_mlc %s" % mlcompute.is_tf_compiled_with_apple_mlc())
#print(f"eagerly? {tf.executing_eagerly()}")
#print(tf.config.list_logical_devices())

x = np.random.random((10000, 5))
y = np.random.random((10000, 2))

x2 = np.random.random((2000, 5))
y2 = np.random.random((2000, 2))

inp = tf.keras.layers.Input(shape = (5,))
l1 = tf.keras.layers.Dense(256, activation = 'sigmoid')(inp)
l1 = tf.keras.layers.Dense(256, activation = 'sigmoid')(l1)
l1 = tf.keras.layers.Dense(256, activation = 'sigmoid')(l1)
l1 = tf.keras.layers.Dense(256, activation = 'sigmoid')(l1)
l1 = tf.keras.layers.Dense(256, activation = 'sigmoid')(l1)
o = tf.keras.layers.Dense(2, activation = 'sigmoid')(l1)

model = tf.keras.models.Model(inputs = [inp], outputs = [o])
model.compile(optimizer = "Adam", loss = "mse")

model.fit(x, y, validation_data = (x2, y2), batch_size = 500, epochs = 500)

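Uncommenting the lines taken from Aman's code and re-running makes the GPU work again.

If these scripts still do not use the GPU according to Activity Monitor (set the update rate to 1 second under View > Update Frequency), go back to the #153 page, start over, follow the instructions carefully, and make sure to ignore the instructions meant for Intel/x86.

My steps:

1. Install Xcode (from the App Store).
2. Install Homebrew (do not forget to set PATH as suggested once the installation finishes; you then need to restart the terminal or reload your shell profile).
3. Install miniforge ("brew install miniforge").
4. Copy the environment.yaml file and clone it as a new conda environment using the command given in #153.
5. Profit.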
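
Since the instructions differ between Intel/x86 and Apple Silicon, it can help to confirm which architecture your Python interpreter actually runs as. This is a small sketch using only the standard library, not something from the #153 guide; `is_native_apple_silicon` is a hypothetical helper name.

```python
import platform


def is_native_apple_silicon(system, machine):
    """True when running as a native arm64 process on macOS."""
    return system == "Darwin" and machine == "arm64"


system, machine = platform.system(), platform.machine()
print("system :", system)
print("machine:", machine)

if is_native_apple_silicon(system, machine):
    print("Native arm64 Python: the Apple Silicon instructions apply.")
elif system == "Darwin" and machine == "x86_64":
    # Either an Intel Mac, or an x86_64 Python build running under Rosetta
    print("x86_64 Python on macOS: possibly an Intel build of conda/Python.")
else:
    print("Not macOS on Apple Silicon.")
```

If an M1 machine reports x86_64 here, the interpreter most likely comes from an Intel build of conda or Python, and reinstalling via miniforge as in the steps above is the first thing to try.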

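【Answer 5】:

This issue has been fixed with the release of tensorflow-macos 2.5. The easiest way to use the GPU for TensorFlow on a Mac M1 is to create a new conda miniforge3 ARM64 environment and run the following 3 commands to install TensorFlow and its dependencies: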

conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal

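More instructions are on this page: https://developer.apple.com/metal/tensorflow-plugin/

"Accelerate training of machine learning models with TensorFlow right on your Mac. Install TensorFlow v2.5 and the tensorflow-metal PluggableDevice to accelerate training with Metal on Mac GPUs."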
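
After running the install commands in this answer, one way to confirm that the two pip packages ended up in the active environment is to query their installed versions. This is a hedged sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper name, and `tensorflow-deps` is left out because it is installed through conda rather than pip.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names as used in the pip commands of this answer
for name, version in installed_versions(
        ["tensorflow-macos", "tensorflow-metal"]).items():
    print(f"{name}: {version or 'not installed'}")
```

If either package shows as not installed, the pip commands probably ran against a different Python than the one you are using now.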

【Answer 6】:

You can check the available GPU devices with:

import tensorflow as tf
tf.config.list_physical_devices()

Then run your model:

with tf.device('/device:GPU:0'):
    model.fit(x_train, y_train)

See also https://www.tensorflow.org/api_docs/python/tf/device
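
Hard-coding '/device:GPU:0' may not behave as intended when TensorFlow does not see a GPU. The sketch below picks the device name from the list of visible GPUs and falls back to the CPU otherwise. It is an illustration under assumptions, not part of the original answer: `pick_device` is a hypothetical helper, the TensorFlow import is guarded so the helper can be tried without TensorFlow installed, and `model`, `x_train` and `y_train` stand for your own objects.

```python
def pick_device(gpu_names):
    """Return the first GPU device name if any GPU is visible, else the CPU."""
    return "/device:GPU:0" if gpu_names else "/device:CPU:0"


try:
    import tensorflow as tf
except ImportError:
    tf = None  # the helper above still works without TensorFlow

if tf is not None:
    # Physical GPUs visible to TensorFlow (empty list if none)
    gpus = tf.config.list_physical_devices("GPU")
    device = pick_device([gpu.name for gpu in gpus])
    print("training on", device)
    # Then, with your own model and data:
    # with tf.device(device):
    #     model.fit(x_train, y_train)
```

If this prints the CPU device on an M1 machine, TensorFlow does not see the GPU at all, which points to the installation rather than to your model code.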
