How to change dtypes of numpy array for tensorflow

Posted 2023-03-11

【Title】: How to change dtypes of numpy array for tensorflow 【Posted】: 2018-04-06 12:22:53 【Question】:

I built a neural network in tensorflow and created placeholders like this:

input_tensor = tf.placeholder(tf.float32, shape = (None,n_input), name = "input_tensor")
output_tensor = tf.placeholder(tf.float32, shape = (None,n_classes), name = "output_tensor")

During training I get the following error:

Traceback (most recent call last):
  File "try.py", line 150, in <module>
    sess.run(optimizer, feed_dict={X: x_train[i: i + 1], Y: y_train[i: i + 1]})
TypeError: unhashable type: 'numpy.ndarray'

I found that this happens because the data types of my x_train and y_train differ from those of the placeholders.

My x_train looks something like this:

array([[array([[ 1.,  0.,  0.],
   [ 0.,  1.,  0.]])],
   [array([[ 0.,  1.,  0.],
   [ 1.,  0.,  0.]])],
   [array([[ 0.,  0.,  1.],
   [ 0.,  1.,  0.]])]], dtype=object)

It was originally a dataframe like this:

0  [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
1  [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
2  [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

I did x_train = train_x.values to get the numpy array.

And y_train looks like this:

array([[ 1.,  0.,  0.],
   [ 0.,  1.,  0.],
   [ 0.,  0.,  1.]])

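x_train has dtype object and y_train has dtype float64.

What I want to know is how to change the data types of my training data so that it works well with the tensorflow placeholders. Or, if I am missing something, please suggest it.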
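To make the structure described above concrete, here is a minimal sketch that rebuilds an array shaped like the x_train in the question and inspects it. The helper name build_nested_sample is hypothetical and the values are copied from the printout; the point is only that the outer array stores whole arrays as Python objects, which is why its dtype is object.

```python
import numpy as np

def build_nested_sample():
    """Rebuild an object array shaped like the x_train shown in the question."""
    cells = [
        np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
        np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]),
        np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]),
    ]
    nested = np.empty((len(cells), 1), dtype=object)  # one column, one cell per row
    for i, cell in enumerate(cells):
        nested[i, 0] = cell  # each cell stores a whole 2x3 array as a Python object
    return nested

x_nested = build_nested_sample()
print(x_nested.dtype, x_nested.shape)              # dtype and shape of the outer array
print(x_nested[0, 0].dtype, x_nested[0, 0].shape)  # dtype and shape of one inner cell
```

Checking both the outer and the inner dtype like this is a quick way to tell a truly dense array from an array of arrays.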

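【Comments】:

@BradSolomon That is only because I did not paste the whole printout. I will edit it.

@BradSolomon It is a single column and is not used for indexing. I misunderstood you earlier.

【Answer 1】: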

Your x_train is a nested object containing arrays, so you have to unpack it and reshape it. Here is a general-purpose hack:

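import numpy as np

def unpack(a, aggregate=None):
    # Recursively flatten nested lists/arrays of floats into one flat array.
    if aggregate is None:  # avoid sharing a mutable default between calls
        aggregate = []
    for x in a:
        if isinstance(x, (float, np.floating)):  # also matches np.float32 etc.
            aggregate.append(x)
        else:
            unpack(x, aggregate=aggregate)
    return np.array(aggregate)

# x_train is still the dataframe here: one row per sample
x_train = unpack(x_train.values).reshape(x_train.shape[0], -1)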
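If every cell holds an array of the same shape, a non-recursive alternative to the hack above is to stack the cells with np.stack. This is a different technique from the one in the answer, shown only as a sketch; the helper name stack_cells and the two-row sample are made up for illustration.

```python
import numpy as np

def stack_cells(nested):
    """Stack equally shaped arrays stored in an object array into one dense array."""
    cells = [np.asarray(cell, dtype=np.float64) for cell in nested.ravel()]
    dense = np.stack(cells)               # shape: (n_samples, *cell_shape)
    return dense.reshape(len(cells), -1)  # one flat feature row per sample

# Illustrative object array: 2 rows, 1 column, each cell a 2x3 float array
nested = np.empty((2, 1), dtype=object)
nested[0, 0] = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
nested[1, 0] = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])

x_dense = stack_cells(nested)
print(x_dense.dtype, x_dense.shape)
```

np.stack raises a ValueError when the cells differ in shape, so this variant fails loudly on ragged data instead of silently flattening it.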

Once you have a dense array (y_train is already dense), you can use a function like this:

def cast(placeholder, array):
    dtype = placeholder.dtype.as_numpy_dtype
    return array.astype(dtype)

x_train, y_train = cast(X,x_train), cast(Y,y_train)
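The cast itself can be tried without a TensorFlow session. The sketch below uses np.float32 directly, which is what tf.float32.as_numpy_dtype evaluates to; the helper name cast_to is hypothetical and stands in for the placeholder-based cast function above.

```python
import numpy as np

def cast_to(array, numpy_dtype):
    """Return a copy of a dense array converted to the requested NumPy dtype."""
    return np.asarray(array).astype(numpy_dtype)

y_train = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])   # float64 by default
y_cast = cast_to(y_train, np.float32)   # stand-in for placeholder.dtype.as_numpy_dtype
print(y_train.dtype, "->", y_cast.dtype)
```

This only works on dense arrays, which is why the unpacking step has to come first for x_train.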

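【Comments】:

If I do x_train = np.array(list(x_train)), I get [0]. It should be list(xtrain.values). Then I end up with this: [[array([[ 1., 0., 0.], [ 0., 1., 0.]])] [array([[ 0., 1., 0.], [ 1., 0., 0.]])] [array([[ 0., 0., 1.], [ 0., 1., 0.]])]], again with dtype object. If I carry on with it, it gives me: sess.run(optimizer, feed_dict={X: x_train[i: i + 1], Y: y_train[i: i + 1]}) TypeError: unhashable type: 'numpy.ndarray'

What if you use as_matrix?

Doing x_train = train_x.as_matrix() gives the same result as the train_x.values I mentioned in the question.

Edited to include a hack that should do what you want. I tested it on a nested dataframe like the one you showed, but it should work for almost anything, as long as the rows that index the data points match up.

【Answer 2】: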

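It is hard to guess what shape you want the data in, but I am guessing you may be looking for one of two combinations. I will also try to mimic your data in a Pandas dataframe.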

import numpy as np
import pandas as pd

# One column whose cells each hold a nested 2x3 list
df = pd.DataFrame([[[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]],
                   [[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]],
                   [[[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]]], columns=['Mydata'])
print(df)

x = df.Mydata.values
print(x.shape)
print(x)
print(x.dtype)

Output:

                               Mydata
0  [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
1  [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
2  [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

(3,)
[list([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
 list([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
 list([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])]
object

Combination 1

y = [item for sub_list in x for item in sub_list]
y = np.array(y, dtype = np.float32)
print(y.dtype, y.shape)
print(y)

Output:

float32 (6, 3)
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]

Combination 2

y = [sub_list for sub_list in x]
y = np.array(y, dtype = np.float32)
print(y.dtype, y.shape)
print(y)

Output:

float32 (3, 2, 3)
[[[ 1.  0.  0.]
  [ 0.  1.  0.]]

 [[ 0.  1.  0.]
  [ 1.  0.  0.]]

 [[ 0.  0.  1.]
  [ 0.  1.  0.]]]
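The placeholder in the question has shape (None, n_input), so a (3, 2, 3) result like the one in combination 2 would still need flattening to two dimensions before feeding. The sketch below assumes n_input is 6 (2 rows times 3 columns per sample); the question never states the real value, so treat that number as an assumption.

```python
import numpy as np

n_input = 6  # assumption: 2 rows x 3 columns per sample, flattened

# Same values as the combination 2 output, shape (3, 2, 3)
samples = np.array([[[1, 0, 0], [0, 1, 0]],
                    [[0, 1, 0], [1, 0, 0]],
                    [[0, 0, 1], [0, 1, 0]]], dtype=np.float32)

x_flat = samples.reshape(samples.shape[0], n_input)  # shape (3, 6)
batch = x_flat[0:1]  # slicing keeps the leading batch axis, shape (1, 6)
print(x_flat.shape, batch.shape)
```

Slicing with i: i + 1, as the training loop in the question does, keeps the batch dimension, whereas indexing with a bare i would drop it.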

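【Comments】:

Following your combination 2, if I do y = np.array(y, dtype = np.float32) I get "setting an array element with a sequence". Doing np.array(y) gives the same structure I described in the question. The challenge is that I cannot change the dtype just by using dtype = np.float32, because I run into the same error.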
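The error in the comment above can be reproduced when the cells are NumPy arrays stored inside an object array, as in the question, rather than plain nested lists. The sketch below uses illustrative one-hot rows rather than the asker's real data; it shows the direct cast failing and then a possible workaround that collects the cells first.

```python
import numpy as np

eye = np.eye(3)
nested = np.empty((3, 1), dtype=object)
for i in range(3):
    # each cell is a 2x3 float array stored as a single Python object
    nested[i, 0] = np.stack([eye[i], eye[(i + 1) % 3]])

try:
    nested.astype(np.float32)  # NumPy tries to turn each whole cell into one float
    direct_cast_failed = False
except (ValueError, TypeError) as err:
    direct_cast_failed = True
    print("direct cast failed:", err)

# Workaround: pull the cells out, stack them, then cast the dense result
dense = np.stack(list(nested.ravel())).astype(np.float32)
print(dense.dtype, dense.shape)
```

The cast only succeeds once NumPy sees the inner arrays as extra dimensions instead of as opaque objects.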