What do I pass as the x and y parameters in tf.contrib.learn.LinearClassifier.fit
Posted: 2016-11-09 13:16:14
Question
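I have set up a toy example based on the TensorFlow linear classifier tutorial. In that example, the fit method is called with the input_fn parameter, to which I pass train_input_fn. This is how TensorFlow prefers to receive data. However, I would really like to train in mini-batches. Fortunately, fit has a batch_size parameter, but to use it I have to give up input_fn and pass x and y instead. I have tried passing ndarrays, DataFrames, and the output of the train_input_fn function. Nothing works. I need a working example that uses the batch_size parameter.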
Setup
Here is the setup code, split into the part I have no problem with, followed by the problem part.
No problem (feel free to copy/paste/run)
import pandas as pd
import numpy as np
import tensorflow as tf
import tempfile
np.random.seed([3,1415])
df = pd.DataFrame(dict(cat1=np.random.choice(('Yes', 'No'), (100,),),
                       val1=np.random.rand(100),
                       val2=np.random.rand(100),
                       val3=np.random.rand(100),
                       label=np.random.choice((0, 1), (100,))))
LABEL_COLUMN = "label"
trainBegin, trainEnd = 0, 80
testBegin, testEnd = 80, 100
df_train = df.iloc[trainBegin:trainEnd, :]
df_test = df.iloc[testBegin:testEnd, :]
CONTINUOUS_COLUMNS = ['val1', 'val2', 'val3']
CATEGORICAL_COLUMNS = ['cat1']
def input_fn(df):
    # Creates a dictionary mapping from each continuous feature column name (k) to
    # the values of that column stored in a constant Tensor.
    continuous_cols = {k: tf.constant(df[k].values)
                       for k in CONTINUOUS_COLUMNS}
    # Creates a dictionary mapping from each categorical feature column name (k)
    # to the values of that column stored in a tf.SparseTensor.
    categorical_cols = {k: tf.SparseTensor(
                            indices=[[i, 0] for i in range(df[k].size)],
                            values=df[k].values,
                            shape=[df[k].size, 1])
                        for k in CATEGORICAL_COLUMNS}
    # Merges the two dictionaries into one.
    feature_cols = dict(continuous_cols.items() + categorical_cols.items())
    # Converts the label column into a constant Tensor.
    label = tf.constant(df[LABEL_COLUMN].values)
    # Returns the feature columns and the label.
    return feature_cols, label

def train_input_fn():
    return input_fn(df_train)

def eval_input_fn():
    return input_fn(df_test)
val1 = tf.contrib.layers.real_valued_column("val1")
val2 = tf.contrib.layers.real_valued_column("val2")
val3 = tf.contrib.layers.real_valued_column("val3")
wide_columns = [val1, val2, val3]
Problem part: working version
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.LinearClassifier(feature_columns=wide_columns, model_dir=model_dir)
m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
accuracy: 0.45
eval_auc: 0.459596
loss: 0.771354
Problem part: non-working version
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.LinearClassifier(feature_columns=wide_columns, model_dir=model_dir)
m.fit(input_fn=train_input_fn, steps=200)
# 2 lines that are different ##########################
x, y = train_input_fn()
results = m.evaluate(x=x, y=y, batch_size=100, steps=1)
#######################################################
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
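Below is the error I get, although the error changes depending on what I try. The documentation says to pass a matrix. I tried that too.
Dump of the full traceback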
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-135-5b53add19aac> in <module>()
12 # p.fit(input_fn=train_input_fn, steps=10)
13 x, y = train_input_fn()
---> 14 p.fit(x=df_train, y=df_train, steps=10, batch_size=100)
15 results = p.evaluate(input_fn=eval_input_fn, steps=1)
16 for key in sorted(results):
/Users/sean/anaconda/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.pyc in fit(self, x, y, input_fn, steps, batch_size, monitors)
171 if x is None:
172 raise ValueError('Either x or input_fn must be provided.')
--> 173 input_fn, feed_fn = _get_input_fn(x, y, batch_size)
174 elif (x is not None) or (y is not None):
175 raise ValueError('Can not provide both input_fn and either of x and y.')
/Users/sean/anaconda/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.pyc in _get_input_fn(x, y, batch_size)
65 def _get_input_fn(x, y, batch_size):
66 df = data_feeder.setup_train_data_feeder(
---> 67 x, y, n_classes=None, batch_size=batch_size)
68 return df.input_builder, df.get_feed_dict_fn()
69
/Users/sean/anaconda/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.pyc in setup_train_data_feeder(X, y, n_classes, batch_size, shuffle, epochs)
97 ValueError: if one of `X` and `y` is iterable and the other is not.
98 """
---> 99 X, y = _data_type_filter(X, y)
100 if HAS_DASK:
101 # pylint: disable=g-import-not-at-top
/Users/sean/anaconda/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.pyc in _data_type_filter(X, y)
65 y = extract_dask_labels(y)
66 if HAS_PANDAS:
---> 67 X = extract_pandas_data(X)
68 if y is not None:
69 y = extract_pandas_labels(y)
/Users/sean/anaconda/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/io/pandas_io.pyc in extract_pandas_data(data)
51 return data.values.astype('float')
52 else:
---> 53 raise ValueError('Data types for data must be int, float, or bool.')
54
55
ValueError: Data types for data must be int, float, or bool.
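Answer 1:
It appears that x and y have to be passed in a different format from what input_fn returns. Quoting the docstring of fit:
x: matrix or tensor of shape [n_samples, n_features...]. Can be iterator that returns arrays of features. The training input samples for fitting the model. If set, input_fn must be None.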
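The example below works. Note that:
I had to replace 'Yes'/'No' with booleans (this may not be equivalent, but it illustrates the point), because it does not seem possible to feed sparse data this way.
I used infer_real_valued_columns_from_input to obtain the columns.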
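The need to replace the strings is consistent with the ValueError at the bottom of the question's traceback ("Data types for data must be int, float, or bool."). The following is a minimal sketch for spotting the offending columns before calling fit, assuming the check is based on column dtypes, as the message suggests. It uses only pandas and NumPy; the helper name unsupported_columns is hypothetical and is not part of TensorFlow.

```python
import numpy as np
import pandas as pd

# Small frame with the same kinds of columns as the question's df.
df = pd.DataFrame(dict(
    cat1=np.random.choice(('Yes', 'No'), (100,)),
    val1=np.random.rand(100),
    label=np.random.choice((0, 1), (100,))))


def unsupported_columns(frame):
    """Hypothetical helper: columns whose dtype is not int, float, or bool."""
    # dtype.kind is 'i'/'u' for integers, 'f' for floats, 'b' for booleans.
    return [name for name in frame.columns
            if frame[name].dtype.kind not in 'iufb']


bad = unsupported_columns(df)
print(bad)
```

Any column reported here would need to be converted (or dropped) before the frame is passed as x.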
Revised version:
import pandas as pd
import numpy as np
import tensorflow as tf
import tempfile
np.random.seed([3,1415])
_x_df = pd.DataFrame(dict(
    cat1=np.random.choice((True, False), (100,),),
    val1=np.random.rand(100),
    val2=np.random.rand(100),
    val3=np.random.rand(100)))
_y_df = pd.DataFrame(dict(label=np.random.choice((0, 1), (100,))))
trainBegin, trainEnd = 0, 80
testBegin, testEnd = 80, 100
x_df_train = _x_df.iloc[trainBegin:trainEnd, :]
x_df_test = _x_df.iloc[testBegin:testEnd, :]
y_df_train = _y_df.iloc[trainBegin:trainEnd, :]
y_df_test = _y_df.iloc[testBegin:testEnd, :]
wide_columns = tf.contrib.learn.infer_real_valued_columns_from_input(x_df_train)
model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.LinearClassifier(feature_columns=wide_columns, model_dir=model_dir)
m.fit(x_df_train, y_df_train, batch_size=5, steps=200)
results = m.evaluate(x_df_test, y_df_test, batch_size=5, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))
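Instead of regenerating the data with booleans, the question's original frame could probably be converted in place. The sketch below is an illustration under assumptions, not a tested TensorFlow recipe: it rebuilds a frame shaped like the question's df, maps 'Yes'/'No' to booleans, and produces a float matrix of shape [n_samples, n_features] as the quoted docstring describes. The variable names x_df, y_df, and x_matrix are illustrative, and no TensorFlow call is made here.

```python
import numpy as np
import pandas as pd

# Frame with the same layout as the question's df.
df = pd.DataFrame(dict(
    cat1=np.random.choice(('Yes', 'No'), (100,)),
    val1=np.random.rand(100),
    val2=np.random.rand(100),
    val3=np.random.rand(100),
    label=np.random.choice((0, 1), (100,))))

# Separate features from labels.
x_df = df.drop('label', axis=1).copy()
y_df = df[['label']]

# Map the string category to booleans so every column is int, float, or bool.
x_df['cat1'] = x_df['cat1'] == 'Yes'

# Same 80/20 split as in the question.
x_train, x_test = x_df.iloc[:80, :], x_df.iloc[80:, :]
y_train, y_test = y_df.iloc[:80, :], y_df.iloc[80:, :]

# A plain float matrix of shape [n_samples, n_features].
x_matrix = x_train.values.astype('float')
print(x_matrix.shape, x_matrix.dtype)
```

Under these assumptions, x_train and y_train (or x_matrix with y_train.values) would take the place of x_df_train and y_df_train in the fit call above. As the answer notes, a boolean column is not necessarily equivalent to a sparse categorical feature.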