How to create the same structure as tf.data.experimental.make_csv_dataset from pandas
【Posted】: 2021-12-15 01:19:58
【Question】:
tf.data.experimental.make_csv_dataset creates a TF Dataset that is ready for Keras supervised training.
import tensorflow as tf

titanic_file = tf.keras.utils.get_file("titanic_train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
titanic = tf.data.experimental.make_csv_dataset(
    titanic_file,
    label_name="survived",
    batch_size=1,   # To compare with the head of the CSV
    shuffle=False,  # To compare with the head of the CSV
    header=True,
)
for row in titanic.take(1):  # Take the first batch
    features = row[0]        # Dictionary of feature name -> tensor
    label = row[1]
    for feature, value in features.items():
        print(f"{feature:20s}: {value}")
    print(f"{'label/survived':20s}: {label}")
-----
sex : [b'male']
age : [22.]
n_siblings_spouses : [1]
parch : [0]
fare : [7.25]
class : [b'Third']
deck : [b'unknown']
embark_town : [b'Southampton']
alone : [b'n']
label/survived : [0]
How can I create the same thing from pandas? I tried the code below, but the label comes out as a dictionary rather than an int32 tensor.
import pandas as pd

df = pd.read_csv(titanic_file)
titanic_from_pandas = tf.data.Dataset.from_tensor_slices((
    dict(df.loc[:, df.columns != 'survived']),
    dict(df.loc[:, ['survived']])  # dict of {column: Series}, so the label is a dict
))
for row in titanic_from_pandas.batch(1).take(1):  # Take the first batch
    features = row[0]        # Dictionary of feature name -> tensor
    label = row[1]
    for feature, value in features.items():
        print(f"{feature:20s}: {value}")
    print(f"{'label/survived':20s}: {label}")
---
sex : [b'male']
age : [22.]
n_siblings_spouses : [1]
parch : [0]
fare : [7.25]
class : [b'Third']
deck : [b'unknown']
embark_town : [b'Southampton']
alone : [b'n']
label/survived : {'survived': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([0])>} <-----
By the way, the data structure expected for Keras supervised training is (features, label), but which document defines this?
【Comments】:
Just use df['survived']. You are clearly passing a dict to tf.data.Dataset.from_tensor_slices, so you get a dict back. I don't see what the problem is :P
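To see why the label turned into a dictionary, it can help to look at the pandas side alone. The sketch below uses a tiny hypothetical DataFrame (the values are illustrative, not read from the Titanic CSV) and compares the two ways of selecting the `survived` column.

```python
import pandas as pd

# Tiny hypothetical stand-in for the Titanic CSV (values are illustrative).
df = pd.DataFrame({
    "survived": [0, 1, 1],
    "sex": ["male", "female", "female"],
    "age": [22.0, 38.0, 26.0],
})

# dict() over a DataFrame builds a mapping {column name: Series}.
# tf.data keeps that mapping, so the label would arrive as a dict.
label_as_dict = dict(df.loc[:, ["survived"]])

# Selecting a single column by name yields a Series, which behaves like
# one flat array of labels.
label_as_series = df.loc[:, "survived"]

print(type(label_as_dict).__name__, list(label_as_dict))
print(type(label_as_series).__name__, label_as_series.tolist())
```

Passing the Series form as the second tuple element is what the comment above is recommending.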
tensorflow.org/api_docs/python/tf/keras/Model#fit defines what should be passed to .fit()
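As a rough illustration of the (inputs, targets) convention described in the Model.fit documentation, here is a minimal sketch that trains on a dataset of (feature dict, label) tuples. The feature values, the two-input model, and the choice of optimizer and loss are all assumptions made for the example and do not come from the thread.

```python
import numpy as np
import tensorflow as tf

# Hypothetical numeric features and labels (values are illustrative only).
features = {
    "age": np.array([[22.0], [38.0], [26.0], [35.0]], dtype="float32"),
    "fare": np.array([[7.25], [71.28], [7.92], [53.1]], dtype="float32"),
}
labels = np.array([0, 1, 1, 1], dtype="int32")

# Each dataset element is a (features, label) tuple; batching keeps that layout.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# A functional model whose input names match the feature dictionary keys.
inputs = {name: tf.keras.Input(shape=(1,), name=name) for name in features}
x = tf.keras.layers.Concatenate()(list(inputs.values()))
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")

# fit() treats every element as (inputs, targets); no separate y is passed.
history = model.fit(dataset, epochs=1, verbose=0)
print(sorted(history.history))
```

If the label were a dict, as in the question, Keras would need an output with a matching name, which is why a plain tensor label is the simpler fit here.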
【Answer 1】:
As @Proko suggested, pass the label column as a Series rather than as a dict:
titanic_from_pandas = tf.data.Dataset.from_tensor_slices((
    dict(df.loc[:, df.columns != 'survived']),
    df.loc[:, 'survived']  # A Series, so the label is a plain tensor
))
for row in titanic_from_pandas.batch(1).take(1):  # Take the first batch
    features = row[0]        # Dictionary of feature name -> tensor
    label = row[1]
    for feature, value in features.items():
        print(f"{feature:20s}: {value}")
    print(f"{'label/survived':20s}: {label}")
---
sex : [b'male']
age : [22.]
n_siblings_spouses : [1]
parch : [0]
fare : [7.25]
class : [b'Third']
deck : [b'unknown']
embark_town : [b'Southampton']
alone : [b'n']
label/survived : [0]
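One remaining difference may be the dtypes. The question expects an int32 label, while the pandas route produced int64. The sketch below downcasts 64-bit columns before building the dataset and then inspects `element_spec`. It assumes the CSV reader infers int32 and float32 for numeric columns, which is worth checking against `titanic.element_spec` on your own setup. The helper `narrow_dtypes` and the small DataFrame are hypothetical, written only for this illustration.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical stand-in for the Titanic DataFrame (values are illustrative).
df = pd.DataFrame({
    "survived": [0, 1, 1],
    "sex": ["male", "female", "female"],
    "age": [22.0, 38.0, 26.0],
    "n_siblings_spouses": [1, 1, 0],
})

def narrow_dtypes(frame):
    """Return a copy with integer columns as int32 and float columns as float32."""
    frame = frame.copy()
    for name in frame.columns:
        if pd.api.types.is_integer_dtype(frame[name]):
            frame[name] = frame[name].astype("int32")
        elif pd.api.types.is_float_dtype(frame[name]):
            frame[name] = frame[name].astype("float32")
    return frame

df32 = narrow_dtypes(df)
labels = df32.pop("survived")  # A Series, so the label becomes a plain tensor
dataset = tf.data.Dataset.from_tensor_slices((dict(df32), labels)).batch(1)

# element_spec describes the (features, label) structure without iterating.
feature_spec, label_spec = dataset.element_spec
print(label_spec.dtype, {k: v.dtype for k, v in feature_spec.items()})
```

Comparing this `element_spec` with the one from the make_csv_dataset pipeline is a quick way to confirm that both structures line up.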