Tensorflow: Best way of creating training data with float as label

Posted 2023-04-18

[Title]: Tensorflow: Best way of creating training data with float as label [Posted]: 2020-10-11 21:01:48 [Question]:

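I want to use TensorFlow version 2.4.0-dev20201009 with Python 3.7. My dataset is in the subfolder "data\Images". The label of each image is a float between 1 and 5, which can be read from allTestData.csv in the subfolder "data".

What is the best way to read the data with a 30% validation split? So far I wanted to use tf.keras.preprocessing.image_dataset_from_directory, but it does not help me attach the labels correctly, because all my images are in a single folder and the labels are not one-hot encoded vectors. How would you do this in TensorFlow?

For completeness, I plan to use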

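import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNetV2

def create_model():
  model = keras.Sequential()
  model.add(MobileNetV2(input_shape=(224, 224, 3), include_top=False))
  model.trainable = True
  model.add(layers.GlobalAveragePooling2D())
  model.add(layers.Dense(1024, activation="relu"))
  # A single linear unit for regression; softmax over one unit would always output 1.0
  model.add(layers.Dense(1, activation="linear"))

  model.compile(optimizer='adam',
                loss=tf.losses.mean_squared_error,
                # Regression metric; categorical accuracy is not meaningful for float labels
                metrics=[tf.metrics.MeanAbsoluteError()])

  model.summary()
  return model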

to train the model. The question is only about how to read the training data.

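[Comments]:

Have you looked at flow_from_dataframe?

I checked, and it seems you cannot write something like "label = float". Maybe I should simply write a function that reads the labels as floats.

You can always use the tf.cast function to convert the dtype of a tensor. I suggest loading the dataset with tf.data.Dataset.

[Answer 1]: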

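I will answer my own question. The best way is to write a manual function that reads the labels and the images. Assume the images are in "data\Images" and the labels are in the text file "data\train_test_files\All_labels.txt". Then the following two functions do the job: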

import os

import cv2
import numpy as np

def loadImages(IMG_SIZE):
  path = os.path.join(os.getcwd(), 'data', 'Images')
  training_data = []
  labelMap = getLabelMap()
  for img in os.listdir(path):
    out_array = np.zeros((350, 350, 3), np.float32)  # 350x350 is the pixel size of the images
    try:
      img_array = cv2.imread(os.path.join(path, img))
      img_array = img_array.astype('float32')
      # scale pixel values to the range [0, 1]
      out_array = cv2.normalize(img_array, out_array, 0, 1, cv2.NORM_MINMAX)
      out_array = cv2.resize(out_array, (IMG_SIZE, IMG_SIZE))
      training_data.append([out_array, float(labelMap[img])])
    except Exception as e:
      # skip files that cannot be read as images or that have no label
      pass
  return training_data

def getLabelMap():
  map = {}
  path = os.path.join(os.getcwd(), 'data', 'train_test_files', 'All_labels.txt')
  with open(path, "r") as f:
    for line in f:
      line = line.split()     # lines in txt file are of the form 'image_name.jpg 3.2'
      map[line[0]] = line[1]  # 3.2 is the label
  return map

# call of method:
training_set = loadImages(244)  # I want to have my images resized to 244x244
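The question also asked for a 30% validation split, which the loader above does not do by itself. The following is only a minimal sketch of one way to handle it in plain Python, assuming label lines of the form 'image_name.jpg 3.2' as described above. The helper names parse_label_lines and split_train_val, the seed, and the example lines are hypothetical and not part of the original answer.

```python
import random

def parse_label_lines(lines):
    """Parse lines like 'image_name.jpg 3.2' into {file name: float label}."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        labels[parts[0]] = float(parts[1])
    return labels

def split_train_val(samples, val_fraction=0.3, seed=42):
    """Shuffle a copy of samples and split off val_fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical label lines standing in for the contents of All_labels.txt
example_lines = ["a.jpg 3.2", "b.jpg 1.0", "c.jpg 4.5", "d.jpg 2.75", ""]
label_map = parse_label_lines(example_lines)
pairs = sorted(label_map.items())
train_pairs, val_pairs = split_train_val(pairs, val_fraction=0.3)
print(len(train_pairs), len(val_pairs))
```

The same split_train_val call should work on the list returned by loadImages, since it only shuffles and slices whatever list of samples it is given.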

[Comments]:

Don't use map as a variable name; it is a built-in function. I suggest label_map instead.
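For reference, the tf.data.Dataset route suggested in the question comments could look roughly like the sketch below. It assumes JPEG files, a list of file paths with matching float labels, and a 224x224 input size to match the MobileNetV2 model in the question. The names load_example and make_datasets, the batch size, and the seed are hypothetical choices for illustration.

```python
import random

import tensorflow as tf

IMG_SIZE = 224  # assumption: matches the MobileNetV2 input size in the question

def load_example(path, label):
    """Read one JPEG file, resize it, and scale pixel values to [0, 1]."""
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    return image, label

def make_datasets(paths, labels, val_fraction=0.3, batch_size=32, seed=42):
    """Build (train, validation) tf.data pipelines from paths and float labels."""
    pairs = list(zip(paths, labels))
    # Shuffle once in Python so the two subsets cannot overlap.
    random.Random(seed).shuffle(pairs)
    n_val = int(round(len(pairs) * val_fraction))

    def build(subset):
        # Explicit dtypes keep the labels as float32 rather than class indices.
        p = tf.constant([path for path, _ in subset], dtype=tf.string)
        y = tf.constant([float(label) for _, label in subset], dtype=tf.float32)
        ds = tf.data.Dataset.from_tensor_slices((p, y))
        return ds.map(load_example).batch(batch_size)

    return build(pairs[n_val:]), build(pairs[:n_val])
```

The two datasets can then be passed to Keras as model.fit(train_ds, validation_data=val_ds). Compared with loading everything into a Python list, this reads images lazily, which may matter for larger datasets.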
