Could not convert string to float on NaiveBayes Spark example

Posted 2023-04-15

【Title】: Could not convert string to float on NaiveBayes Spark example 【Posted】: 2018-02-19 16:46:11 【Question】:

I am following this tutorial for Spark 1.6.

I copied the same code, shown below:

from pyspark.mllib.classification import NaiveBayes, NaiveBayesModel
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint
from pyspark import SparkContext, SparkConf


def parseLine(line):
    parts = line.split(',')
    label = float(parts[0])
    features = Vectors.dense([float(x) for x in parts[1].split(' ')])
    return LabeledPoint(label, features)

conf= SparkConf()
conf.setAppName("NaiveBaye")
conf.set('spark.driver.memory','6g')
conf.set('spark.executor.memory','6g')
conf.set('spark.cores.max',156)

sc = SparkContext(conf= conf)

data = sc.textFile('sample_naive_bayes_data.txt').map(parseLine)

# Split data approximately into training (60%) and test (40%)
training, test = data.randomSplit([0.6, 0.4], seed=0)

# Train a naive Bayes model.
model = NaiveBayes.train(training, 1.0)

# Make prediction and test accuracy.
predictionAndLabel = test.map(lambda p: (model.predict(p.features), p.label))
accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()

# Save and load model
model.save(sc, "model")
sameModel = NaiveBayesModel.load(sc, "model")

sample_naive_bayes_data.txt contains the following:

0, 1.0 0.0 0.0
0, 2.0 0.0 0.0
1, 0.0 1.0 0.0
1, 0.0 2.0 0.0
2, 0.0 0.0 1.0
2, 0.0 0.0 2.0

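This is a very basic tutorial, yet it still does not work.

It gives me this error: could not convert string to float, on this line:

features = Vectors.dense([float(x) for x in parts[1].split(' ')])

Can anyone explain why this happens and how to fix it?

Edit 1

I am trying to change the code to work with string values: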

label = str(parts[0])
features = Vectors.dense([str(x) for x in parts[1].split('')])

Using this dataset:

positive, happy food food
positive, dog food food
negative, food happy food
negative, food dog food
neutral, food food happy
neutral, food food dog

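It has the same layout as before, but with strings instead of float values. In the previous example, the accuracy was 1.0.

Now, if I try to run this code, I get this error:

ValueError: could not convert string to float: happy

on this line:

model = NaiveBayes.train(training, 1.0)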
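
The Edit 1 error is expected rather than a formatting problem: the label and the features passed to LabeledPoint and Vectors.dense must be numeric, so a word such as happy cannot be passed through unchanged. One possible approach, sketched here in plain Python without Spark, is to map each label and each word to a number before building the point. The lookup tables LABEL_IDS and WORD_VALUES and the helper encode_line are hypothetical names, and the particular values are an assumption chosen only so that the encoded rows match the numeric sample file shown earlier.

```python
# Hypothetical lookup tables, chosen so the encoded rows match the
# numeric sample file shown earlier. They are not part of Spark.
LABEL_IDS = {"positive": 0.0, "negative": 1.0, "neutral": 2.0}
WORD_VALUES = {"food": 0.0, "happy": 1.0, "dog": 2.0}


def encode_line(line):
    """Turn 'label, w1 w2 w3' into (numeric label, list of floats)."""
    label_text, feature_text = line.split(',')
    label = LABEL_IDS[label_text.strip()]
    # split() with no argument ignores the space after the comma.
    features = [WORD_VALUES[word] for word in feature_text.split()]
    return label, features


print(encode_line("positive, happy food food"))  # (0.0, [1.0, 0.0, 0.0])
print(encode_line("neutral, food food dog"))     # (2.0, [0.0, 0.0, 2.0])
```

Inside parseLine, the returned pair could then be wrapped as LabeledPoint(label, Vectors.dense(features)).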

【Comments】:

Printing x just before that line might help you find the problem. Then you can see what x contains when it fails.

【Answer 1】:

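You get the error because of split(" "). The whitespace in sample_naive_bayes_data.txt does not match the single space passed to split: each line has a space after the comma, so parts[1] starts with a space, split(' ') returns an empty string as its first element, and float cannot convert an empty string.

Replace

features = Vectors.dense([float(x) for x in parts[1].split(' ')])

with

features = Vectors.dense([float(x) for x in parts[1].split()])

and it should work.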
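
The difference between the two calls can be checked in plain Python, without Spark. This is a minimal sketch, assuming a line formatted like those in the sample file above. The variable line and the helper parse_features are illustrative names, not part of the tutorial.

```python
line = "0, 1.0 0.0 0.0"
parts = line.split(',')

# parts[1] keeps the space that follows the comma.
print(repr(parts[1]))        # ' 1.0 0.0 0.0'

# Splitting on a single space yields an empty first token,
# and float('') raises ValueError.
print(parts[1].split(' '))   # ['', '1.0', '0.0', '0.0']

# split() with no argument splits on runs of whitespace and
# ignores leading and trailing whitespace, so no empty token appears.
print(parts[1].split())      # ['1.0', '0.0', '0.0']


def parse_features(text):
    """Convert a whitespace-separated string of numbers to a list of floats."""
    return [float(x) for x in text.split()]


features = parse_features(parts[1])
print(features)              # [1.0, 0.0, 0.0]
```

Because split() also collapses repeated spaces and tabs, the same parser tolerates files with or without a space after the comma.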
