The Road to Growing with Spark: Word2Vec
Posted by Q博士
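Introduction
Word2Vec maps text to vectors in a K-dimensional space. Each word is assigned a K-dimensional vector, and K is set with setVectorSize.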
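In Spark ML, the fitted Word2Vec model turns a whole document into a single vector by averaging the vectors of its words. The following is a minimal sketch of that averaging step in plain Scala, with no Spark dependency. The object name AverageWordVectors, the helper documentVector, and the 3-dimensional vectors are hypothetical values chosen for illustration; they are not trained output and not Spark's implementation.

```scala
// Minimal sketch of averaging word vectors into a document vector.
// All names and numbers here are illustrative assumptions, not Spark code.
object AverageWordVectors {
  // Hypothetical 3-dimensional embeddings for a tiny vocabulary.
  val wordVectors: Map[String, Array[Double]] = Map(
    "hi"    -> Array(1.0, 0.0, 2.0),
    "spark" -> Array(3.0, 2.0, 0.0)
  )

  // Sum the vectors of the words that are in the vocabulary, then divide
  // by the number of words in the document. An empty document yields zeros.
  def documentVector(words: Seq[String], dim: Int): Array[Double] = {
    val sum = Array.fill(dim)(0.0)
    for (w <- words; v <- wordVectors.get(w); i <- 0 until dim) sum(i) += v(i)
    if (words.isEmpty) sum else sum.map(_ / words.size)
  }

  def main(args: Array[String]): Unit = {
    val vec = documentVector(Seq("hi", "spark"), dim = 3)
    println(vec.mkString("[", ", ", "]")) // prints [2.0, 1.0, 1.0]
  }
}
```

This is why the example in the next section prints one 6-dimensional vector per input row rather than one vector per word.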
Code
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{Row, SparkSession}

object Word2VecExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    // Input data: Each row is a bag of words from a sentence or document.
    val documentDF = spark.createDataFrame(Seq(
      "Hi I heard about Spark".split(" "),
      "I wish Java could use case classes".split(" "),
      "Logistic regression models are neat".split(" ")
    ).map(Tuple1.apply)).toDF("text")

    // Learn a mapping from words to Vectors.
    val word2Vec = new Word2Vec()
      .setInputCol("text")
      .setOutputCol("result")
      .setVectorSize(6) // K: dimension of each output vector
      .setMinCount(0)   // keep every word, even those that appear once
    val model = word2Vec.fit(documentDF)

    // Add a "result" column holding one vector per document.
    val result = model.transform(documentDF)
    result.show()
    result.collect().foreach { case Row(text: Seq[_], features: Vector) =>
      println(s"Text: [${text.mkString(", ")}] => \nVector: $features\n")
    }
  }
}
Results
Text: [Hi, I, heard, about, Spark] =>
Vector: [0.0068203588947653776,0.017414073273539544,0.008097704406827689,-0.034566799923777584,-0.004852301999926568,0.022082760557532312]
Text: [I, wish, Java, could, use, case, classes] =>
Vector: [0.045732982855822356,-2.3274788899081092E-4,0.032252547198108265,0.0015899876930883952,-0.020712170167826116,0.016202476141708236]
Text: [Logistic, regression, models, are, neat] =>
Vector: [-0.02979586571455002,0.029230652749538424,-0.03639255976304412,-3.955196589231491E-4,-0.00870799645781517,-0.03496376480907202]