Spark Growth Path (11): ngram

Posted by Q博士

ngram

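Introduction

An N-gram is a sequence of N consecutive tokens, typically words. Spark ML's NGram transformer takes an array of strings as input and outputs an array of n-grams, each one a space-delimited string of N consecutive words. If a row contains fewer than N strings, no n-grams are produced for it.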

Code

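import org.apache.spark.ml.feature.NGram

// SparkObject is the author's own helper trait, not part of Spark.
// It is assumed here to expose a SparkSession named `spark`.
object NGramExample extends SparkObject {

  def main(args: Array[String]): Unit = {
    // Each row holds an id and an already-tokenized sentence.
    val wordDataFrame = spark.createDataFrame(Seq(
      (0, Array("Hi", "I", "heard", "about", "Spark")),
      (1, Array("I", "wish", "Java", "could", "use", "case", "classes")),
      (2, Array("Logistic", "regression", "models", "are", "neat"))
    )).toDF("id", "words")

    // n = 2 yields bigrams: each pair of adjacent words joined by a space.
    val ngram = new NGram().setN(2).setInputCol("words").setOutputCol("ngrams")

    val ngramDataFrame = ngram.transform(wordDataFrame)
    // show(false) disables truncation so the full arrays are printed.
    ngramDataFrame.select("ngrams").show(false)
  }
}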

Output

+------------------------------------------------------------------+
|ngrams                                                            |
+------------------------------------------------------------------+
|[Hi I, I heard, heard about, about Spark]                         |
|[I wish, wish Java, Java could, could use, use case, case classes]|
|[Logistic regression, regression models, models are, are neat]    |
+------------------------------------------------------------------+
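
To see where each row of the output comes from, the sliding-window idea behind n-grams can be sketched in plain Scala, without Spark. This is only an illustration: `NGramSketch` and its `ngrams` helper are hypothetical names introduced here, not part of the Spark API or the original post.

```scala
object NGramSketch {
  // Hypothetical helper (not a Spark API): slide a window of size n over the
  // tokens and join each full window with a single space.
  def ngrams(tokens: Seq[String], n: Int): List[String] =
    tokens
      .sliding(n)           // windows of up to n consecutive tokens
      .filter(_.size == n)  // drop the partial window when tokens.size < n
      .map(_.mkString(" ")) // one space-delimited string per window
      .toList

  def main(args: Array[String]): Unit = {
    // Same tokens as row 0 of the DataFrame above.
    println(ngrams(Seq("Hi", "I", "heard", "about", "Spark"), 2))
    // prints: List(Hi I, I heard, heard about, about Spark)
  }
}
```

A sentence of k words yields k - n + 1 n-grams, which is why the five-word first row produces four bigrams. Passing a different n, for example 3, would produce trigrams in the same way.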
