Spark Growth Path: Hypothesis Testing
Posted by Q博士 (Dr. Q)
Example
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.stat.ChiSquareTest
import org.apache.spark.sql.SparkSession

object HypothesisTestingExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("HypothesisTestingExample").getOrCreate()
    spark.sparkContext.setLogLevel("WARN")

    // Each row is (label, features); both features are treated as categorical.
    val data = Seq(
      (0.0, Vectors.dense(0.5, 10.0)),
      (0.0, Vectors.dense(1.5, 20.0)),
      (1.0, Vectors.dense(1.5, 30.0)),
      (0.0, Vectors.dense(3.5, 30.0)),
      (0.0, Vectors.dense(3.5, 40.0)),
      (1.0, Vectors.dense(3.5, 40.0))
    )

    import spark.implicits._
    val df = data.toDF("label", "features")

    // Pearson's chi-squared independence test of every feature against the label.
    // The result is a single row: pValues, degreesOfFreedom, statistics.
    val chi = ChiSquareTest.test(df, "features", "label").head
    println("pValues = " + chi.getAs[Vector](0))
    println("degreesOfFreedom = " + chi.getSeq[Int](1).mkString("[", ",", "]"))
    println("statistics = " + chi.getAs[Vector](2))

    spark.stop()
  }
}
Result
pValues = [0.6872892787909721,0.6822703303362126]
degreesOfFreedom = [2,3]
statistics = [0.75,1.5]
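
To see where the statistics and degrees of freedom above come from, here is a minimal sketch in plain Scala (no Spark) that recomputes them by hand. It assumes the test is Pearson's chi-squared test on the contingency table of each feature's values against the label, with (rows - 1) * (columns - 1) degrees of freedom. ChiSquareByHand and chiSquare are hypothetical names introduced only for this illustration; they are not part of the Spark API.

```scala
// Plain Scala, no Spark required. ChiSquareByHand is a hypothetical helper
// written for illustration; it is not part of Spark.
object ChiSquareByHand {
  // Returns (Pearson chi-squared statistic, degrees of freedom) for one
  // categorical feature against the label.
  def chiSquare(feature: Seq[Double], label: Seq[Double]): (Double, Int) = {
    require(feature.length == label.length, "feature and label must align")
    val n = feature.length.toDouble
    // Observed count for each (feature value, label value) cell.
    val observed = feature.zip(label).groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
    // Marginal counts per feature value and per label value.
    val rowTotals = feature.groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
    val colTotals = label.groupBy(identity).map { case (k, v) => k -> v.size.toDouble }
    // Sum (observed - expected)^2 / expected over every cell, including empty ones.
    val statistic = (for {
      (f, rowTotal) <- rowTotals.toSeq
      (l, colTotal) <- colTotals.toSeq
    } yield {
      val expected = rowTotal * colTotal / n
      val obs = observed.getOrElse((f, l), 0.0)
      math.pow(obs - expected, 2) / expected
    }).sum
    val dof = (rowTotals.size - 1) * (colTotals.size - 1)
    (statistic, dof)
  }

  def main(args: Array[String]): Unit = {
    // Same six rows as the Spark example, split into columns.
    val labels   = Seq(0.0, 0.0, 1.0, 0.0, 0.0, 1.0)
    val feature0 = Seq(0.5, 1.5, 1.5, 3.5, 3.5, 3.5)
    val feature1 = Seq(10.0, 20.0, 30.0, 30.0, 40.0, 40.0)
    val (stat0, dof0) = chiSquare(feature0, labels)
    val (stat1, dof1) = chiSquare(feature1, labels)
    println(s"feature 0: statistic = $stat0, degreesOfFreedom = $dof0")
    println(s"feature 1: statistic = $stat1, degreesOfFreedom = $dof1")
    // With 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
    println(s"feature 0: pValue = ${math.exp(-stat0 / 2)}")
  }
}
```

Up to floating-point rounding, this gives statistics of about 0.75 and 1.5 with 2 and 3 degrees of freedom, in line with the Spark output. The first feature takes three distinct values and the second takes four, against two label values, which is why the degrees of freedom are (3 - 1) * (2 - 1) = 2 and (4 - 1) * (2 - 1) = 3. Both p-values are well above 0.05, so on this tiny sample there is no evidence that either feature depends on the label.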