Spark Learning, Day 02: Reading a File with Scala and Counting Word Frequencies


1. Install the JDK and Scala on the local machine
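A quick way to confirm that both are installed is to ask the Scala REPL which versions it is running on. This is only a sketch, not part of the original post; it uses scala.util.Properties from the standard library, which reports the version of the Scala library on the classpath and of the JVM underneath it.

```scala
import scala.util.Properties

// Print the Scala library version and the JVM version the REPL is running on.
println(s"Scala version: ${Properties.versionNumberString}")
println(s"Java version:  ${Properties.javaVersion}")
```

If either line prints an empty or unexpected version, the PATH or JAVA_HOME settings are worth checking before going further.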


 

2. Read a local file:

 

scala> import scala.io.Source
import scala.io.Source

scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
lines: List[String] = List("With the development of civilization, it is the childrens duty to study in school since they were small. As the young kids, it is their nature to hang out for fun. ", "", "While for them, most of the time have been limited in the class. So they feel frustrated and dont have much passion to study. It is of great importance to develop ", "", "interest. The first thing is to broaden vision. The students can read travel books or watch tourist show, for anyone who cannot resist the charm of beautiful scenery ", "", and delicious food. The second thing is taking the right attitude to exams. Never giving too much pressure on getting high marks. The only thing we should do is to enjoy gaining knowledge.)
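The one-liner above leaves the file handle open, which is harmless in a short REPL session but worth avoiding in a longer program. A minimal sketch of a variant that closes the file, assuming the file is UTF-8 encoded; readLines is a hypothetical helper name, not something from the original post:

```scala
import scala.io.Source

// Hypothetical helper: read every line of a text file and always close it.
// The default encoding is an assumption; pass another one if the file differs.
def readLines(path: String, encoding: String = "UTF-8"): List[String] = {
  val source = Source.fromFile(path, encoding)
  try source.getLines().toList   // materialize the lines before the source is closed
  finally source.close()
}
```

With this in scope, val lines = readLines("F:/ziyuan_badou/file.txt") should give the same list as before.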

3. Compute the top-N word frequencies

scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse
res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thing,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (only,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study.,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were,1), (time,1), (them,,1), (childrens,1), (development,1), (knowledge.,1), (It,1), (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrated,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (small.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civilization,,1), (broaden,1), (out,1), (food.,1), (dont,1), (importance,1), (kid...
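The result above counts "the" and "The" separately, treats "study" and "study." as different words, includes the empty string from blank lines, and returns every word rather than only the top N. One possible refinement is sketched below. The normalization rules (lowercasing, splitting on anything that is not a letter, digit, or apostrophe) and the helper name topN are assumptions for illustration, not the original author's method.

```scala
// Hypothetical helper: the n most frequent words, case-insensitive, punctuation ignored.
def topN(lines: Seq[String], n: Int): List[(String, Int)] =
  lines
    .flatMap(_.toLowerCase.split("[^a-z0-9']+"))   // split on runs of non-word characters
    .filter(_.nonEmpty)                            // drop empty tokens from blank lines
    .groupBy(identity)                             // word -> all of its occurrences
    .map { case (word, occurrences) => (word, occurrences.size) }
    .toList
    .sortBy { case (word, count) => (-count, word) } // highest count first, ties alphabetical
    .take(n)
```

Calling topN(lines, 10) would then return at most ten (word, count) pairs. Sorting on (-count, word) also makes the order of tied words deterministic, which sortBy followed by reverse does not guarantee across runs.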

 

 
