spark学习02天-scala读取文件,词频统计
Posted students
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了spark学习02天-scala读取文件,词频统计相关的知识,希望对你有一定的参考价值。
1.在本地安装jdk环境和scala环境
2.读取本地文件:
scala> import scala.io.Source import scala.io.Source scala> val lines=Source.fromFile("F:/ziyuan_badou/file.txt").getLines().toList
lines: List[String] = List("With the development of civilization, it is the chil dren‘s duty to study in school since they were small. As the young kids, it is t heir nature to hang out for fun. ", "", "While for them, most of the time have b een limited in the class. So they feel frustrated and don‘t have much passion to study. It is of great importance to develop ", "", "interest. The first thing i s to broaden vision. The students can read travel books or watch tourist show, f or anyone who cannot resist the charm of beautiful scenery ", "", and delicious food. The second thing is taking the right attitude to exams. Never giving too m uch pressure on getting high marks. The only thing we should do is to enjoy gain ing knowledge.)
3.词频topN计算
scala> lines.map(x=>x.split(" ")).flatten.map(x=>(x,1)).groupBy(x=>x._1).map(x=> (x._1,x._2.map(x=>x._2).sum)).toList.sortBy(x=>x._2).reverse res0: List[(String, Int)] = List((the,7), (to,7), (is,6), (of,4), (The,4), (thin g,3), (for,3), ("",3), (and,2), (much,2), (they,2), (it,2), (have,2), (in,2), (o nly,1), (right,1), (show,,1), (exams.,1), (high,1), (since,1), (study,1), (study .,1), (great,1), (we,1), (interest.,1), (develop,1), (As,1), (passion,1), (were, 1), (time,1), (them,,1), (children‘s,1), (development,1), (knowledge.,1), (It,1) , (anyone,1), (Never,1), (nature,1), (enjoy,1), (first,1), (taking,1), (frustrat ed,1), (books,1), (delicious,1), (So,1), (their,1), (resist,1), (should,1), (sma ll.,1), (gaining,1), (While,1), (who,1), (on,1), (can,1), (been,1), (second,1), (travel,1), (most,1), (scenery,1), (getting,1), (attitude,1), (cannot,1), (civil ization,,1), (broaden,1), (out,1), (food.,1), (don‘t,1), (importance,1), (kid...
以上是关于spark学习02天-scala读取文件,词频统计的主要内容,如果未能解决你的问题,请参考以下文章