Big Data Spark Learning: Scala Basics, Lesson 1
Plan:
Stage 1:
Master the Spark internals
Stage 2:
Master a project at the tens-of-millions scale
Stage 3:
Machine learning
Java itself is not a great language; what is great is the JVM. Building distributed platforms and the like relies on the JVM, which does not require the Java language.
Scala can be regarded as an upgraded Java. Java supports object orientation but is not a purely object-oriented language; in Scala everything is an object, so it is purely object-oriented, and it combines object orientation with functional programming.
Immutable value declaration: val result = 10 + 2 cannot be reassigned. When working with distributed data, you usually do not want the data to change while it is being transferred or verified.
Mutable variable: var name = "Spark" is mutable and can be reassigned.
Prefer val first; fall back to var only when there is no other way.
If no type is specified, it is inferred automatically from the value; you can also specify the type explicitly:
var name:String = "Spark"
Once a type is specified, only values of that type (or a subtype of it) can be assigned; see the sketch after these notes.
You can also initialize several values at once:
val age1,age2,age3 = 0
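A minimal sketch pulling the val/var and subtype rules above together (the variable names are my own):
val result = 10 + 2
// result = 20                 // does not compile: a val cannot be reassigned
var name: String = "Spark"
name = "Scala"                 // fine: a var can be reassigned with another String
// name = 42                   // does not compile: Int is not a subtype of String
var anything: Any = "Spark"    // declared as Any
anything = 42                  // fine: Int is a subtype of Any
val age1, age2, age3 = 0       // several vals initialized at once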
To see that everything is an object, note that even the most basic values are objects:
scala> 0.to(5)
res4: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4, 5)
scala> 1+1
res5: Int = 2
scala> 1.+(1)
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
res6: Double = 2.0
The + here is actually a method call. (The Double result above comes from the REPL parsing `1.` as the deprecated Double literal 1.0, which is what the warning is about; `(1).+(1)` would give an Int.)
scala> import scala.math._
import scala.math._
scala> min(20,4)
res8: Int = 4
=========apply=======
scala> val array = Array(1,2,3,4)
array: Array[Int] = Array(1, 2, 3, 4)
scala> array
res11: Array[Int] = Array(1, 2, 3, 4)
scala> val arr = Array.apply(1,2,3,4)
arr: Array[Int] = Array(1, 2, 3, 4)
scala> arr
res12: Array[Int] = Array(1, 2, 3, 4)
=========if expressions=======
scala> val age = 19
age: Int = 19
scala> if(age>=18)"adult" else "child"
res14: String = adult
=========Expression blocks=======
scala> val result = if(age>=18) {
| "adult"
| }else{
| "child"
| }
result: String = adult
scala> var buffered = 0
buffered: Int = 0
scala> val result = if(age>=18){
| "adult"
| }
result: Any = adult
scala> val result = if(age>=18){
| "adult"
| buffered = 10
| buffered
| }
<console>:13: warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses
"adult"
^
result: AnyVal = 10
In Scala, the value of the last expression in a code block is the block's return value. When the else branch is missing, the compiler supplies else (), so the overall type widens to Any or AnyVal as seen above.
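The same rule holds for any block, not just if; a minimal sketch (my own example):
val grade = {
  val score = 75                        // intermediate value, not the result
  if (score >= 60) "pass" else "fail"   // last expression: this is the block's value
}
println(grade)   // pass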
=========Printing: newlines, format placeholders, etc.=======
scala> println("Spark")
Spark
scala> print("Spark")
Spark
scala> print("\nSpark")
Spark
scala> printf(" %s is the future of big data computation framework", "Spark")
Spark is the future of big data computation framework
scala> printf(" %s is the future of big data computation framework \n ", "Spark
")
Spark is the future of big data computation framework
=========readLine=======
scala> readLine
res21: String = ss
scala> readLine(" Please input you password: ")
Please input you password: res22: String = hahah
scala> readInt
res23: Int = 9
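Note: in Scala 2.11 and later the bare readLine/readInt shown above are deprecated in favor of scala.io.StdIn; a minimal sketch of the replacement (the prompt text is my own):
import scala.io.StdIn

val password = StdIn.readLine("Please input your password: ")   // reads one line of text
val number   = StdIn.readInt()                                  // reads an Int from stdin
println(password + " / " + number)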
=========while loops=======
scala> var element = 100
element: Int = 100
scala> while(element>10){
| println(element)
| element -= 10
| }
100
90
80
70
60
50
40
30
20
=========for loops=======
scala> 0 to element
res25: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
scala> for(i<-0 to element)println(i)
0
1
2
3
4
5
6
7
8
9
10
scala> for(i<-0 to element if i%2==0)println(i)
0
2
4
6
8
10
scala> import scala.util.control.Breaks._
import scala.util.control.Breaks._
scala> for(i<-0 to element if i%2==0){
| println(i)
| if(i==4) break
| }
0
2
4
scala.util.control.BreakControl
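The scala.util.control.BreakControl line above is the control-flow exception escaping, because break was called outside a breakable block. A minimal sketch of the intended usage (my own restatement of the loop above):
import scala.util.control.Breaks._

breakable {
  for (i <- 0 to 100 if i % 2 == 0) {
    println(i)
    if (i == 4) break   // caught by the enclosing breakable, so only the loop stops
  }
}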
scala> def f1:Any={
| for(i<-1 to 10){
| if(i==10) return i
| println(i)
| }
| }
f1: Any
scala> f1
1
2
3
4
5
6
7
8
9
res30: Any = 10
=========Default parameters and named parameters (arguments need not follow the declared order)=======
scala> def f2(param1:String,param2:Int = 30) = param1+param2
f2: (param1: String, param2: Int)String
scala> f2("Spark")
res33: String = Spark30
scala> f2(param2=100,param1 ="Scala")
res34: String = Scala100
=========Variable-length (varargs) parameters=======
scala> def sum(numbers:Int*)={var result=0;for(element<-numbers)result+=element;result}
sum: (numbers: Int*)Int
scala> sum(1,2,3,4,5,6,7,8,9,10)
res0: Int = 55
scala> sum(1,2,3,4,5,6,7)
res1: Int = 28
If you want to operate on something like 1 to 100, it fails because a Range is not an Int:
scala> sum(1 to 100)
<console>:9: error: type mismatch;
found : scala.collection.immutable.Range.Inclusive
required: Int
sum(1 to 100)
^
The : _* syntax expands the range into its individual elements (extremely important syntax):
scala> sum(1 to 100:_*)
res3: Int = 5050
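The same : _* syntax is also how one varargs function forwards its arguments to another; a small sketch reusing the sum above (sumPlusOne is my own example):
def sumPlusOne(numbers: Int*): Int = 1 + sum(numbers: _*)   // forwards the varargs on to sum
println(sumPlusOne(1, 2, 3))                                // 7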
=========Procedures (no result value, only side effects)=======
scala> def f3(con: String) = {println("Good "+con)}
f3: (con: String)Unit
scala> f3("yyy")
Good yyy
scala> def f3(con: String):Unit = {println("Good "+con)}
f3: (con: String)Unit
scala> f3("xxx")
Good xxx
=========lazy (evaluated only on first use; useful for expensive operations)=======
scala> import scala.io.Source._
import scala.io.Source._
//The file does not exist, but because of lazy no error is raised yet
scala> lazy val content = fromFile("F:/xxx")
content: scala.io.BufferedSource = <lazy>
//The file does not exist and there is no lazy, so it fails immediately
scala> val content = fromFile("F:/xxx")
java.io.FileNotFoundException: F:\xxx (The system cannot find the file specified.)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
//The file exists, no lazy, no error
scala> val content = fromFile("F:/FH11000001201601121730091044660854A")
content: scala.io.BufferedSource = non-empty iterator
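A minimal sketch showing that a lazy val's body runs only on first access and the result is then cached (the values are my own):
lazy val expensive = { println("computing..."); 42 }
println("before access")   // nothing has been computed yet
println(expensive)         // prints "computing..." and then 42
println(expensive)         // cached: prints only 42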
=========Exceptions=======
scala> import java.io.FileNotFoundException
import java.io.FileNotFoundException
try {
  val content = fromFile("F:/xxx")
} catch {
  case _: FileNotFoundException => println("hahahh,not found")
} finally {
  println("88")
}
=========Immutable arrays (Array)=======
Here val means the reference held by array cannot be reassigned, not that the elements inside are immutable
scala> val array = new Array[Int](5)
array: Array[Int] = Array(0, 0, 0, 0, 0)
scala> array(3)
res14: Int = 0
scala> array(2) = 8
scala> array
res16: Array[Int] = Array(0, 0, 8, 0, 0)
scala> val arr1 = Array("Scala","Spark")
arr1: Array[String] = Array(Scala, Spark)
scala> arr1(2) = "hadoop"
java.lang.ArrayIndexOutOfBoundsException: 2
at .<init>(<console>:13)
at .<clinit>(<console>)
Note: the exception above is an out-of-bounds access (arr1 only has indices 0 and 1); assigning arr1(1) = "hadoop" would succeed, because Array elements themselves are mutable.
=========Mutable arrays (ArrayBuffer)=======
scala> import scala.collection.mutable.ArrayBuffer
import scala.collection.mutable.ArrayBuffer
scala> val arrBuffer = ArrayBuffer[Int]()
arrBuffer: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer()
scala> arrBuffer +=10
res18: arrBuffer.type = ArrayBuffer(10)
scala> arrBuffer +=(11,1,2,37,8,)
<console>:1: error: illegal start of simple expression
arrBuffer +=(11,1,2,37,8,)
^
scala> arrBuffer +=(11,1,2,37,8)
res19: arrBuffer.type = ArrayBuffer(10, 11, 1, 2, 37, 8)
scala> arrBuffer ++=Array(1,2,3,4)
res20: arrBuffer.type = ArrayBuffer(10, 11, 1, 2, 37, 8, 1, 2, 3, 4)
scala> arrBuffer.trimEnd(3)
scala> arrBuffer
res22: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 1, 2, 37, 8, 1)
scala> arrBuffer.trimEnd(3)
scala> arrBuffer
res24: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 1, 2)
scala> arrBuffer.insert(2,100)
scala> arrBuffer
res28: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 100, 1, 2)
scala> arrBuffer.insert(2,33,44)
scala> arrBuffer
res30: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 33, 44, 100, 1, 2)
scala> arrBuffer.remove(3)
res32: Int = 44
scala> arrBuffer
res33: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 33, 100, 1, 2)
scala> arrBuffer.remove(3,2)
scala> arrBuffer
res35: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 33, 2)
=========Array operations: converting between mutable and immutable arrays, sorting, etc.=======
scala> arrBuffer.toArray
res36: Array[Int] = Array(10, 11, 33, 2)
scala> val array2 = arrBuffer.toArray
array2: Array[Int] = Array(10, 11, 33, 2)
scala> array2.toBuffer
res37: scala.collection.mutable.Buffer[Int] = ArrayBuffer(10, 11, 33, 2)
scala> for(elem<-array2)println(elem)
10
11
33
2
//Print every second element
scala> for(i<-0 until (array2.length,2))println(array2(i))
10
33
//Start from the end
scala> for(i<-(0 until array2.length).reverse)println(array2(i))
2
33
11
10
//Sorting
scala> import scala.util.Sorting
import scala.util.Sorting
scala> Sorting.quickSort(array2)
scala> array2
res44: Array[Int] = Array(2, 10, 11, 33)
//Collect a value for each element with yield
scala> for(i<-array2) yield i*i
res45: Array[Int] = Array(4, 100, 121, 1089)
//Take the elements less than 3 (integer division: i / 3 == 0)
scala> for(i<-array2 if i/3==0) yield i
res46: Array[Int] = Array(2)
//An even slicker way:
scala> array2.filter(_%3==0).map(i=>i*i)
res47: Array[Int] = Array(1089)
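For comparison, the for/yield with a guard and the filter/map chain above express the same idea; a minimal sketch restating the last line (variable names are my own):
val squares1 = for (i <- array2 if i % 3 == 0) yield i * i   // for/yield with a guard
val squares2 = array2.filter(_ % 3 == 0).map(i => i * i)     // same result: Array(1089)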
=========Map=======
//Immutable Map
scala> val person = Map("Spark"->6,"Hadoop"->11)
person: scala.collection.immutable.Map[String,Int] = Map(Spark -> 6, Hadoop -> 11)
scala> person("Hadoop")
res51: Int = 11
//Mutable Map
scala> val persons = scala.collection.mutable.Map("Spark"->6,"Hadoop"->11)
persons: scala.collection.mutable.Map[String,Int] = Map(Hadoop -> 11, Spark -> 6)
scala> persons += ("Flink"->5)
res52: persons.type = Map(Hadoop -> 11, Spark -> 6, Flink -> 5)
scala> val sparkValue = if(persons.contains("Spark"))persons("Spark") else 1000
sparkValue: Int = 6
scala> val sparkValue = persons.getOrElse("Spark",1000)
sparkValue: Int = 6
scala> for((key,value)<-persons) println(key+":"+value)
Hadoop:11
Spark:6
Flink:5
//A sorted map
scala> val persons = scala.collection.immutable.SortedMap("Spark"->6,"Hadoop"->11)
persons: scala.collection.immutable.SortedMap[String,Int] = Map(Hadoop -> 11, Spark -> 6)
=========Tuple (use a Tuple when a function returns several values of different types)=======
scala> val tuple = ("Spark",1,3.0)
tuple: (String, Int, Double) = (Spark,1,3.0)
scala> tuple._1
res55: String = Spark
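A minimal sketch of a function returning several values of different types as one tuple and destructuring the result (the function is my own example):
def describe(xs: Array[Int]): (String, Int, Double) =
  ("sum/avg", xs.sum, xs.sum.toDouble / xs.length)

val (label, total, average) = describe(Array(1, 2, 3, 4))
println(label + " " + total + " " + average)   // sum/avg 10 2.5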
After that we briefly went through some Spark source code, which mainly exercises the constructs shown above.
Homework 1:
Remove all negative numbers in an array that come after the first negative number (the first negative itself is kept).
My solution:
def main(args: Array[String]): Unit = {
  //val array = Array(1,3,7,-12,4,8,9,-11,-10,2,8,9)
  val array = Array(1,3,7,4,8,9,2,8,9)
  val firstIndex = getFirstIndex(array)
  if (firstIndex >= 0) {
    val arrayBuffer = array.filter(_ >= 0).map(i => i).toBuffer
    arrayBuffer.insert(firstIndex, array(firstIndex))
    for (element <- arrayBuffer) print(element + " ")
  } else {
    val array2 = array.filter(_ >= 0).map(i => i)
    for (element <- array2) print(element + " ")
  }
}
def getFirstIndex(array: Array[Int]): Int = {
  for (i <- 0 until array.length) { if (array(i) < 0) return i }
  -1
}
The instructor's answer:
Approach 1:
Every negative number found after the first one is removed immediately; this performs poorly because the array has to be shifted many times.
Approach 2:
First record the indices of all elements that do not need to be removed, then remove everything else in a single pass at the end; this performs comparatively better.
Question: what is firstNegative? Is it something built into Scala for finding the first negative number, or does it mean something else?
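Presumably firstNegative in the instructor's code is just an ordinary local variable holding the index of the first negative element, not anything built into Scala. A minimal sketch of approach 2 under that assumption (names and details are my own, not the instructor's code):
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 3, 7, -12, 4, 8, 9, -11, -10, 2, 8, 9)

val firstNegative = buf.indexWhere(_ < 0)   // index of the first negative, or -1 if none
// record the indices of all elements we want to keep
val keepIndexes =
  if (firstNegative < 0) buf.indices.toArray
  else buf.indices.filter(i => i <= firstNegative || buf(i) >= 0).toArray

// move the kept elements to the front, then cut off the tail in a single trim
for ((keep, target) <- keepIndexes.zipWithIndex) buf(target) = buf(keep)
buf.trimEnd(buf.length - keepIndexes.length)

println(buf)   // ArrayBuffer(1, 3, 7, -12, 4, 8, 9, 2, 8, 9)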
This post is from the blog "一支花傲寒". Please do not repost.