Big Data Spark Learning: Scala Basics, Lesson 1


Plan:

Stage 1:

Master the Spark internals

Stage 2:

Master a project at the tens-of-millions scale

Stage 3:

Machine learning

Java itself is not a great language; what is great is the JVM. Building distributed platforms and the like relies on the JVM, and does not necessarily require the Java language.

Scala can be thought of as an upgraded Java. Java supports object orientation but is not a purely object-oriented language, whereas in Scala everything is an object: it is purely object-oriented, and it combines object orientation with functional programming.


Immutable variable declaration: val result = 10 + 2. It cannot be reassigned. When working with distributed data (transferring it, validating it, and so on) you do not want the data to change.

Mutable variable: var name = "Spark". This one is mutable and can be reassigned.

The recommendation is to use val first, and only fall back to var when there is no other choice.

If no type is specified, it is inferred automatically from the value; a type can also be given explicitly:

var name:String = "Spark"

Once a type has been specified, only values of that type (or one of its subtypes) can be assigned.
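
For example, a minimal sketch (the variable names here are invented for illustration): Any is the top of Scala's type hierarchy, so a var declared as Any accepts any value, whereas a String-typed var rejects an Int.

var language: String = "Spark"
// language = 100          // does not compile: Int is not a subtype of String
var data: Any = "Spark"    // fine: String is a subtype of Any
data = 100                 // also fine: Int is a subtype of Any as well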

Several variables can also be initialized at once:

val age1,age2,age3 = 0

To understand "everything is an object": even the most basic values are objects:

scala> 0.to(5)

res4: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4, 5)

scala> 1+1

res5: Int = 2

scala> 1.+(1)

warning: there were 1 deprecation warning(s); re-run with -deprecation for details

res6: Double = 2.0

The + here is actually a method call.

scala> import scala.math._

import scala.math._

scala> min(20,4)

res8: Int = 4

=========apply=======

scala> val array = Array(1,2,3,4)

array: Array[Int] = Array(1, 2, 3, 4)

scala> array

res11: Array[Int] = Array(1, 2, 3, 4)

scala> val arr = Array.apply(1,2,3,4)

arr: Array[Int] = Array(1, 2, 3, 4)

scala> arr

res12: Array[Int] = Array(1, 2, 3, 4)

=========if expressions=======

scala> val age = 19

age: Int = 19

scala> if(age>=18)"adult" else "child"

res14: String = adult

=========Expression blocks=======

scala> val result = if(age>=18) {

     | "adult"

     | }else{

     | "child"

     | }

result: String = adult

scala> var buffered = 0

buffered: Int = 0

scala> val result = if(age>=18){

     | "adult"

     | }

result: Any = adult

scala> val result = if(age>=18){

     | "adult"

     | buffered = 10

     | buffered

     | }

<console>:13: warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses

       "adult"

       ^

result: AnyVal = 10

In Scala, the value of the last expression in a code block is the block's return value.
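
As a small illustration (the method name grade is made up here): a method body needs no return keyword, because the last expression of the block is what the caller receives.

def grade(age: Int): String = {
  val label = if (age >= 18) "adult" else "child"
  label // this last expression is the method's return value
}

grade(19) // "adult"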

=========Printing: newlines, format placeholders, etc.=======

scala> println("Spark")

Spark

scala> print("Spark")

Spark

scala> print("\nSpark")

Spark

scala> printf("  %s is the future of big data computation framework", "Spark")

  Spark is the future of big data computation framework

scala> printf("  %s is the future of big data computation framework \n ", "Spark")

  Spark is the future of big data computation framework

=========readLine=======

scala> readLine

res21: String = ss

scala> readLine(" Please input your password: ")

 Please input your password: res22: String = hahah

scala> readInt

res23: Int = 9

=========while loops=======

scala> var element = 100

element: Int = 100

scala> while(element>10){

     | println(element)

     | element -= 10

     | }

100

90

80

70

60

50

40

30

20

=========for loops=======

scala> 0 to element

res25: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> for(i<-0 to element)println(i)

0

1

2

3

4

5

6

7

8

9

10

scala> for(i<-0 to element if i%2==0)println(i)

0

2

4

6

8

10

scala> import scala.util.control.Breaks._

import scala.util.control.Breaks._

scala> for(i<-0 to element if i%2==0){

     | println(i)

     | if(i==4) break

     | }

0

2

4

scala.util.control.BreakControl
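
The scala.util.control.BreakControl shown above is the control-flow exception that break throws; because the loop was not wrapped in breakable { }, the exception escaped to the REPL. A minimal sketch of the intended usage (the range 0 to 10 is just for illustration):

import scala.util.control.Breaks._

breakable {
  for (i <- 0 to 10 if i % 2 == 0) {
    println(i)
    if (i == 4) break // caught by the enclosing breakable block
  }
}
// prints 0, 2, 4 and then leaves the loop cleanly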

scala> def f1:Any={

     | for(i<-1 to 10){

     | if(i==10) return i

     | println(i)

     | }

     | }

f1: Any

scala> f1

1

2

3

4

5

6

7

8

9

res30: Any = 10

=========Default and named parameters (arguments need not follow the parameter order)=======

scala> def f2(param1:String,param2:Int = 30) = param1+param2

f2: (param1: String, param2: Int)String

scala> f2("Spark")

res33: String = Spark30

scala> f2(param2=100,param1 ="Scala")

res34: String = Scala100

=========Variable-length (vararg) parameters=======

scala> def sum(numbers:Int*)={var result=0;for(element<-numbers)result+=element;result}

sum: (numbers: Int*)Int

scala> sum(1,2,3,4,5,6,7,8,9,10)

res0: Int = 55

scala> sum(1,2,3,4,5,6,7)

res1: Int = 28

If you want to pass a whole range like 1 to 100, it fails, because a Range is not an Int:

scala> sum(1 to 100)

<console>:9: error: type mismatch;

 found   : scala.collection.immutable.Range.Inclusive

 required: Int

              sum(1 to 100)

                    ^

:_* expands the Range, passing each of its elements as a separate argument (extremely important syntax):

scala> sum(1 to 100:_*)

res3: Int = 5050

=========Procedures (no result value, only side effects)=======

scala> def f3(con: String) = {println("Good "+con)}

f3: (con: String)Unit

scala> f3("yyy")

Good yyy

scala> def f3(con: String):Unit = {println("Good "+con)}

f3: (con: String)Unit

scala> f3("xxx")

Good xxx

=========lazy (evaluated only when first used; handy for expensive operations)=======

scala> import scala.io.Source._

import scala.io.Source._

//The file does not exist, but because of lazy no error is raised yet

scala> lazy val content = fromFile("F:/xxx")

content: scala.io.BufferedSource = <lazy>

//The file does not exist and there is no lazy, so an error is raised immediately

scala> val content = fromFile("F:/xxx")

java.io.FileNotFoundException: F:\xxx (The system cannot find the file specified.)

        at java.io.FileInputStream.open(Native Method)

        at java.io.FileInputStream.<init>(FileInputStream.java:146)

//The file exists, so even without lazy there is no error

scala> val content = fromFile("F:/FH11000001201601121730091044660854A")

content: scala.io.BufferedSource = non-empty iterator
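
A tiny sketch of the evaluation order (the value here is invented for illustration): the body of a lazy val runs only on first access, and only once.

lazy val expensive = { println("computing..."); 42 }
println("before first use") // nothing has been computed yet
println(expensive)          // prints "computing..." and then 42
println(expensive)          // prints 42 only; the body is not run again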

=========Exceptions=======

scala> import java.io.FileNotFoundException

import java.io.FileNotFoundException

try{

    val content = fromFile("F:/xxx")

}catch{

    case _:FileNotFoundException=>println("hahahh,not found")

}finally{

    println("88")

}

=========Fixed-length arrays (Array)=======

val means the reference held by array cannot be changed to point elsewhere; it does not mean the elements inside cannot be modified.

scala> val array = new Array[Int](5)

array: Array[Int] = Array(0, 0, 0, 0, 0)

scala> array(3)

res14: Int = 0

scala> array(2) = 8

scala> array

res16: Array[Int] = Array(0, 0, 8, 0, 0)

scala> val arr1 = Array("Scala","Spark")

arr1: Array[String] = Array(Scala, Spark)

scala> arr1(2) = "hadoop"

java.lang.ArrayIndexOutOfBoundsException: 2

        at .<init>(<console>:13)

        at .<clinit>(<console>)

=========Variable-length arrays (ArrayBuffer)=======

scala> import scala.collection.mutable.ArrayBuffer

import scala.collection.mutable.ArrayBuffer

scala> val arrBuffer = ArrayBuffer[Int]()

arrBuffer: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer()

scala> arrBuffer +=10

res18: arrBuffer.type = ArrayBuffer(10)

scala> arrBuffer +=(11,1,2,37,8,)

<console>:1: error: illegal start of simple expression

       arrBuffer +=(11,1,2,37,8,)

                                ^

scala> arrBuffer +=(11,1,2,37,8)

res19: arrBuffer.type = ArrayBuffer(10, 11, 1, 2, 37, 8)

scala> arrBuffer ++=Array(1,2,3,4)

res20: arrBuffer.type = ArrayBuffer(10, 11, 1, 2, 37, 8, 1, 2, 3, 4)

scala> arrBuffer.trimEnd(3)

scala> arrBuffer

res22: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 1, 2, 37, 8, 1)

scala> arrBuffer.trimEnd(3)

scala> arrBuffer

res24: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 1, 2)

scala> arrBuffer.insert(2,100)

scala> arrBuffer

res28: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 100, 1, 2)

scala> arrBuffer.insert(2,33,44)

scala> arrBuffer

res30: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 33, 44, 100, 1, 2)

scala> arrBuffer.remove(3)

res32: Int = 44

scala> arrBuffer

res33: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 33, 100, 1, 2)

scala> arrBuffer.remove(3,2)

scala> arrBuffer

res35: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(10, 11, 33, 2)

=========Array operations: converting between fixed and variable-length arrays, sorting, etc.=======

scala> arrBuffer.toArray

res36: Array[Int] = Array(10, 11, 33, 2)

scala> val array2 = arrBuffer.toArray

array2: Array[Int] = Array(10, 11, 33, 2)

scala> array2.toBuffer

res37: scala.collection.mutable.Buffer[Int] = ArrayBuffer(10, 11, 33, 2)

scala> for(elem<-array2)println(elem)

10

11

33

2

//print every second element (step of 2)

scala> for(i<-0 until (array2.length,2))println(array2(i))

10

33

//iterate from the end

scala> for(i<-(0 until array2.length).reverse)println(array2(i))

2

33

11

10

//sorting

scala> import scala.util.Sorting

import scala.util.Sorting

scala> Sorting.quickSort(array2)

scala> array2

res44: Array[Int] = Array(2, 10, 11, 33)

//collect the result for each element with yield

scala> for(i<-array2) yield i*i

res45: Array[Int] = Array(4, 100, 121, 1089)

//keep the numbers that are less than 3

scala> for(i<-array2 if i/3==0) yield i

res46: Array[Int] = Array(2)

//a more concise chained style (note: this one keeps the multiples of 3, then squares them)

scala> array2.filter(_%3==0).map(i=>i*i)

res47: Array[Int] = Array(1089)

=========Map=======

//Immutable Map

scala> val person = Map("Spark"->6,"Hadoop"->11)

person: scala.collection.immutable.Map[String,Int] = Map(Spark -> 6, Hadoop -> 11)

scala> person("Hadoop")

res51: Int = 11

//Mutable Map

scala> val persons = scala.collection.mutable.Map("Spark"->6,"Hadoop"->11)

persons: scala.collection.mutable.Map[String,Int] = Map(Hadoop -> 11, Spark -> 6)

scala> persons += ("Flink"->5)

res52: persons.type = Map(Hadoop -> 11, Spark -> 6, Flink -> 5)

scala> val sparkValue = if(persons.contains("Spark"))persons("Spark") else 1000

sparkValue: Int = 6

scala> val sparkValue = persons.getOrElse("Spark",1000)

sparkValue: Int = 6

scala> for((key,value)<-persons) println(key+":"+value)

Hadoop:11

Spark:6

Flink:5

//A sorted Map

scala> val persons = scala.collection.immutable.SortedMap("Spark"->6,"Hadoop"->11)

persons: scala.collection.immutable.SortedMap[String,Int] = Map(Hadoop -> 11, Spark -> 6)

=========Tuple (use a Tuple when a function needs to return several values of different types)=======

scala> val tuple = ("Spark",1,3.0)

tuple: (String, Int, Double) = (Spark,1,3.0)

scala> tuple._1

res55: String = Spark
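
As a small sketch of the use case named in this heading (the function describe is hypothetical): a function can bundle several values of different types into one Tuple, and the caller can unpack them by position.

def describe(framework: String): (String, Int, Double) = (framework, 1, 3.0)

val (name, version, score) = describe("Spark")
println(name)    // Spark
println(version) // 1
println(score)   // 3.0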

After this we briefly went through some Spark source code, which mainly exercises the constructs covered above.

Homework 1:

Remove all negative numbers that come after the first negative number in an array (the first negative itself is kept).

My solution:

def main(args: Array[String]): Unit = {
  //val array = Array(1, 3, 7, -12, 4, 8, 9, -11, -10, 2, 8, 9)
  val array = Array(1, 3, 7, 4, 8, 9, 2, 8, 9)
  val firstIndex = getFirstIndex(array)

  if (firstIndex >= 0) {
    // drop every negative, then put the first negative back at its original index
    val arrayBuffer = array.filter(_ >= 0).toBuffer
    arrayBuffer.insert(firstIndex, array(firstIndex))
    for (element <- arrayBuffer) print(element + " ")
  } else {
    // no negatives at all, so there is nothing to remove
    val array2 = array.filter(_ >= 0)
    for (element <- array2) print(element + " ")
  }
}

// index of the first negative element, or -1 if there is none
def getFirstIndex(array: Array[Int]): Int = {
  for (i <- 0 until array.length) { if (array(i) < 0) return i }
  -1
}

The teacher's answer:

Approach 1:

(code screenshot not reproduced)

Every negative found after the first one is removed immediately; performance is poor because the array elements get shifted many times.

Approach 2:

(code screenshot not reproduced)

First record the indices of all the elements that should be kept, then remove everything that needs removing in one pass at the end; performance is comparatively better.



Question: what is firstNegative? Is it something Scala provides for detecting the first negative number, or does it mean something else?
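
Since the screenshots are not reproduced here, the following is only my own reconstruction of Approach 2, under the assumption that firstNegative is simply a local flag in the teacher's code marking whether the first negative has already been seen (not anything built into Scala). Treat it as a sketch, not the teacher's actual code.

import scala.collection.mutable.ArrayBuffer

// Sketch of Approach 2 (assumed): first collect the indices of the elements to
// keep, then move the kept elements forward in a single pass and trim the tail once.
def removeNegativesAfterFirst(buf: ArrayBuffer[Int]): Unit = {
  var firstNegative = false // becomes true once the first negative has been seen
  val keepIndices = for (i <- buf.indices if !firstNegative || buf(i) >= 0) yield {
    if (buf(i) < 0) firstNegative = true
    i
  }
  // copy each kept element to its new position, then shrink the buffer once
  for ((oldIndex, newIndex) <- keepIndices.zipWithIndex) buf(newIndex) = buf(oldIndex)
  buf.trimEnd(buf.length - keepIndices.length)
}

val data = ArrayBuffer(1, 3, 7, -12, 4, 8, 9, -11, -10, 2, 8, 9)
removeNegativesAfterFirst(data)
println(data) // ArrayBuffer(1, 3, 7, -12, 4, 8, 9, 2, 8, 9)

Because the kept indices are gathered first, each surviving element is moved at most once and trimEnd removes the leftover tail in a single call, which is why this approach shifts far fewer elements than removing negatives one by one.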

This article is from the "一支花傲寒" blog. Please do not repost without permission.
