Big Data (7h): Comparing Data Containers in Python and Scala
Posted by 小基基o_O
Using Python as the reference point for understanding and quickly looking up Scala syntax
Introduction to Scala data containers
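Keyword | 🔉 | Literal meaning | Interpretation |
---|---|---|---|
iterable | | adj. able to be iterated | data can be taken out repeatedly (from an iterator or container) |
collection | kəˈlekʃn | n. gathering; [tax] levy; collected items | data container |
mutable | ˈmjuːtəbl | adj. changeable | mutable data container |
immutable | ɪˈmjuːtəbl | adj. unchangeable | immutable data container |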
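- collection inherits the Iterable trait
- collection is split into mutable and immutable
  - mutable: elements can be added to or removed from the original object
  - immutable: the original object cannot be changed; adding or removing returns a new object
- In big data scenarios, immutable is used more often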
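The mutable/immutable distinction described above has a rough analogue in Python's built-in types. The sketch below uses tuple and list only as an approximation of Scala's immutable and mutable collections; the variable names are illustrative.

```python
# Immutable style: "adding" builds a new object, the original is unchanged.
t0 = (1, 2)
t1 = t0 + (3,)          # new tuple; t0 is still (1, 2)

# Mutable style: adding changes the original object in place.
m = [1, 2]
alias = m               # second name for the same list
m.append(3)             # the change is visible through alias too

print(t0, t1)           # (1, 2) (1, 2, 3)
print(alias)            # [1, 2, 3]
```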
import scala.collection.immutable
import scala.collection.mutable
Set (set)
Immutable
s1 = {1, 2}
s2 = s1 | {2, 3}  # union
s3 = s2 & {3, 4}  # intersection
val s1 = Set(1, 2)
val s2 = s1 | Set(2, 3) // union
val s3 = s2 & Set(3, 4) // intersection
Mutable
s = {1, 2}
s.add(3)  # {1, 2, 3}
import scala.collection.mutable
val s = mutable.Set(1, 2)
s.add(3) // s now contains 1, 2, 3
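Python's built-in set is itself mutable, so the closest analogue to Scala's immutable Set is probably frozenset. A minimal sketch, with illustrative variable names:

```python
fs1 = frozenset({1, 2})
fs2 = fs1 | {2, 3}           # union returns a new frozenset
fs3 = fs2 & {3, 4}           # intersection returns a new frozenset

print(sorted(fs2))           # [1, 2, 3]
print(sorted(fs3))           # [3]
print(hasattr(fs1, "add"))   # False: frozenset has no in-place add
```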
Map (dict)
Immutable
d = {'a': 1, 'b': 2}
# access
print(d['b'])
print(d.get('c', 99))
# iterate
for kv in d.items(): print(kv)
# view keys
print(d.keys())  # dict_keys(['a', 'b'])
# view values
print(d.values())  # dict_values([1, 2])
val d = Map("a" -> 1, "b" -> 2) // prints as Map(a -> 1, b -> 2)
// access
println(d.get("b")) // Some(2)
println(d.getOrElse("c", 99)) // 99
// iterate
for (kv <- d) {println(kv)}
// view keys
println(d.keys) // Set(a, b)
// view values
println(d.values) // Iterable(1, 2)
Mutable
d = {'a': 1, 'b': 2}
del d['a']
d['c'] = 33
val d = mutable.Map("a" -> 1, "b" -> 2)
// remove an entry
d -= "a" // HashMap(b -> 2)
// update or add an entry
d.update("c", 33) // HashMap(b -> 2, c -> 33)
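A Python dict is always mutable, so the immutable style of Scala's Map (where adding or removing an entry yields a new Map) has to be emulated. The sketch below shows one possible approach: building new dicts instead of editing the original, plus a read-only view via types.MappingProxyType. The names d2, d3, ro and blocked are illustrative.

```python
from types import MappingProxyType

d = {'a': 1, 'b': 2}

# Immutable-style update: build a new dict, leave the original untouched.
d2 = {**d, 'c': 33}
d3 = {k: v for k, v in d2.items() if k != 'a'}

# Read-only view of an existing dict; item assignment raises TypeError.
ro = MappingProxyType(d)
try:
    ro['z'] = 0
    blocked = False
except TypeError:
    blocked = True

print(d)        # {'a': 1, 'b': 2}
print(d2)       # {'a': 1, 'b': 2, 'c': 33}
print(d3)       # {'b': 2, 'c': 33}
print(blocked)  # True
```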
List (list)
Immutable
l0 = [1, 2, 'c']
l1 = l0 + [4, 4]  # concatenate
l2 = l1[2:]  # slice
print(l1)  # [1, 2, 'c', 4, 4]
print(l2)  # ['c', 4, 4]
val l0 = List(1, 2, "c")
val l1 = l0.appendedAll(List(4, 4)) // concatenate
val l2 = l1.drop(2) // slice
println(l1) // List(1, 2, c, 4, 4)
println(l2) // List(c, 4, 4)
Mutable
ls = [1, 2, 3]
ls[0] = 9
ls.insert(0, 99)
import scala.collection.mutable.ListBuffer
val ls = ListBuffer(1, 2, 3)
// update
ls(0) = 9 // ListBuffer(9, 2, 3)
// insert
ls.insert(0, 99) // ListBuffer(99, 9, 2, 3)
Tuple
t = ('a', (2, 'c'))
print(t[0])
print(t[1][1])
val t = ("a", (2, "c"))
for (i <- t.productIterator) println(i) // iterate
println(t) // (a,(2,c))
println(t.productElement(0)) // a
println(t._1) // a
println(t._2._2) // c
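The Scala snippet iterates over the tuple and reads nested fields. A rough Python counterpart is sketched below, using plain iteration and nested unpacking; the names first, num and letter are illustrative.

```python
t = ('a', (2, 'c'))

for item in t:               # iterate over the top-level elements
    print(item)

first, (num, letter) = t     # nested unpacking
print(first, num, letter)    # a 2 c

try:
    t[0] = 'z'               # tuples do not support item assignment
except TypeError as e:
    print(type(e).__name__)  # TypeError
```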
Range (range)
range(2, 5)
val r = Range(2,5)
println(r) // Range 2 until 5
r.foreach(print) // 234
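For comparison with the Scala output above, here is a small sketch of how Python's range behaves when printed and consumed. Like Range(2, 5), the upper bound is excluded.

```python
r = range(2, 5)                # lazy sequence: 2, 3, 4 (5 is excluded)
print(r)                       # range(2, 5)
print(list(r))                 # [2, 3, 4]
print(len(r), 4 in r)          # 3 True
print(list(range(2, 10, 3)))   # [2, 5, 8]
```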
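Array
Immutable (fixed length; elements can still be updated in place)
import numpy as np
# create (roughly np.array([0, 0, 0]), but with float elements)
a = np.zeros(3)
# iterate
for i in a:
    print(i)
# update an element
a[0] = 99
print(a[0])
# concatenate
aa = np.concatenate([a, np.array([2, 2])], 0)
print(aa)  # [99.  0.  0.  2.  2.]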
// create (equivalent to val a: Array[Int] = Array(0, 0, 0))
val a: Array[Int] = new Array[Int](3)
// iterate
for (elem <- a) println(elem)
// update an element (same as a.update(0, 99))
a(0) = 99
println(a(0))
// concatenate
val aa = a.concat(Array(2, 2))
println(aa.mkString("-")) // 99-0-0-2-2
Mutable
import numpy as np
a = np.array([2, 3, 4, 5, 6])
a = np.insert(a, 0, 99)  # insert
a = np.append(a, [7, 8])  # append
a = np.delete(a, range(2, 5))  # remove
print(a)  # [99  2  6  7  8]
import scala.collection.mutable.ArrayBuffer
val a = ArrayBuffer(2, 3, 4, 5, 6)
a.insert(0, 99) // insert
a.appendAll(ArrayBuffer(7, 8)) // append
a.remove(2, 3) // remove 3 elements starting at index 2
println(a) // ArrayBuffer(99, 2, 6, 7, 8)
Queue
import scala.collection.mutable
val q:mutable.Queue[Int] = mutable.Queue(1,2,3)
println(q.enqueue(99)) // Queue(1, 2, 3, 99)
println(q.dequeue()) // 1
println(q) // Queue(2, 3, 99)
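The queue section shows only Scala. A possible Python counterpart, assuming collections.deque is an acceptable stand-in for mutable.Queue, might look like this:

```python
from collections import deque

q = deque([1, 2, 3])
q.append(99)             # enqueue at the tail
print(q)                 # deque([1, 2, 3, 99])
head = q.popleft()       # dequeue from the head
print(head)              # 1
print(q)                 # deque([2, 3, 99])
```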
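Parallel collections
// In Scala 2.13+, parallel collections live in the separate scala-parallel-collections module
import scala.collection.parallel.immutable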
val s1 = Seq(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
val s2 = immutable.ParSeq(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
s1.foreach(println(_, Thread.currentThread.getName))
s2.foreach(println(_, Thread.currentThread.getName))
Output (thread names and the ordering of the parallel part vary between runs):
(0,main)
(1,main)
(2,main)
(3,main)
(4,main)
(5,main)
(6,main)
(7,main)
(8,main)
(9,main)
(0,main)
(1,main)
(2,main)
(3,main)
(4,main)
(5,main)
(6,main)
(7,main)
(8,main)
(9,main)
(0,scala-execution-context-global-12)
(5,scala-execution-context-global-14)
(1,scala-execution-context-global-18)
(5,scala-execution-context-global-15)
(6,scala-execution-context-global-14)
(0,scala-execution-context-global-13)
(2,scala-execution-context-global-16)
(7,scala-execution-context-global-17)
(1,scala-execution-context-global-13)
(6,scala-execution-context-global-15)
(2,scala-execution-context-global-13)
(3,scala-execution-context-global-13)
(4,scala-execution-context-global-13)
(7,scala-execution-context-global-19)
(8,scala-execution-context-global-18)
(9,scala-execution-context-global-18)
(4,scala-execution-context-global-16)
(3,scala-execution-context-global-12)
(9,scala-execution-context-global-19)
(8,scala-execution-context-global-15)
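Python has no built-in parallel collection type. As a loose analogue of the Seq versus ParSeq comparison, the sketch below runs the same function sequentially and through a thread pool and records which thread handled each element. The helper tag and the pool size of 4 are illustrative assumptions, and in standard CPython threads do not give CPU parallelism for pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def tag(x):
    # Pair each element with the name of the thread that handled it.
    return (x, threading.current_thread().name)

data = list(range(10))

# Sequential: everything runs on the calling thread.
seq = [tag(x) for x in data]

# Thread pool: elements are spread over worker threads;
# map() still returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    par = list(pool.map(tag, data))

print(seq[:3])
print(par[:3])
```

Unlike ParSeq.foreach, pool.map preserves the input order of the results, so only the thread names differ between the two lists.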