Big Data (7h): Comparing Data Containers in Python and Scala

Posted by 小基基o_O


Introduction to Scala data containers

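| Keyword | Pronunciation | Literal meaning | Interpretation |
|---|---|---|---|
| iterable | | adj. iterable | data can be taken out repeatedly (from an iterator or a container) |
| collection | kəˈlekʃn | n. gathering; [tax] levy; collected items | data container |
| mutable | ˈmjuːtəbl | adj. changeable | mutable data container |
| immutable | ɪˈmjuːtəbl | adj. unchangeable | immutable data container |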
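  1. Collections extend the Iterable trait.
  2. Collections are divided into mutable and immutable ones.
  3. mutable: elements can be added to or removed from the original object.
  4. immutable: the original object cannot be changed; adding or removing elements returns a new object.
  5. In big data scenarios, immutable collections are used more often.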
import scala.collection.immutable
import scala.collection.mutable
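
To make the mutable/immutable split concrete on the Python side, here is a minimal sketch. The pairing of Scala containers with Python types in `counterparts` is an assumption made for illustration, not an exact equivalence, and `has_inplace_insert` is a hypothetical helper that simply checks whether a type offers an in-place `add` or `append`.

```python
# Rough Python stand-ins for the Scala containers (illustrative pairing only).
counterparts = {
    "immutable.Set": frozenset,
    "mutable.Set": set,
    "immutable.List": tuple,
    "mutable.ListBuffer": list,
}


def has_inplace_insert(container_type):
    """Return True if the type exposes an in-place insertion method."""
    return hasattr(container_type, "add") or hasattr(container_type, "append")


for scala_name, py_type in counterparts.items():
    kind = "mutable" if has_inplace_insert(py_type) else "immutable"
    print(f"{scala_name:20} ~ {py_type.__name__:10} ({kind})")
```

Note that plain Python `set`, `dict` and `list` are all mutable; the examples in the "Immutable" subsections below only use operations that return new objects.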

Set(set)

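Immutable

Python:
# A Python set is itself mutable, but | and & return new sets
s1 = {1, 2}
s2 = s1 | {2, 3}  # union: {1, 2, 3}
s3 = s2 & {3, 4}  # intersection: {3}
Scala:
val s1 = Set(1, 2)
val s2 = s1 | Set(2, 3) // union: Set(1, 2, 3)
val s3 = s2 & Set(3, 4) // intersection: Set(3)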
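
One Python-specific detail behind the example above: `|` always builds a new set, while `|=` modifies a `set` in place. A `frozenset` has no in-place union, so `|=` just rebinds the name to a new object, which is closer to how Scala's immutable `Set` behaves. A small sketch (the variable names are illustrative only):

```python
# Mutable set: |= changes the object in place, so every alias sees it.
s = {1, 2}
alias = s
s |= {3}
print(alias)            # {1, 2, 3}
print(alias is s)       # True

# frozenset: |= falls back to |, so f is rebound to a brand-new object.
f = frozenset({1, 2})
f_alias = f
f |= {3}
print(sorted(f))        # [1, 2, 3]
print(sorted(f_alias))  # [1, 2]
print(f is f_alias)     # False
```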

Mutable

Python:
s = {1, 2}
s.add(3)  # s is now {1, 2, 3}
Scala:
import scala.collection.mutable
val s = mutable.Set(1, 2)
s.add(3) // s is now HashSet(1, 2, 3)

Map(dict)

Immutable

Python:
d = {'a': 1, 'b': 2}
# Access
print(d['b'])  # 2
print(d.get('c', 99))  # 99
# Iterate
for kv in d.items(): print(kv)
# Show keys
print(d.keys())  # dict_keys(['a', 'b'])
# Show values
print(d.values())  # dict_values([1, 2])
Scala:
val d = Map("a" -> 1, "b" -> 2)  // prints as Map(a -> 1, b -> 2)
// Access
println(d.get("b"))  // Some(2)
println(d.getOrElse("c", 99))  // 99
// Iterate
for (kv <- d) {println(kv)}
// Show keys
println(d.keys)  // Set(a, b)
// Show values
println(d.values)  // Iterable(1, 2)
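
Python's built-in `dict` is mutable, so the Python half above is only "immutable" by convention. If a read-only mapping is wanted, one option from the standard library is `types.MappingProxyType`, which wraps a dict in a read-only view. The sketch below also shows that `get` returns `None` for a missing key, which plays roughly the role of Scala's `None` in `Option`. The variable names are illustrative.

```python
from types import MappingProxyType

d = {'a': 1, 'b': 2}
frozen = MappingProxyType(d)     # read-only view over d

hit = frozen.get('b')            # 2
miss = frozen.get('c')           # None: key is absent
fallback = frozen.get('c', 99)   # 99, like getOrElse("c", 99)

try:
    frozen['c'] = 33             # writes through the view are rejected
    write_blocked = False
except TypeError:
    write_blocked = True

print(hit, miss, fallback, write_blocked)  # 2 None 99 True
```

The view is not a copy: changes made directly to `d` remain visible through `frozen`.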

Mutable

Python:
d = {'a': 1, 'b': 2}
del d['a']  # delete an entry
d['c'] = 33  # update or add an entry
Scala:
val d = mutable.Map("a" -> 1, "b" -> 2)
// Delete an entry
d -= "a"  // HashMap(b -> 2)
// Update or add an entry
d.update("c", 33)  // HashMap(b -> 2, c -> 33)

List(list)

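Immutable

Python:
# A Python list is itself mutable, but + and slicing return new lists
l0 = [1, 2, 'c']
l1 = l0 + [4, 4]  # concatenate
l2 = l1[2:]  # slice
print(l1)  # [1, 2, 'c', 4, 4]
print(l2)  # ['c', 4, 4]
Scala:
val l0 = List(1, 2, "c")
val l1 = l0.appendedAll(List(4, 4))  // concatenate
val l2 = l1.drop(2)  // drop the first two elements
println(l1)  // List(1, 2, c, 4, 4)
println(l2)  // List(c, 4, 4)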
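
For a sequence that really cannot be modified in Python, `tuple` is the usual choice. The sketch below mirrors the example above with tuples; `take` and `drop` are hypothetical helpers named after the Scala methods, implemented with ordinary slicing.

```python
def take(seq, n):
    """First n elements, like Scala's take(n)."""
    return seq[:n]


def drop(seq, n):
    """Everything after the first n elements, like Scala's drop(n)."""
    return seq[n:]


t0 = (1, 2, 'c')
t1 = t0 + (4, 4)      # concatenation builds a new tuple
head = take(t1, 2)
tail = drop(t1, 2)

try:
    t0[0] = 9         # tuples reject item assignment
    assignment_blocked = False
except TypeError:
    assignment_blocked = True

print(t1)    # (1, 2, 'c', 4, 4)
print(head)  # (1, 2)
print(tail)  # ('c', 4, 4)
```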

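Mutable

Python:
ls = [1, 2, 3]
ls[0] = 9  # modify: [9, 2, 3]
ls.insert(0, 99)  # insert: [99, 9, 2, 3]
Scala:
import scala.collection.mutable.ListBuffer
val ls = ListBuffer(1, 2, 3)
// Modify
ls(0) = 9  // ListBuffer(9, 2, 3)
// Insert
ls.insert(0, 99)  // ListBuffer(99, 9, 2, 3)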

Tuple

Python:
t = ('a', (2, 'c'))
print(t[0])  # a
print(t[1][1])  # c
Scala:
val t = ("a", (2, "c"))
for (i <- t.productIterator) println(i)  // iterate over the elements
println(t)  // (a,(2,c))
println(t.productElement(0))  // a
println(t._1)  // a
println(t._2._2)  // c

Range(range)

Python:
range(2, 5)
Scala:
val r = Range(2, 5)
println(r)  // Range 2 until 5
r.foreach(print)  // 234
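
Both `range(2, 5)` and `Range(2, 5)` exclude the upper bound. Scala's inclusive form `2 to 5` therefore corresponds to `range(2, 6)` in Python. A short sketch of the Python side, with illustrative variable names:

```python
r = range(2, 5)

items = list(r)                   # upper bound is excluded
has_four = 4 in r                 # membership test, no list is built
has_five = 5 in r
inclusive = list(range(2, 6))     # counterpart of Scala's `2 to 5`
stepped = list(range(2, 10, 3))   # third argument is the step

print(items)      # [2, 3, 4]
print(inclusive)  # [2, 3, 4, 5]
print(stepped)    # [2, 5, 8]
```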

Array

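Fixed length (elements can still be modified in place)

Python:
import numpy as np
# Create (equivalent to np.array([0., 0., 0.]); np.zeros defaults to float)
a = np.zeros(3)
# Iterate
for i in a:
    print(i)
# Modify an element
a[0] = 99
print(a[0])  # 99.0
# Concatenate (returns a new array)
aa = np.concatenate([a, np.array([2, 2])], 0)
print(aa)  # [99.  0.  0.  2.  2.]
Scala:
// Create (equivalent to val a: Array[Int] = Array(0, 0, 0))
val a: Array[Int] = new Array[Int](3)
// Iterate
for (elem <- a) println(elem)
// Modify an element (equivalent to a.update(0, 99))
a(0) = 99
println(a(0))  // 99
// Concatenate (returns a new array)
val aa = a.concat(Array(2, 2))
println(aa.mkString("-"))  // 99-0-0-2-2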

Variable length

Python:
import numpy as np
a = np.array([2, 3, 4, 5, 6])
a = np.insert(a, 0, 99)  # insert at index 0
a = np.append(a, [7, 8])  # append at the end
a = np.delete(a, range(2, 5))  # remove indices 2, 3 and 4
print(a)  # [99  2  6  7  8]
Scala:
import scala.collection.mutable.ArrayBuffer
val a = ArrayBuffer(2, 3, 4, 5, 6)
a.insert(0, 99)  // insert at index 0
a.appendAll(ArrayBuffer(7, 8))  // append at the end
a.remove(2, 3)  // remove 3 elements starting at index 2
println(a)  // ArrayBuffer(99, 2, 6, 7, 8)

Queue

Scala:
import scala.collection.mutable
val q: mutable.Queue[Int] = mutable.Queue(1, 2, 3)
println(q.enqueue(99))  // Queue(1, 2, 3, 99)
println(q.dequeue())  // 1
println(q)  // Queue(2, 3, 99)
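
The queue example above is Scala only. A possible Python counterpart is `collections.deque` from the standard library, where `append` acts as enqueue and `popleft` as dequeue. This is a sketch of one common choice, not the only way to build a queue in Python.

```python
from collections import deque

q = deque([1, 2, 3])
q.append(99)          # enqueue at the tail
first = q.popleft()   # dequeue from the head

print(first)  # 1
print(q)      # deque([2, 3, 99])
```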

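Parallel collections

Scala:
// Since Scala 2.13, parallel collections live in the separate scala-parallel-collections module
import scala.collection.parallel.immutable
val s1 = Seq(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
val s2 = immutable.ParSeq(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
// Sequential: every element is processed on the main thread, in order
s1.foreach(println(_, Thread.currentThread.getName))
// Parallel: elements are spread across pool threads, so the order is not fixed
s2.foreach(println(_, Thread.currentThread.getName))
Output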
(0,main)
(1,main)
(2,main)
(3,main)
(4,main)
(5,main)
(6,main)
(7,main)
(8,main)
(9,main)
(0,main)
(1,main)
(2,main)
(3,main)
(4,main)
(5,main)
(6,main)
(7,main)
(8,main)
(9,main)
(0,scala-execution-context-global-12)
(5,scala-execution-context-global-14)
(1,scala-execution-context-global-18)
(5,scala-execution-context-global-15)
(6,scala-execution-context-global-14)
(0,scala-execution-context-global-13)
(2,scala-execution-context-global-16)
(7,scala-execution-context-global-17)
(1,scala-execution-context-global-13)
(6,scala-execution-context-global-15)
(2,scala-execution-context-global-13)
(3,scala-execution-context-global-13)
(4,scala-execution-context-global-13)
(7,scala-execution-context-global-19)
(8,scala-execution-context-global-18)
(9,scala-execution-context-global-18)
(4,scala-execution-context-global-16)
(3,scala-execution-context-global-12)
(9,scala-execution-context-global-19)
(8,scala-execution-context-global-15)
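
Python has no built-in parallel collection type. A rough analogue of the experiment above can be sketched with `concurrent.futures.ThreadPoolExecutor` from the standard library. The helper `tag`, the pool size of 4 and the `worker` thread-name prefix are all choices made for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def tag(x):
    """Pair a value with the name of the thread that processed it."""
    return (x, threading.current_thread().name)


data = list(range(10)) * 2

# Sequential: every element is handled by the calling thread.
sequential = [tag(x) for x in data]

# Thread pool: elements are handled by worker threads.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="worker") as pool:
    parallel = list(pool.map(tag, data))

for pair in parallel:
    print(pair)
```

Unlike `ParSeq.foreach`, `pool.map` returns results in input order even though different threads do the work. In standard CPython builds the global interpreter lock also prevents threads from running Python bytecode simultaneously, so this mainly helps with I/O-bound tasks.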
