The Difference Between join and +
Posted by sandy-1128
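In Python, strings can be concatenated with either join or +. Is there any difference between the two?
First, here is an example that concatenates strings with join and with +: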
    str1 = " ".join(["hello", "world"])
    str2 = "hello " + "world"
    print(str1)  # prints: hello world
    print(str2)  # prints: hello world
Both produce the same result. That raises a question: do they differ in performance?
Let's run an experiment to compare join and +.
    import time
    import timeit


    def test1(strlist):
        # Concatenate with a single join call.
        return "".join(strlist)


    def test2(strlist):
        # Concatenate by applying + repeatedly in a loop.
        result = ""
        for v in strlist:
            result = result + v
        return result


    if __name__ == "__main__":
        # Test 1: a list of 1000 long strings, timed with timeit.
        strlist = ["a very very very very very very very long string" for n in range(1000)]
        timer1 = timeit.Timer("test1(strlist)", "from __main__ import strlist, test1")
        timer2 = timeit.Timer("test2(strlist)", "from __main__ import strlist, test2")
        time1 = timer1.timeit(number=100)
        time2 = timer2.timeit(number=100)
        print("join: %f, plus: %f" % (time1, time2))

        # Test 2: a short list of 5 strings, timed manually over 100000 calls.
        strlist1 = ["very very very long long", "very long long long", "very long long long",
                    "very long long long", "very long long long"]
        time1 = time.time()
        for i in range(100000):
            test1(strlist1)
        time2 = time.time()
        time3 = time.time()
        for i in range(100000):
            test2(strlist1)
        time4 = time.time()
        print("join:%s" % (time2 - time1))
        print("+ :%s" % (time4 - time3))
Output:
    join: 0.003507, plus: 0.083788
    join:0.18189620971679688
    + :0.3727850914001465
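As the numbers show, join is clearly faster than +: roughly 24 times faster on the 1000-string list and about 2 times faster on the 5-string list. Why?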
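The reason is that strings are immutable objects. Each time the + operator joins two strings, a new block of memory is allocated for the result and the operands are copied into it, so concatenating many strings with + involves repeated allocations and repeated copies of an ever-growing intermediate result. join, by contrast, first computes how much memory the final result needs, then allocates that memory once and copies each string into it. That is why join outperforms +. So when concatenating a list of strings, prefer join.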
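To make this explanation concrete, here is a rough cost model that counts how many characters each approach would copy. It is only a sketch, under the assumption that every + builds a brand-new string; the helper names chars_copied_with_plus and chars_copied_with_join are made up for illustration, and a real interpreter may optimize some cases, so measured timings will not match these ratios exactly.

```python
def chars_copied_with_plus(parts):
    """Simplified model of `result = result + part` in a loop.

    Assumes each + allocates a new string and copies the whole
    accumulated result plus the new part into it.
    """
    copied = 0
    length = 0
    for part in parts:
        length += len(part)  # size of the new intermediate string
        copied += length     # everything so far is copied again
    return copied


def chars_copied_with_join(parts):
    """Simplified model of join: size the result once, copy each part once."""
    return sum(len(part) for part in parts)


parts = ["abcd"] * 1000
plus_cost = chars_copied_with_plus(parts)
join_cost = chars_copied_with_join(parts)
print("plus copies:", plus_cost)
print("join copies:", join_cost)
```

In this model the cost of + grows roughly with the square of the number of strings, while the cost of join grows linearly. That matches the experiment above, where the gap is much wider for the 1000-string list than for the 5-string list.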