Python: The collections Module Explained Through Examples


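The collections Module

The collections module provides extensions to Python's built-in data types, such as OrderedDict, defaultdict, namedtuple, deque, and Counter. They are simple, practical, and well worth learning. The listings below use Python 2 print statements.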

import collections

1. OrderedDict

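As the name suggests, an OrderedDict is a dictionary with a defined order rather than an arbitrary one. In the Python 2 interpreter used for these examples, a regular dict does not record insertion order, so iterating over it yields items in arbitrary order. An OrderedDict does record insertion order, and the difference shows up during iteration. (Since Python 3.7, a regular dict also preserves insertion order.)

Iteration

print 'Regular dictionary:'
d = {}
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for key, value in d.items():
    print key, value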

Regular dictionary:
a A
c C
b B

print 'OrderedDict:'
d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

for key, value in d.items():
    print key, value

OrderedDict:
a A
b B
c C
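As a minimal sketch in Python 3 syntax (unlike the Python 2 listings above), the following illustrates what "insertion order" means in practice: reassigning an existing key keeps its position, while deleting and re-inserting it moves it to the end. The variable names are illustrative only.

```python
import collections

d = collections.OrderedDict()
d['a'] = 'A'
d['b'] = 'B'
d['c'] = 'C'

# Reassigning an existing key changes its value but not its position.
d['a'] = 'A2'
keys_after_update = list(d.keys())

# Deleting a key and inserting it again moves it to the end.
del d['a']
d['a'] = 'A3'
keys_after_reinsert = list(d.keys())

print(keys_after_update)
print(keys_after_reinsert)
```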

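Equality comparison

When two dictionaries are compared for equality, regular dicts look only at their contents: identical contents compare equal. An OrderedDict also takes into account the order in which the items were added.

print 'dict       :',
d1 = {}
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = {}
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'

print d1 == d2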

dict       : True

print 'OrderedDict:',
d1 = collections.OrderedDict()
d1['a'] = 'A'
d1['b'] = 'B'
d1['c'] = 'C'

d2 = collections.OrderedDict()
d2['b'] = 'B'
d2['a'] = 'A'
d2['c'] = 'C'

print d1 == d2

OrderedDict: False
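One detail worth spelling out is that the order-sensitive comparison only applies when both sides are OrderedDict instances. The Python 3 sketch below (names are illustrative) compares two OrderedDicts with different insertion orders, then compares one of them against a plain dict.

```python
import collections

od1 = collections.OrderedDict([('a', 'A'), ('b', 'B')])
od2 = collections.OrderedDict([('b', 'B'), ('a', 'A')])

# OrderedDict vs. OrderedDict: insertion order is part of the comparison.
ordered_vs_ordered = (od1 == od2)

# OrderedDict vs. plain dict: only the contents are compared.
ordered_vs_plain = (od1 == dict(od2))

print(ordered_vs_ordered, ordered_vs_plain)
```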

2. defaultdict

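With a regular dict, accessing a missing key raises an exception. A defaultdict lets you supply a default value in advance, which is especially handy when the default feeds an accumulation or aggregation (counting, for example). defaultdict takes a default_factory argument, a callable that returns the default value. It can be a custom function, or list (returns []), set (returns set()), or int (returns 0). The examples make this clearer.

defaultdict is a subclass of dict that adds a __missing__(key) method, which supplies the default value when a lookup would otherwise raise KeyError.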
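To make the __missing__ hook concrete, here is a rough Python 3 sketch of a dict subclass that behaves similarly. DefaultingDict is a hypothetical name invented for this illustration; it is not how the standard library implements defaultdict, only an approximation of the idea.

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass that mimics defaultdict, for illustration."""

    def __init__(self, default_factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook when the key is absent.
        value = self.default_factory()
        self[key] = value  # store the default so later lookups find it
        return value


d = DefaultingDict(list)
d['foo'].append(1)   # 'foo' is created on first access
print(d)
```

Note that only subscript access (d[key]) triggers __missing__; methods such as get() do not.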

def default_factory():
    return 'This is default string value'
d = collections.defaultdict(default_factory)
print d['foo']

This is default string value

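d['foo'] was never assigned here, yet it can be accessed and returns a value. Now for something more powerful.

list

Setting default_factory to list makes it easy to group a series of key-value pairs. A missing key yields an empty list; the example below groups the values that share a key.

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(list)
for k, v in s:
    d[k].append(v)
    # simpler and faster than d.setdefault(k, []).append(v)
d.items()

[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]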
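The comment in the listing mentions setdefault as the plain-dict alternative. A small Python 3 sketch, with illustrative variable names, shows that the two approaches build the same grouping:

```python
import collections

pairs = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

# Grouping with defaultdict(list): missing keys start as an empty list.
grouped = collections.defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# The same grouping with a plain dict and setdefault.
plain = {}
for key, value in pairs:
    plain.setdefault(key, []).append(value)

print(sorted(grouped.items()))
```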

int

This is particularly handy for counting, for example tallying how many times each key occurs.

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = collections.defaultdict(int)
for k, v in s:
    d[k] += 1
d.items()

[('blue', 2), ('red', 1), ('yellow', 2)]

s = 'mississippi'
d = collections.defaultdict(int)
for k in s:
    d[k] += 1
d.items()

[('i', 4), ('p', 2), ('s', 4), ('m', 1)]
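One side effect is easy to overlook: merely reading a missing key from a defaultdict inserts it. The Python 3 sketch below (variable names are illustrative) shows this, along with get() as a way to read without inserting.

```python
import collections

counts = collections.defaultdict(int)
for ch in 'mississippi':
    counts[ch] += 1

present_before = 'z' in counts
value = counts['z']          # reading a missing key inserts it with int() == 0
present_after = 'z' in counts

safe = counts.get('q', 0)    # get() returns the fallback without inserting

print(dict(counts))
```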

set

This works like list, but a missing key yields set(), so duplicate values are dropped.

s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
d = collections.defaultdict(set)
for k, v in s:
    d[k].add(v)
d.items()

[('blue', {2, 4}), ('red', {1, 3})]
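For a side-by-side view of the difference between the list and set factories, here is a short Python 3 sketch on the same input. The names as_lists and as_sets are illustrative.

```python
import collections

pairs = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]

as_lists = collections.defaultdict(list)
as_sets = collections.defaultdict(set)
for key, value in pairs:
    as_lists[key].append(value)  # keeps duplicates and arrival order
    as_sets[key].add(value)      # drops duplicates, no guaranteed order

print(as_lists['red'], as_sets['red'])
```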

3. namedtuple

A plain tuple is indexed by number, whereas a namedtuple can also be accessed by field name. This is useful when there are many fields, or when the tuple is created far from where it is used.

bob = ('Bob', 30, 'male')
print 'Representation:', bob

jane = ('Jane', 29, 'female')
print '\nField by index:', jane[0]

print '\nFields by index:'
for p in [ bob, jane ]:
    print '%s is a %d year old %s' % p

Representation: ('Bob', 30, 'male')

Field by index: Jane

Fields by index:
Bob is a 30 year old male
Jane is a 29 year old female

Because each namedtuple type has its own fields, each one must be defined separately. Fields are then accessed by name (access by index still works).

# define namedtuple
Person = collections.namedtuple('Person', 'name age gender')

print 'Type of Person:', type(Person)
bob = Person(name='Bob', age=30, gender='male')
print '\nRepresentation:', bob

bob = Person('Bob', 30, 'male') # also supported
print 'Representation:', bob

jane = Person(name='Jane', age=29, gender='female')
print '\nField by name:', jane.name
print 'Field by index:', jane[0]

Type of Person: <type 'type'>

Representation: Person(name='Bob', age=30, gender='male')
Representation: Person(name='Bob', age=30, gender='male')

Field by name: Jane
Field by index: Jane
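Beyond field access, namedtuple types come with a few helpers that are worth knowing. The Python 3 sketch below reuses the Person type from the listing above and shows _fields, _replace, _asdict, and ordinary tuple unpacking. The variable names are illustrative.

```python
import collections

Person = collections.namedtuple('Person', 'name age gender')
bob = Person(name='Bob', age=30, gender='male')

fields = Person._fields            # the field names, as a tuple of strings
older_bob = bob._replace(age=31)   # returns a new instance; bob is unchanged
as_dict = dict(bob._asdict())      # field name -> value mapping
name, age, gender = bob            # still unpacks like a plain tuple

print(fields, older_bob, as_dict)
```

Since namedtuple instances are immutable, _replace is the usual way to obtain a modified copy.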

4. deque

A deque is a double-ended queue that supports adding and removing elements at either end. The ordinary stack and queue are degenerate forms of a deque.

A deque is still a sequence, so a number of list-like operations are supported as well.

d = collections.deque('abcdefg')
print 'Deque:', d
print 'Length:', len(d)
print 'Left end:', d[0]
print 'Right end:', d[-1]

d.remove('c')
print 'remove(c)', d

Deque: deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
Length: 7
Left end: a
Right end: g
remove(c) deque(['a', 'b', 'd', 'e', 'f', 'g'])
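The remark that stacks and queues are degenerate forms of a deque can be sketched as follows in Python 3: restricting yourself to one end gives a stack, and adding at one end while removing from the other gives a queue. The variable names are illustrative.

```python
import collections

# Stack (LIFO): push and pop at the same end.
stack = collections.deque()
for item in 'abc':
    stack.append(item)
stack_order = [stack.pop() for _ in range(3)]

# Queue (FIFO): push at the right, pop from the left.
queue = collections.deque()
for item in 'abc':
    queue.append(item)
queue_order = [queue.popleft() for _ in range(3)]

print(stack_order, queue_order)
```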

populating

Pushing elements onto the deque.

import collections

# Add to the right
d = collections.deque()
d.extend('abcdefg') # append each element of the iterable
print 'extend    :', d
d.append('h')
print 'append    :', d

# Add to the left
d = collections.deque()
d.extendleft('abcdefg') # elements end up in reverse order
print 'extendleft:', d
d.appendleft('h')
print 'appendleft:', d

extend    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
append    : deque(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
extendleft: deque(['g', 'f', 'e', 'd', 'c', 'b', 'a'])
appendleft: deque(['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'])
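The reversed result of extendleft in the output above follows from it behaving like a series of appendleft calls. A short Python 3 sketch, with illustrative names, shows the equivalence and one way to keep the original order:

```python
import collections

d1 = collections.deque()
d1.extendleft('abc')

# Equivalent: push each element onto the left end, one at a time.
d2 = collections.deque()
for ch in 'abc':
    d2.appendleft(ch)

# To preserve the original order, reverse the input first.
d3 = collections.deque()
d3.extendleft(reversed('abc'))

print(d1, d2, d3)
```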

consuming

Popping elements from the deque.

print 'From the right:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.pop(),
    except IndexError:
        break

From the right:
g f e d c b a

print '\nFrom the left:'
d = collections.deque('abcdefg')
while True:
    try:
        print d.popleft(),
    except IndexError:
        break

From the left:
a b c d e f g
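The try/except loops above work, but an empty deque is falsy, so the same draining logic can be written more compactly. This is a Python 3 sketch of that alternative idiom, not a correction of the original listings; the variable names are illustrative.

```python
import collections

d = collections.deque('abcdefg')
from_right = []
while d:                      # an empty deque evaluates to False
    from_right.append(d.pop())

d = collections.deque('abcdefg')
from_left = []
while d:
    from_left.append(d.popleft())

print(' '.join(from_right))
print(' '.join(from_left))
```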

5. Counter

A counter, as the name suggests. The constructor accepts any of the following forms for initialization.

print collections.Counter(['a', 'b', 'c', 'a', 'b', 'b'])
print collections.Counter({'a':2, 'b':3, 'c':1})
print collections.Counter(a=2, b=3, c=1)

Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})
Counter({'b': 3, 'a': 2, 'c': 1})

update

c = collections.Counter()
print 'Initial :', c

c.update('abcdaab')
print 'Sequence:', c

c.update({'a':1,'d':5}) # increases the counts rather than replacing them
print 'Dict    :', c # the counts for a and d have grown

Initial : Counter()
Sequence: Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
Dict    : Counter({'d': 6, 'a': 4, 'b': 2, 'c': 1})
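The "increase, not replace" behaviour is the key difference from dict.update. The Python 3 sketch below applies the same update to a Counter and to a plain dict holding the same starting counts. The variable names are illustrative.

```python
import collections

counter = collections.Counter('abcdaab')
counter.update({'a': 1, 'd': 5})      # adds to the existing counts

plain = dict(collections.Counter('abcdaab'))
plain.update({'a': 1, 'd': 5})        # overwrites the existing values

print(counter)
print(plain)
```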

Access

Counts are read with the same API as a dictionary. A missing key does not raise an exception, though; its count is 0.

c = collections.Counter('abcdaab')
for letter in 'abcde':
    print '%s : %d' % (letter, c[letter])

a : 3
b : 2
c : 1
d : 1
e : 0
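Unlike defaultdict, looking up a missing key in a Counter does not add that key. A minimal Python 3 sketch, with illustrative names:

```python
import collections

c = collections.Counter('abcdaab')

missing = c['e']          # 0, and no KeyError
inserted = 'e' in c       # the lookup did not create the key
size = len(c)             # still only the four letters that were counted

print(missing, inserted, size)
```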

elements

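Returns an iterator over the elements, repeating each one as many times as its count. Elements with a count below one are skipped.

c = collections.Counter('China')
c['z'] = 0
print c
print list(c.elements())

Counter({'a': 1, 'C': 1, 'i': 1, 'h': 1, 'n': 1, 'z': 0})
['a', 'C', 'i', 'h', 'n']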
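Since every count in the example above is 1 or 0, the repetition is not visible there. This Python 3 sketch uses larger counts, plus a zero and a negative count, to show both the repetition and the skipping. The counter contents are made up for illustration.

```python
import collections

c = collections.Counter(a=3, b=1, z=0, y=-2)

# Each element appears count times; zero and negative counts are skipped.
expanded = sorted(c.elements())

print(expanded)
```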

most_common()

Returns the n most common elements together with their counts.

c = collections.Counter('abcdaab')
c.most_common(2)

[('a', 3), ('b', 2)]
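Two related usages, sketched in Python 3 with illustrative names: calling most_common() without an argument returns every element sorted from most to least common, and slicing that list from the end yields the least common ones.

```python
import collections

c = collections.Counter('abcdaab')

top_two = c.most_common(2)
everything = c.most_common()            # all elements, most common first
least_two = c.most_common()[:-3:-1]     # the two least common elements

print(top_two)
print(everything)
print(least_two)
```

The relative order of elements with equal counts (here c and d) should not be relied on across Python versions.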

 
