Build an array from a list and a dictionary with Python

Posted 2023-02-23


[Title]: Build an array from a list and a dictionary with Python [Posted]: 2015-09-30 02:40:44 [Question]:

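I'm trying to build a matrix from a list and then fill it with the values of a dict. It works with small data, but the computer crashes (out of memory) when I use bigger data. My script is obviously too heavy, but I don't know how to improve it (first time programming). Thanks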

import numpy as np
liste = ["a","b","c","d","e","f","g","h","i","j"]

dico = {"a/b": 4, "c/d": 2, "f/g": 5, "g/h": 2}

#now i'd like to build a square array (liste x liste) and fill it up with the values of
# my dict.


def make_array(liste,dico):
    array1 = []
    liste_i = [] #each line of the array
    for i in liste:
        if liste_i :
            array1.append(liste_i)
            liste_i = []
        for j in liste:
            if dico.has_key(i+"/"+j): 
                liste_i.append(dico[i+"/"+j])
            elif dico.has_key(j+"/"+i):
                liste_i.append(dico[j+"/"+i])
            else :
                liste_i.append(0)
    array1.append(liste_i)
    print(array1)
    matrix = np.array(array1)
    print(matrix.shape)  # shape is an attribute, not a method
    print(matrix)
    return matrix
    
make_array(liste,dico)

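Thanks a lot for your answers; using in dico or a list comprehension did improve the speed of the script, which is very helpful. But it seems my problem is caused by the following function: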

import collections
import scipy.cluster.hierarchy
import scipy.spatial.distance

def clustering(matrix, liste_globale_occurences, output2):
    most_common_groups = []
    Y = scipy.spatial.distance.pdist(matrix)
    Z = scipy.cluster.hierarchy.linkage(Y,'average', 'euclidean')
    scipy.cluster.hierarchy.dendrogram(Z)
    clust_h = scipy.cluster.hierarchy.fcluster(Z, t = 15, criterion='distance')
    print(clust_h)
    print(len(clust_h))
    most_common = collections.Counter(clust_h).most_common(3)
    group1 = most_common[0][0]
    group2 = most_common[1][0]
    group3 = most_common[2][0]
    most_common_groups.append(group1)
    most_common_groups.append(group2)
    most_common_groups.append(group3)
    with open(output2, 'w') as results: # here is the beginning of the problem
        for group in most_common_groups: 
            for i, val in enumerate(clust_h):
                if group == val:
                    mise_en_page = "{0:36s} groupe co-occurences = {1:5s} \n"
                    results.write(mise_en_page.format(str(liste_globale_occurences[i]),str(val)))

With a small file I get the right results, for example:

contact a = groupe 2
contact b = groupe 2
contact c = groupe 2
contact d = groupe 2
contact e = groupe 3
contact f = groupe 3

But with a large file, I only get one example per group:

contact a = groupe 2
contact a = groupe 2
contact a = groupe 2
contact a = groupe 2
contact e = groupe 3
contact e = groupe 3

[Comments]:

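Could you explain in more detail what you mean by building a matrix from a list and then filling it with the values of a dict? Maybe show a minimal example! Don't use has_key: it is deprecated in 2.7 and removed in 3. Use in dico instead.

[Answer 1]: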

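You can create a zero matrix of shape len(liste) x len(liste), then go through your dico and split each key: the value before the '/' gives the row, and the value after the '/' gives the column. That way there is no need to search with 'has_key'.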
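A minimal sketch of that idea, not a definitive implementation. The function name make_array_from_keys and the index lookup table are made up for illustration. It assumes that no label contains a '/' and that every label used in a key appears in liste. It also writes each value to both [row, col] and [col, row], since the original script treats "a/b" and "b/a" as the same pair.

```python
import numpy as np

def make_array_from_keys(liste, dico):
    # Hypothetical helper: map each label to its position once,
    # so each dict entry is placed with a direct lookup.
    index = {label: pos for pos, label in enumerate(liste)}
    matrix = np.zeros((len(liste), len(liste)), dtype=int)
    for key, value in dico.items():
        # Assumes exactly one '/' per key, e.g. "a/b".
        row_label, col_label = key.split("/")
        r, c = index[row_label], index[col_label]
        # Mirror the value so that "a/b" also fills the "b/a" cell.
        matrix[r, c] = value
        matrix[c, r] = value
    return matrix

liste = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
dico = {"a/b": 4, "c/d": 2, "f/g": 5, "g/h": 2}
matrix = make_array_from_keys(liste, dico)
print(matrix)
```

The loop runs once per dict entry rather than once per cell, so the work scales with len(dico) instead of len(liste) squared. The matrix itself still needs len(liste) squared cells of memory.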

[Comments]:

[Answer 2]:

Your problem looks like an O(n²) one, because you want every combination of elements from liste. So you do need an inner loop.
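To make the quadratic growth concrete, here is a rough estimate of the memory a dense square array needs. The helper name dense_matrix_bytes is made up for illustration, and the figures assume NumPy's 64-bit integers. A nested Python list of ints, as built in the question, generally takes more than this.

```python
import numpy as np

def dense_matrix_bytes(n, dtype=np.int64):
    """Bytes needed for the data of a dense n x n array of the given dtype."""
    return n * n * np.dtype(dtype).itemsize

# Multiplying the number of labels by 100 multiplies the memory by 10,000.
for n in (10, 1000, 100000):
    print(n, dense_matrix_bytes(n))
```

With 100,000 labels that is 80,000,000,000 bytes, about 80 GB, so at that size the matrix alone will not fit in the memory of a typical machine. A smaller dtype such as np.int8 reduces this by a constant factor only.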

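What you could try is writing each row to a file and then building the matrix from that file in a new process. The new process will use less memory, because it does not have to hold the large inputs liste and dico. Something like this: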

def make_array(liste,dico):
    # The with block closes the file even if a write fails.
    with open('/temp/matrix.txt', 'w') as f:
        for i in liste:
            for j in liste:
                # Short-circuit evaluation of logical or: yields the first truthy value, falling back to 0.
                f.write('%s ' % (dico.get(i+"/"+j) or dico.get(j+"/"+i) or 0))
            f.write('\n')

Then, once that has finished running, you can call

print(np.loadtxt('/temp/matrix.txt', dtype=int))

I used short-circuit evaluation to reduce the number of lines taken by your if statements. In fact, if you use list comprehensions, you can reduce your make_array function to:

def make_array(liste,dico):
    return np.array([[dico.get(i+"/"+j) or dico.get(j+"/"+i) or 0 for j in liste] for i in liste])
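As a quick check, here is the one-line version run on the sample data from the question. This is only a usage sketch. Note that or treats a stored value of 0 the same as a missing key, which is harmless here because the fallback is also 0.

```python
import numpy as np

def make_array(liste, dico):
    # Try "i/j", then "j/i", then fall back to 0.
    return np.array([[dico.get(i + "/" + j) or dico.get(j + "/" + i) or 0
                      for j in liste] for i in liste])

liste = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
dico = {"a/b": 4, "c/d": 2, "f/g": 5, "g/h": 2}

matrix = make_array(liste, dico)
print(matrix.shape)                # (10, 10)
print(matrix[0, 1], matrix[1, 0])  # 4 4
```

Because both key orders are tried for every cell, the result is symmetric.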

[Comments]:
