克服 Python 字典的时间复杂度

Posted 2023-03-29

技术标签:

【中文标题】克服 Python 字典的时间复杂度【英文标题】：Defeating the time complexity of Python Dictionary 【发布时间】：2021-12-21 20:11:09 【问题描述】：

我有一个 Python 字典，它的键是由小写英文字母组成的字符串，值是整数。此外，恰好有 5e6 个唯一键，它们都是长度正好为 10 的字符串。令人惊讶的是，查找并没有花费太多时间。我预计执行大约需要 4 秒或更长时间，但不会超过 2.5 秒。

我将 Python 代码转换为 C++，Dictionary 的类比是 map。我尝试了map、unordered_map 和gp_hash_table，而它们都在 C++ 中花费了超过 2 秒。

这是我用来生成唯一字符串的生成器。

from sys import stdout

def increment(l):
    n = len(l)
    i = n - 1
    while i >= 0 and l[i] == 'z':
        l[i] = 'a'
        i -= 1
    l[i] = chr(ord(l[i]) + 1)

print(5 * 10**6)

string = ['a' for i in range(10)]


for i in range(5 * 10**6):
    stdout.write(''.join(string) + '\n')
    increment(string)

string = ['a' for i in range(10)]

print(10**6)
for i in range(5 * 10**6):
    stdout.write(''.join(string) + '\n')
    increment(string)

该程序的输出使用命令python3 Test.py > Strings.txt. 存储在名为Strings.txt 的文件中，之后文件Strings.txt 将如下所示。

5000000
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
aaaaaaaaad
aaaaaaaaae
aaaaaaaaaf
...
...
...
aaaaakymlr
1000000
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaac
...
...
...
aaaaakymlq
aaaaakymlr

以下是我在上述上下文中所指的 CPP 代码。

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;

int main() 
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int N = 0;
    cin >> N;
    map<string, int> freq;
    // unordered_map<string, int> freq;
    // gp_hash_table<string, int> freq;
    for(int i = 0; i < N; i++) 
        string s;
        cin >> s;
        freq[s]++;
    
    int Q = 0;
    cin >> Q;
    for(int i = 0; i < Q; i++) 
        string s;
        cin >> s;
        cout << freq[s] << '\n';
    
    return 0;

以下是我使用的Python3代码。

from collections import defaultdict
from sys import stdin, stdout

input = stdin.readline

freq = defaultdict(int)
for i in range(int(input())):
    freq[input()] += 1

for i in range(int(input())):
    stdout.write(str(freq[input()]) + '\n')

这是执行上述代码时的结果。

suman@Skynet:~/Documents/String_Pairs$ python3 Test.py > Strings.txt

suman@Skynet:~/Documents/String_Pairs$ time python3 Dict.py < Strings.txt > P_out.txt

real    0m3.145s
user    0m2.662s
sys     0m0.164s
suman@Skynet:~/Documents/String_Pairs$ time python3 Dict.py < Strings.txt > P_out.txt

real    0m2.772s
user    0m2.568s
sys     0m0.204s


suman@Skynet:~/Documents/String_Pairs$ g++ -o exe Map.cpp -O2 -std=c++17
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.346s
user    0m2.265s
sys     0m0.080s
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.513s
user    0m2.417s
sys     0m0.096s


suman@Skynet:~/Documents/String_Pairs$ g++ -o exe Unordered_Map.cpp -O2 -std=c++17
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.769s
user    0m2.660s
sys     0m0.108s
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.806s
user    0m2.690s
sys     0m0.116s


suman@Skynet:~/Documents/String_Pairs$ g++ -o exe gp_hash_table.cpp -O2 -std=c++17
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.099s
user    0m1.686s
sys     0m0.412s
suman@Skynet:~/Documents/String_Pairs$ time ./exe < Strings.txt > Cpp_out.txt

real    0m2.009s
user    0m1.605s
sys     0m0.404s
suman@Skynet:~/Documents/String_Pairs$

现在，我唯一担心的是 Python3 比 CPP 慢 5 倍，但在处理哈希表时它仍然在与 CPP 竞争。

有什么方法可以克服 Python 哈希表的时间复杂度？

非常感谢任何帮助。

更新： 这是一个更新，不包括读取字符串所花费的时间。

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;
using namespace chrono;

int main() 
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int N = 0;
    cin >> N;

    vector<string> words(N);

    for(int i = 0; i < N; i++) 
        cin >> words[i];
    

    // map<string, int> freq;
    // unordered_map<string, int> freq;
    gp_hash_table<string, int> freq;

    auto start = high_resolution_clock::now();
    for(string word : words) 
        freq[word]++;
    

    auto end = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';



    int Q = 0;
    cin >> Q;
    vector<string> queries(Q);
    for(int i = 0; i < Q; i++) 
        cin >> queries[i];
    

    vector<int> results(Q);
    start = high_resolution_clock::now();
    for(int i = 0; i < Q; i++) 
        results[i] = freq[queries[i]];
    
    end = high_resolution_clock::now();
    duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';

    for(int i = 0; i < Q; i++) 
        cout << results[i] << '\n';
    

    return 0;

Python 代码

from collections import defaultdict
from time import time
from sys import stdin, stdout

input = stdin.readline

freq = defaultdict(int)

strings = []

for i in range(int(input())):
    strings.append(input())

start = time()
for string in strings:
    freq[string] += 1
end = time()
print("%.4f" %(end - start))

queries = []
output = []

for i in range(int(input())):
    queries.append(input())

start = time()

for query in queries:
    output.append(freq[query])

end = time()

print("%.4f" %(end - start))

stdout.write('\n'.join(map(str, output)))

即使是现在，Python 的运行速度也比 CPP 快。结果：

Cpp_out.txt（map、unordered_map 和 gp_hash_table 所用的时间都大于 2s）。

2.28297
0.109844
1
1
...
...
...

P_out.txt

1.7818
0.1977
1
1
...
...

更新 2： 我已经修改了代码，因此我不包括阅读或写作所花费的时间以及在任何地方使用引用。现在，CPP 在散列中击败 Python3 是可以接受的。以下是基准。

// CPP Code
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
using namespace __gnu_pbds;
using namespace chrono;

struct djb2 
    unsigned long operator()(const string& str) const 
        unsigned long hash = 5381;
        for (auto c : str)
            hash = ((hash << 5) + hash) + c;
        return hash;
    
;


int main() 
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    int N = 0;
    cin >> N;

    vector<string> words(N);

    for(int i = 0; i < N; i++) 
        cin >> words[i];
    

    // map<string, int> freq;
    // unordered_map<string, int> freq;
    gp_hash_table<string, int> freq;

    auto start = high_resolution_clock::now();
    for(const string &word : words) 
        freq[word]++;
    

    auto end = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';



    int Q = 0;
    cin >> Q;
    vector<string> queries(Q);
    for(int i = 0; i < Q; i++) 
        cin >> queries[i];
    

    vector<int> results(Q);
    start = high_resolution_clock::now();
    for(int i = 0; i < Q; i++) 
        results[i] = freq[queries[i]];
    
    end = high_resolution_clock::now();
    duration = duration_cast<microseconds>(end - start);
    cout << duration.count() / 1e6 << '\n';

    for(int i = 0; i < Q; i++) 
        cout << results[i] << '\n';
    

    return 0;

# Python3 Code
from collections import defaultdict
from time import time
from sys import stdin, stdout

input = stdin.readline

freq = defaultdict(int)

strings = []

for i in range(int(input())):
    strings.append(input())

start = time()
for string in strings:
    freq[string] += 1
end = time()
print("%.4f" %(end - start))

queries = []
output = []

for i in range(int(input())):
    queries.append(input())

start = time()

for query in queries:
    output.append(freq[query])

end = time()

print("%.4f" %(end - start))

stdout.write('\n'.join(map(str, output)))

Cpp_out.txt

1.60026
0.071471

P_out.txt

1.7849
0.1987

所以，很明显CPP's gp_hash_table 胜过Python3's 哈希表。

我已经完成了 Python3 的哈希表实现。他们正在使用一种叫做 SIPHASH 的东西。我想生成字符串，以使对字符串进行哈希处理时的冲突次数最大。这有点像碰撞攻击，但我希望至少 5000 美元的唯一字符串产生相同的哈希。

谁能为此提供任何类型的资源（请注意，我需要至少 5000 美元的唯一字符串来产生相同的哈希值）。

【问题讨论】：

Unordered_Map.cpp 包含什么内容？ @kiner_shah 与Map.cpp 包含相同的脚本，唯一的区别是使用unordered_map<string, int> 而不是其他两个（已注释）。您将读取和写入的时间计算在内 > 50 兆字节。结果与哈希表性能基本无关。 @ChitturiSaiSuman 您更新的代码按值迭代string 项目，这意味着它必须进行动态分配并复制所有内容。你应该做for (const string &word : words)（或const auto &word）你应该预先设置你的 C++ 映射，即this。 【参考方案1】：

注释中指出的reserve 调用对于unordered_map 来说是最重要的。此外，您应该使用unordered_map 的正确实现（不是 g++ 的默认实现，VS 2019 还可以）并考虑您使用的架构。

在我的计算机上使用 Microsoft 工具（Python37_64 和 VS 2019） - 除最后几行外使用 x86：

Python 1.697 0.202 原始 C++ 映射：1.361 0.138 原始 C++ unordered_map：1.035 0.067 +const 字符串&: 1.013 0.067 +储备：0.675 0.056 在 x64 上改为：0.686 0.060（几乎相同）

在添加保留之前，在 x86 上使用 robin_hood（如 @thc 所建议的那样）并没有更快 - 即使这样，对于 gcc 的 unordered_map 来说差异也没有那么显着：

robin_hood unordered_map x86: 1.107 0.100 +const 字符串& x86: 1.059 0.101 +reserve x86: 0.477 0.108

但是，如果您在 x64 上运行，robin_hood 的构建速度也更快，但查找速度更快（并且保留仍然很重要）：

原始 VS C++ unordered_map x64: 1.130 0.065 +const 字符串& x64: 1.119 0.063 +reserve x64: 0.671 0.065 robin_hood unordered_map x64: 0.611 0.069 +const 字符串& x64: 0.577 0.072 +reserve x64: 0.384 0.069 原 g++ C++ unordered_map x64: 1.639 0.135 +const 字符串& x64: 1.611 0.135 +reserve x64: 0.946 0.128

代码片段（请注意，保留在公平的时间范围内很重要）：

auto start = high_resolution_clock::now();
unordered_map<string, int> freq;
freq.reserve(Q);
for(const string& word : words) 
    freq[word]++;

auto end = high_resolution_clock::now();

补充：'freq' 也有一个取消分配的成本（并且 robin_hood 在那里似乎更快）。只是构建地图的一小部分，但仍然很重要。

【讨论】：

【参考方案2】：

我自己找到了这篇文章的答案。 Python 使用随机散列算法，这使得生成两个产生相同散列值的字符串（由完全小写或完全大写的字符组成）几乎是不可能的。

现在，谈到 C++ 与 Python3 的性能问题，我也考虑了读取和写入所花费的时间，因为它显示 C++ 比 Python3 慢。

当我纠正代码并确保在读/写或分配中没有额外的开销（在 C++ 的情况下，我使用 const string& 来引用映射条目），我发现在 @ 时 C++ 的运行速度比 Python3 快使用 987654322@。

感谢所有花时间/精力解决我的问题的人。

【讨论】：

【参考方案3】：

可能是这个 for 循环中的复制结构：

for(string word : words) 
    freq[word]++;

“字符串词”每次迭代都会复制一个新字符串。考虑更改为按引用访问：

for(const string& word : words) 
    freq[word]++;

【讨论】：

以上是关于克服 Python 字典的时间复杂度的主要内容，如果未能解决你的问题，请参考以下文章