将所有数字分解为给定数字的快速算法

Posted 2023-02-22

技术标签:

【中文标题】将所有数字分解为给定数字的快速算法【英文标题】：Fast Algorithm to Factorize All Numbers Up to a Given Number 【发布时间】：2020-07-02 14:07:56 【问题描述】：

我正在寻找一种算法，它可以根据已经分解的数字分解数字。换句话说，我正在寻找一种快速算法来将所有数字分解为给定数字，并将它们存储在一个（我猜这是最容易使用的数据结构）列表/元组中。我正在寻找一种“最多 n”的算法，因为我需要最多为“n”的所有数字，而且我想这比逐个检查要快。

对于我正在运行的程序，我希望该算法在合理的时间内（不到一小时）工作 2*10^8。我在python中尝试了一种更天真的方法，首先找到所有素数到“n”，然后对于每个数字“k”，通过检查每个素数直到一个素数除以它（我们称之为p），找到它的素数分解，那么它的因式分解就是k/p + p的因式分解。

from math import *
max=1000000 # We will check all numbers up to this number, 

lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
  if lst[i - 2]:
    for j in range(i ** 2, max, i):
      lst[j - 2] = False

PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")

FACTORS = [(0,),(1,)] #This will be a list of tuples where FACTORS[i] = the prime factors of i
for c in range(2,max): #check all numbers until max
  if c in PRIMES:
    FACTORS.append((c,)) #If it's a prime just add it in
  else: #if it's not a prime...
    i=0
    while PRIMES[i]<= c: #Run through all primes until you find one that divides it,
      if c%PRIMES[i] ==0: 
        FACTORS.append(FACTORS[c//PRIMES[i]] + (PRIMES[i],)) #If it does, add the prime with the factors of the division
        break
      i+=1

在测试中，绝大多数时间都浪费在了检查候选人是否为素数之后的 else 部分。这需要超过我们的 for max = 200000000

附注- 我用这个做什么 - 不重要

我正在运行的程序是找到最小的“n”，使得对于某个“a”，使得 (2n)!/((n+a)!^2) 是一个整数。基本上，我定义了 a_n = 最小 k 使得 (2k)!/((k+n)!^2) 是一个整数。结果，a_1 = 0，a_2 = 208，a_3 = 3475，a_4 = 8174，a_5 = 252965，a_6 = 3648835，a_7 = 72286092。顺便说一句，我注意到 a_n + n 是无平方的，虽然无法证明数学上。使用勒让德公式：https://en.wikipedia.org/wiki/Legendre%27s_formula，我写了这段代码：

from math import *
from bisect import bisect_right
max=100000000 # We will check all numbers up to this number, 

lst = [True] * (max - 2) # This is an algorithm I found online that will make the "PRIMES" list all the primes up to "max", very efficent
for i in range(2, int(sqrt(max) + 1)):
  if lst[i - 2]:
    for j in range(i ** 2, max, i):
      lst[j - 2] = False

PRIMES = tuple([m + 2 for m in range(len(lst)) if lst[m]]) # (all primes up to "max")
print("START")

def v(p,m):
  return sum([ (floor(m/(p**i))) for i in range(1,1+ceil(log(m,p)))]) #This checks for the max power of prime p, so that p**(v(p,m)) divides factorial(m)

def check(a,n): #This function checks if a number n competes the criteria for a certain a
  if PRIMES[bisect_right(PRIMES, n)]<= n + a: #First, it is obvious that if there is a prime between n+1 and n+a the criteria isn't met
    return False
  i=0
  while PRIMES[i] <= n: #We will run through the primes smaller than n... THIS IS THE ROOM FOR IMPROVEMENT - instead of checking all the primes, check all primes that divide (n+1),(n+2),...,(n+a)
    if v(PRIMES[i],2*n)<2*v(PRIMES[i],n+a): # If any prime divides the denominator more than the numerator, the fraction is obviously not a whole number
      return False
    i+=1
  return True #If for all primes less than n, the numerator has a bigger max power of p than the denominator, the fraction is a whole number.

#Next, is a code that will just make sure that the program runs all numbers in order, and won't repeat anything.

start = 0 #start checking from this value
for a in range(1,20): #check for these values of a.
  j=start
  while not check(a,j):
    if j%100000==0:
      print("LOADING ", j) #just so i know how far the program has gotten.
    j+=1
  print("a-",a," ",j) #We found a number. great. print the result.
  start=j #start from this value again, because the check obviously won't work for smaller values with a higher "a"

【问题讨论】：

【参考方案1】：

您可以使用脚本的第一部分来执行此操作！

代码：

from math import *
import time

MAX = 40000000

t = time.time()
# factors[i] = all the prime factors of i
factors = 
# Running over all the numbers smaller than sqrt(MAX) since they can be the factors of MAX
for i in range(2, int(sqrt(MAX) + 1)):
    # If this number has already been factored - it is not prime
    if i not in factors:
        # Find all the future numbers that this number will factor
        for j in range(i * 2, MAX, i):
            if j not in factors:
                factors[j] = [i]
            else:
                factors[j].append(i)
print(time.time() - t)

for i in range(3, 15):
    if i not in factors:
        print(f"i is prime")
    else:
        print(f"i: factors[i]")

结果：

3：是素数 4：[2] 5：是素数 6：[2, 3] 7：是素数 8：[2] 9：[3] 10: [2, 5] 11：是素数 12：[2, 3] 13：是素数 14: [2, 7]

解释：

正如 cmets 中提到的，它是对 Sieve of Eratosthenes 算法的修改。对于每个数字，我们找到它在未来可以分解的所有数字。如果数字未出现在结果字典中，则它是素数，因为没有数字将其分解。我们使用字典而不是列表，因此根本不需要保存素数 - 这对内存更友好但也更慢。

时间：

根据MAX = 40000000 和time.time() 的简单检查：110.14351892471313 秒。对于MAX = 1000000：1.0785243511199951 秒。对于MAX = 200000000 和time.time()：1.5 小时后未完成......它已到达主循环中 6325 个项目中的第 111 个项目（这还不错，因为循环越远它们变得越短）。

不过，我确实相信编写良好的 C 代码可以在半小时内完成（如果您愿意考虑，我可能会再写一个答案）。可以做的更多优化是使用多线程和一些 Primality 测试，如 Miller-Rabin。当然值得一提的是，这些结果是在我的笔记本电脑上，可能在 PC 或专用机器上运行得更快或更慢。

【讨论】：

哇！这太聪明了。这个算法有名字吗？您的范围限制有问题。如果我将 MAX 设置为小于 25 的值，例如24，那么它不会显示 5 是 10 的质因数或 7 是 14 的因数，但它会返回元素 10 到 23 的结果。这是对埃拉托色尼筛子的轻微修改。需要注意的是，找到大数的因数很难成为某些密码标准的基础，因此“快”将是相对的。 @Booboo 是的，你是对的，我会修复它。通常问题是它不会在sqrt(MAX) 上方显示任何素数，但这可以通过检查factors[i] 是否为空而不是素数来解决，您需要将i 添加到它。 @Lainad 如果您对此解决方案感到满意，欢迎您批准它。如前所述，它确实是对Sieve of Eratosthenes 的修改。【参考方案2】：

编辑：

我实际上向question in code review 询问了这个答案，它有一些关于运行时的酷图！

编辑#2：

有人回答了我的问题，现在代码经过一些修改可以在 2.5 秒内运行。

由于之前的答案是用Python 写的，所以速度很慢。以下代码执行完全相同的操作，但在 C++ 中，它有一个线程监控它每 10 秒得到哪个素数。

#include <math.h>
#include <unistd.h>
#include <list>
#include <vector>
#include <ctime>
#include <thread>
#include <iostream>
#include <atomic>

#ifndef MAX
#define MAX 200000000
#define TIME 10
#endif


std::atomic<bool> exit_thread_flagfalse;

void timer(int *i_ptr) 
    for (int i = 1; !exit_thread_flag; i++) 
        sleep(TIME);
        if (exit_thread_flag) 
            break;
        
        std::cout << "i = " << *i_ptr << std::endl;
        std::cout << "Time elapsed since start: " 
                  << i * TIME 
                  << " Seconds" << std::endl;
    


int main(int argc, char const *argv[])

    int i, upper_bound, j;
    std::time_t start_time;
    std::thread timer_thread;
    std::vector< std::list< int > > factors;

    std::cout << "Initiallizating" << std::endl;
    start_time = std::time(nullptr);
    timer_thread = std::thread(timer, &i);
    factors.resize(MAX);
    std::cout << "Initiallization took " 
              << std::time(nullptr) - start_time 
              << " Seconds" << std::endl;

    std::cout << "Starting calculation" << std::endl;
    start_time = std::time(nullptr);
    upper_bound = sqrt(MAX) + 1;
    for (i = 2; i < upper_bound; ++i)
    
        if (factors[i].empty())
        
            for (j = i * 2; j < MAX; j += i)
            
                factors[j].push_back(i);
            
        
    
    std::cout << "Calculation took " 
              << std::time(nullptr) - start_time 
              << " Seconds" << std::endl;

    // Closing timer thread
    exit_thread_flag = true;

    std::cout << "Validating results" << std::endl;
    for (i = 2; i < 20; ++i)
    
        std::cout << i << ": ";
        if (factors[i].empty()) 
            std::cout << "Is prime";
         else 
            for (int v : factors[i]) 
                std::cout << v << ", ";
            
        
        std::cout << std::endl;
    
    
    timer_thread.join();
    return 0;

需要用以下行编译：

g++ main.cpp -std=c++0x -pthread

如果您不想将整个代码转换为 C++，您可以使用 Python 中的子进程库。

时间：

我尽力了，但它仍然运行了一个多小时......它在 1.386111 小时（4990 秒）内达到了6619，这是第 855 个素数（好多了！）。所以这是一个进步，但还有一段路要走！（如果没有其他线程可能会更快）

【讨论】：

以上是关于将所有数字分解为给定数字的快速算法的主要内容，如果未能解决你的问题，请参考以下文章