Number of possible sequences of 16 symbols given some restrictions

【Posted】: 2020-08-19 09:53:33

【Question】:

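How many possible sequences can be formed that obey the following rules:

    1. Each sequence is made up of the symbols 0-9a-f.

    2. Each sequence is exactly 16 symbols long.

    0123456789abcdef    ok
    0123456789abcde     XXX
    0123456789abcdeff   XXX

    3. A symbol may be repeated, but no more than 4 times.

    00abcdefabcdef00    ok
    00abcde0abcdef00    XXX

    4. A symbol may not appear three times in a row.

    00abcdefabcdef12    ok
    000bcdefabcdef12    XXX

    5. There may be at most two pairs.

    00abcdefabcdef11    ok
    00abcde88edcba11    XXX

Also, how long would it take to generate all of them?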

【Discussion】:

Comments are not for extended discussion; this conversation has been moved to chat.

【Answer 1】:

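In combinatorics, counting is usually fairly straightforward, and it can be done much faster than exhaustively generating each alternative (or, worse, exhaustively generating a superset of the possibilities in order to filter them). One common technique is to reduce the given problem to combinations of a small(ish) number of disjoint subproblems, where it is possible to see how many times each subproblem contributes to the total. This kind of analysis often leads to a dynamic programming solution or, as below, to a memoised recursive solution.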
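As a toy illustration of that technique (not the full problem), the sketch below counts sequences in which no symbol appears three times in a row and ignores the other rules. The function name `count_no_triples` and its state (remaining slots, length of the current run) are assumptions made for this example.

```python
from functools import lru_cache

def count_no_triples(length, nsyms):
    """Count sequences of `length` symbols over `nsyms` symbols in which
    no symbol appears three times in a row (all other rules are ignored)."""

    @lru_cache(maxsize=None)
    def helper(remaining, run):
        # run: length of the run of equal symbols ending the prefix (0, 1 or 2)
        if remaining == 0:
            return 1
        if run == 0:
            # Empty prefix: any symbol may come first.
            return nsyms * helper(remaining - 1, 1)
        # Switch to one of the other symbols, which starts a new run.
        total = (nsyms - 1) * helper(remaining - 1, 1)
        # Or repeat the last symbol, unless that would make a triple.
        if run == 1:
            total += helper(remaining - 1, 2)
        return total

    return helper(length, 0)

for n in (3, 4, 5):
    print(n, count_no_triples(n, 16))
```

Up to length 5 the other rules cannot be broken without also creating a triple, so these small cases can be compared with the first rows of the full program's output. Each `(remaining, run)` state is evaluated only once, which is what keeps the computation small.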

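Because combinatorial results are usually enormous numbers, brute-force generation of every possibility is impractical in all but the most trivial cases, even if each sequence can be produced extremely quickly. For this particular question, for example, I made a rough back-of-the-envelope estimate in a comment (since deleted):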

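There are 18446744073709551616 possible 64-bit (16 hex-digit) numbers, which is a very large number, about 18 billion billion. So if I could generate and test one of them per nanosecond, it would take me 18 billion seconds, or about 571 years. So with access to a cluster of 1000 96-core servers, I could do it all in about 54 hours, just a bit over two days. Amazon will sell me one 96-core server for just under a dollar an hour (spot prices), so a thousand of them for 54 hours would cost a little under 50,000 US dollars. Perhaps that's within reason. (But that's just to generate.)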
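The arithmetic behind an estimate of this kind can be reproduced in a few lines. This is only a sketch: the rate of one candidate per nanosecond per core and the cluster size are assumptions taken from the estimate, and the exact results differ slightly from the rounded figures quoted.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

candidates = 16 ** 16                  # every 16-digit hex string
seconds_one_core = candidates / 1e9    # assumed: one candidate per nanosecond
years_one_core = seconds_one_core / SECONDS_PER_YEAR

cores = 1000 * 96                      # assumed: 1000 servers, 96 cores each
hours_cluster = seconds_one_core / cores / 3600

print(f"{candidates} candidates")
print(f"{years_one_core:.0f} years on a single core")
print(f"{hours_cluster:.1f} hours on {cores} cores")
```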

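The original question was no doubt part of an exploration of cracking a password by trying every possible sequence, and a precise count of the possible passwords is not really needed to demonstrate the impracticality of that approach (or its practicality for organizations whose budget is sufficient to pay for the necessary computing resources). As the estimate above shows, a password with 64 bits of entropy is not really that secure if what it protects is sufficiently valuable. Take that into account when generating a password for things you treasure.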

Still, it can be interesting to compute precise combinatorial counts, if for no other reason than the intellectual challenge.

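The following is mostly a proof of concept; I wrote it in Python because Python offers some facilities that would have been time-consuming to reproduce and debug in C: hash tables with tuple keys and arbitrary-precision integer arithmetic. It could be rewritten in C (or, more easily, in C++), and the Python code could certainly be improved, but given that it takes only 70 milliseconds to compute the count requested in the original question, the effort seems unnecessary.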

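This program carefully groups the possible sequences into different partitions and caches the results in a memoisation table. For sequences of length 16, as in the OP, the cache ends up with 2540 entries, which means that the core computation is performed only 2540 times: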

# The basis of the categorization are symbol usage vectors, which count the
# number of symbols used (that is, present in a prefix of the sequence)
# `i` times, for `i` ranging from 1 to the maximum number of symbol uses
# (4 in the case of this question). I tried to generalise the code for different
# parameters (length of the sequence, number of distinct symbols, maximum
# use count, maximum number of pairs). Increasing any of these parameters will,
# of course, increase the number of cases that need to be checked and thus slow
# the program down, but it seems to work for some quite large values. 

# Because constantly adjusting the index was driving me crazy, I ended up
# using 1-based indexing for the usage vectors; the element with index 0 always
# has the value 0. This creates several inefficiencies but the practical
# consequences are insignificant.

### Functions to manipulate usage vectors

def noprev(used, prevcnt):
    """Decrements the use count of the previous symbol"""
    return used[:prevcnt] + (used[prevcnt] - 1,) + used[prevcnt + 1:]
def bump1(used, count):
    """Registers that one symbol (with supplied count) is used once more."""
    return ( used[:count]
           + (used[count] - 1, used[count + 1] + 1)
           + used[count + 2:]
           )
def bump2(used, count):
    """Registers that one symbol (with supplied count) is used twice more."""
    return ( used[:count]
           + (used[count] - 1, used[count + 1], used[count + 2] + 1)
           + used[count + 3:]
           )
def add1(used):
    """Registers a new symbol used once."""
    return (0, used[1] + 1) + used[2:]
def add2(used):
    """Registers a new symbol used twice."""
    return (0, used[1], used[2] + 1) + used[3:]

def count(NSlots, NSyms, MaxUses, MaxPairs):
    """Counts the number of sequences of length NSlots over an alphabet
       of NSyms symbols where no symbol is used more than MaxUses times,
       no symbol appears three times in a row, and there are no more than 
       MaxPairs pairs of symbols.
    """
    cache = {}

    # Canonical description of the problem, used as a cache key
    #   pairs: the number of pairs in the prefix
    #   prevcnt: the use count of the last symbol in the prefix
    #   used: for i in [1, MaxUses], the number of symbols used i times
    #         Note: used[0] is always 0. This problem is naturally 1-based
    def helper(pairs, prevcnt, used):
        key = (pairs, prevcnt, used)
        if key not in cache:
            # avail_slots: Number of remaining slots.
            avail_slots = NSlots - sum(i * count for i, count in enumerate(used))
            if avail_slots == 0:
                total = 1
            else:
                # avail_syms:  Number of unused symbols.
                avail_syms = NSyms - sum(used)
                # We can't use the previous symbol (which means we need
                # to decrease the number of symbols with prevcnt uses).
                adjusted = noprev(used, prevcnt)[:-1]
                # First, add single repeats of already used symbols
                total = sum(count * helper(pairs, i + 1, bump1(used, i))
                            for i, count in enumerate(adjusted)
                            if count)
                # Then, a single instance of a new symbol
                if avail_syms:
                    total += avail_syms * helper(pairs, 1, add1(used))

                # If we can add pairs, add the already not-too-used symbols
                if pairs and avail_slots > 1:
                    total += sum(count * helper(pairs - 1, i + 2, bump2(used, i))
                                 for i, count in enumerate(adjusted[:-1])
                                 if count)
                    # And a pair of a new symbol
                    if avail_syms:
                        total += avail_syms * helper(pairs - 1, 2, add2(used))
            cache[key] = total
        return cache[key]

    rv = helper(MaxPairs, MaxUses, (0,)*(MaxUses + 1))
    #   print("Cache size: ", len(cache))
    return rv

# From the command line, run this with the command:
# python3 -m count SLOTS SYMBOLS USE_MAX PAIR_MAX
# There are defaults for all four arguments.
if __name__ == "__main__":
    from sys import argv
    NSlots, NSyms, MaxUses, MaxPairs = 16, 16, 4, 2
    if len(argv) > 1: NSlots   = int(argv[1])
    if len(argv) > 2: NSyms    = int(argv[2])
    if len(argv) > 3: MaxUses  = int(argv[3])
    if len(argv) > 4: MaxPairs = int(argv[4])
    print (NSlots, NSyms, MaxUses, MaxPairs,
           count(NSlots, NSyms, MaxUses, MaxPairs))
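
To make the cache key more concrete, the following sketch (separate from the program above) computes the usage vector of a given prefix. The helper name `usage_vector` is hypothetical; it only illustrates why many different prefixes collapse onto the same vector.

```python
from collections import Counter

def usage_vector(prefix, max_uses=4):
    """Return the 1-based usage vector of `prefix`: element i is the number
    of distinct symbols that appear exactly i times (element 0 is always 0).
    Assumes no symbol appears more than `max_uses` times."""
    per_symbol = Counter(prefix)
    used = [0] * (max_uses + 1)
    for uses in per_symbol.values():
        used[uses] += 1
    return tuple(used)

# Different prefixes with the same usage pattern share one vector.
print(usage_vector("00ab0"))   # (0, 2, 0, 1, 0)
print(usage_vector("11fe1"))   # (0, 2, 0, 1, 0)
print(usage_vector("c7c7c"))   # (0, 0, 1, 1, 0)
```

The first two prefixes also contain one pair each and end in a symbol used three times, so they would map to the same `(pairs, prevcnt, used)` key and be counted together.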

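Here is the result of using this program to compute the counts for all valid sequence lengths (a sequence longer than 64 is impossible under the constraints), which took less than 11 seconds in total: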

$ time for i in $(seq 1 65); do python3 -m count $i 16 4; done
1 16 4 2 16
2 16 4 2 256
3 16 4 2 4080
4 16 4 2 65040
5 16 4 2 1036800
6 16 4 2 16524000
7 16 4 2 263239200
8 16 4 2 4190907600
9 16 4 2 66663777600
10 16 4 2 1059231378240
11 16 4 2 16807277588640
12 16 4 2 266248909553760
13 16 4 2 4209520662285120
14 16 4 2 66404063202640800
15 16 4 2 1044790948722393600
16 16 4 2 16390235567479693920
17 16 4 2 256273126082439298560
18 16 4 2 3992239682632407024000
19 16 4 2 61937222586063601795200
20 16 4 2 956591119531904748877440
21 16 4 2 14701107045788393912922240
22 16 4 2 224710650516510785696509440
23 16 4 2 3414592455661342007436384000
24 16 4 2 51555824538229409502827923200
25 16 4 2 773058043102197617863741843200
26 16 4 2 11505435580713064249590793862400
27 16 4 2 169863574496121086821681298457600
28 16 4 2 2486228772352331019060452730124800
29 16 4 2 36053699633157440642183732148192000
30 16 4 2 517650511567565591598163978874476800
31 16 4 2 7353538304042081751756339918288153600
32 16 4 2 103277843408210067510518893242552998400
33 16 4 2 1432943471827935940003777587852746035200
34 16 4 2 19624658467616639408457675812975159808000
35 16 4 2 265060115658802288611235565334010714521600
36 16 4 2 3527358829586230228770473319879741669580800
37 16 4 2 46204536626522631728453996238126656113459200
38 16 4 2 595094456544732751483475986076977832633088000
39 16 4 2 7527596027223722410480884495557694054538752000
40 16 4 2 93402951052248340658328049006200193398898022400
41 16 4 2 1135325942092947647158944525526875233118233702400
42 16 4 2 13499233156243746249781875272736634831519281254400
43 16 4 2 156762894800798673690487714464110515978059412992000
44 16 4 2 1774908625866508837753023260462716016827409668608000
45 16 4 2 19556269668280714729769444926596793510048970792448000
46 16 4 2 209250137714454234944952304185555699000268936613376000
47 16 4 2 2169234173368534856955926000562793170629056490849280000
48 16 4 2 21730999613085754709596718971411286413365188258316288000
49 16 4 2 209756078324313353775088590268126891517374425535395840000
50 16 4 2 1944321975918071063760157244341119456021429461885104128000
51 16 4 2 17242033559634684233385212588199122289377881249323872256000
52 16 4 2 145634772367323301463634877324516598329621152347129008128000
53 16 4 2 1165639372591494145461717861856832014651221024450263064576000
54 16 4 2 8786993110693628054377356115257445564685015517718871715840000
55 16 4 2 61931677369820445021334706794916410630936084274106426433536000
56 16 4 2 404473662028342481432803610109490421866960104314699801413632000
57 16 4 2 2420518371006088374060249179329765722052271121139667645435904000
58 16 4 2 13083579933158945327317577444119759305888865127012932088217600000
59 16 4 2 62671365871027968962625027691561817997506140958876900738150400000
60 16 4 2 259105543035583039429766038662433668998456660566416258886520832000
61 16 4 2 889428267668414961089138119575550372014240808053275769482575872000
62 16 4 2 2382172342138755521077314116848435721862984634708789861244239872000
63 16 4 2 4437213293644311557816587990199342976125765663655136187709235200000
64 16 4 2 4325017367677880742663367673632369189388101830634256108595793920000
65 16 4 2 0

real    0m10.924s
user    0m10.538s
sys     0m0.388s

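【Discussion】:

I program in C, C++, ... and more recently in Python. What I like about Python is its expressiveness and brevity, which is of course due to the availability of the basic infrastructure you are talking about. I just think that, especially for a POC, it is better to have no horizontal scrollbar. So thanks for the adjustment. (I didn't mean to nag.)

【Answer 2】: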

This program counts 16,390,235,567,479,693,920 passwords.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>


enum { RLength = 16 };  //  Required length of password.
enum { NChars = 16 };   //  Number of characters in alphabet.


typedef struct
{
    /*  N[i] counts how many instances of i are left to use, as constrained
        by rule 3.
    */
    unsigned N[NChars];

    /*  NPairs counts how many more pairs are allowed, as constrained by
        rule 5.
    */
    unsigned NPairs;

    /*  Used counts how many characters have been distinguished by choosing
        them as a representative.  Symmetry remains unbroken for NChars - Used
        characters.
    */
    unsigned Used;
} Supply;


/*  Count the number of passwords that can be formed starting with a string
    (in String) of length Length, with state S.
*/
static uint64_t Count(int Length, Supply *S, char *String)
{
    /*  If we filled the string, we have one password that obeys the rules.
        Return that.  Otherwise, consider suffixing more characters.
    */
    if (Length == RLength)
        return 1;

    //  Initialize a count of the number of passwords beginning with String.
    uint64_t C = 0;

    //  Consider suffixing each character distinguished so far.
    for (unsigned Char = 0; Char < S->Used; ++Char)
    {
        /*  If it would violate rule 3, limiting how many times the character
            is used, do not suffix this character.
        */
        if (S->N[Char] == 0) continue;

        //  Does the new character form a pair with the previous character?
        unsigned IsPair = String[Length-1] == Char;

        if (IsPair)
        {
            /*  If it would violate rule 4, a character may not appear three
                times in a row, do not suffix this character.
            */
            if (String[Length-2] == Char) continue;

            /*  If it would violate rule 5, limiting how many times pairs may
                appear, do not suffix this character.
            */
            if (S->NPairs == 0) continue;

            /*  If it forms a pair, and our limit is not reached, count the
                pair.
            */
            --S->NPairs;
        }

        //  Count the character.
        --S->N[Char];

        //  Suffix the character.
        String[Length] = Char;

        //  Add as many passwords as we can form by suffixing more characters.
        C += Count(Length+1, S, String);

        //  Undo our changes to S.
        ++S->N[Char];
        S->NPairs += IsPair;
    }

    /*  Besides all the distinguished characters, select a representative from
        the pool (we use the next unused character in numerical order), count
        the passwords we can form from it, and multiply by the number of
        characters that were in the pool.
    */
    if (S->Used < NChars)
    {
        /*  A new character cannot violate rule 3 (has not been used 4 times
            yet), rule 4 (has not appeared three times in a row), or rule 5
            (does not form a pair that could pass the pair limit).  So we know,
            without any tests, that we can suffix it.
        */

        //  Use the next unused character as a representative.
        unsigned Char = S->Used;

        /*  By symmetry, we could use any of the remaining NChars - S->Used
            characters here, so the total number of passwords that can be
            formed from the current state is that number times the number that
            can be formed by suffixing this particular representative.
        */
        unsigned Multiplier = NChars - S->Used;

        //  Record another character is being distinguished.
        ++S->Used;

        //  Decrement the count for this character and suffix it to the string.
        --S->N[Char];
        String[Length] = Char;

        //  Add as many passwords as can be formed by suffixing a new character.
        C += Multiplier * Count(Length+1, S, String);

        //  Undo our changes to S.
        ++S->N[Char];
        --S->Used;
    }

    //  Return the computed count.
    return C;
}


int main(void)
{
    /*  Initialize our "supply" of characters.  There are no distinguished
        characters, two pairs may be used, and each character may be used at
        most 4 times.
    */
    Supply S = { .Used = 0, .NPairs = 2 };
    for (unsigned Char = 0; Char < NChars; ++Char)
        S.N[Char] = 4;

    /*  Prepare space for string of RLength characters preceded by a sentinel
        (-1).  The sentinel permits us to test for a repeated character without
        worrying about whether the indexing goes outside array bounds.
    */
    char String[RLength+1] = { -1 };

    printf("There are %" PRIu64 " possible passwords.\n",
        Count(0, &S, String+1));
}
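
The symmetry argument used by the C program can also be tried out in Python on small cases. This is an illustrative port, not the author's code: the function `count_by_symmetry` and its parameters are assumptions, it assumes `max_uses` is at least 1, and it makes no attempt at speed.

```python
def count_by_symmetry(length, nchars=16, max_uses=4, max_pairs=2):
    """Count valid sequences by trying each already-used symbol plus one
    representative of the unused symbols, weighted by how many are unused."""
    left = [max_uses] * nchars   # remaining uses per symbol
    seq = []                     # current prefix, as symbol indices

    def recurse(pairs_left, used):
        if len(seq) == length:
            return 1
        total = 0
        # Symbols that have already been distinguished.
        for ch in range(used):
            if left[ch] == 0:
                continue                 # use limit reached
            is_pair = bool(seq) and seq[-1] == ch
            if is_pair:
                if len(seq) >= 2 and seq[-2] == ch:
                    continue             # would be three in a row
                if pairs_left == 0:
                    continue             # pair budget exhausted
            left[ch] -= 1
            seq.append(ch)
            total += recurse(pairs_left - int(is_pair), used)
            seq.pop()                    # undo the changes
            left[ch] += 1
        # One representative stands for all symbols not used so far.
        if used < nchars:
            ch = used
            left[ch] -= 1
            seq.append(ch)
            total += (nchars - used) * recurse(pairs_left, used + 1)
            seq.pop()
            left[ch] += 1
        return total

    return recurse(max_pairs, 0)

for n in (4, 6):
    print(n, count_by_symmetry(n))
```

Short lengths finish almost instantly and can be compared with the corresponding rows of the table in the first answer. At length 16 the same recursion has far more prefixes to visit, which is why the timings discussed for the C version are measured in seconds.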

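【Discussion】:

Performance note: on my laptop (i7-4720HQ CPU, 2.60GHz) it takes 156 seconds.

@Wolf: I made some changes to make it faster, notably operating on the original S instead of keeping a copy, and restoring its original state after the recursion simply by undoing the changes made to it. That is about four times faster than the previous code.

@Wolf: So it counts 2.73e17 passwords per second.

I have to correct the timings (I forgot to switch from the Debug to the Release build): 140 s (old version), 24.5 s (new version), more than 5 times faster :-) == 6.67e17 PW/s

@Wolf: Most of the time is spent in the leaves of the recursion. If the routine is changed to handle Length < RLength-1 as it does now and to treat Length == RLength-1 as a special case (adding 1 instead of calling Count, so that the changes to the structure S points to can be omitted), the time drops by more than 50%. I think some combinatorial analysis could reduce it further.

【Answer 3】: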

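The number of possibilities is fixed. You can either come up with an algorithm that generates only valid combinations, or simply iterate over the whole problem space and test each combination with a simple function that checks its validity.

How long it takes depends on the computer and on the efficiency of the code. You could easily make it a multithreaded application.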

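【Discussion】:

I'm not that good at coding; would it be hard to write that application?

First write a function that accepts a single string as its argument. Put all the checks into that function. Now you can call the function for any given string, so you need a way to generate all possible strings. Hint: what if you just iterate over all numbers from 0 to MAX_UINT64 and convert them from decimal to hexadecimal...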
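
A minimal sketch of the approach suggested in that comment, assuming Python rather than any language named in the thread: a checking function for one string, plus a loop that converts every number to hexadecimal and tests it. The names `is_valid` and `brute_force_count` are made up for this illustration.

```python
from collections import Counter

HEX_DIGITS = "0123456789abcdef"

def is_valid(seq, length=16, max_uses=4, max_pairs=2):
    """Check one candidate string against the five rules of the question."""
    if len(seq) != length:
        return False
    if any(ch not in HEX_DIGITS for ch in seq):
        return False
    if seq and max(Counter(seq).values()) > max_uses:
        return False
    pairs = 0
    for i in range(1, len(seq)):
        if seq[i] == seq[i - 1]:
            if i >= 2 and seq[i] == seq[i - 2]:
                return False          # three in a row
            pairs += 1
    return pairs <= max_pairs

def brute_force_count(length):
    """Count valid strings of `length` hex digits by testing every number."""
    return sum(is_valid(format(n, f"0{length}x"), length)
               for n in range(16 ** length))

print(is_valid("00abcdefabcdef11"))   # True
print(is_valid("00abcde88edcba11"))   # False
print(brute_force_count(4))
```

This is only practical for very short lengths; at length 16 the loop would have to visit all 16^16 candidates, which is exactly the cost the first answer argues against.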
