Bit Masking to select Vector/Set Elements?

Posted 2023-03-31

Asked: 2016-12-29 05:17:52

Question:

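Goal

I am trying to extract elements from a container (e.g. a vector or a set), but instead of using indices I am using a bit-masking technique.

Scenario:

vector<string> alphabets {"a", "b", "c", "d", "e"};

Test cases:

Input: 5 (equivalent bitmask: 00101)

Output: new vector {"c", "e"}

Input: 13 (bitmask: 01101)

Output: vector {"b", "c", "e"}

Naive working solution: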

// Assumes n and alphabets are visible at this scope.
vector<string*> extract(int mask) {
    vector<string*> result;
    bitset<n> bits(mask);
    for (int j = 0; j < n; ++j) {
        if (bits[j]) {
            result.push_back(&alphabets[j]);
        }
    }
    return result;
}
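One detail worth checking: the function above indexes bits from the least significant end, so a mask of 5 selects "a" and "c", while the test cases read the mask from left to right (00101 selects "c" and "e"). A minimal sketch of a variant that matches the test cases follows; the name extract_msb_first and its signature are assumptions made for illustration, and it assumes at most 32 elements.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: the leftmost of the n mask bits guards items[0],
// which is how the test cases in the question read the mask.
std::vector<const std::string*> extract_msb_first(
        const std::vector<std::string>& items, unsigned mask) {
    std::vector<const std::string*> result;
    const std::size_t n = items.size();  // assumed to be at most 32
    for (std::size_t j = 0; j < n; ++j) {
        // Element j is selected by bit (n - 1 - j).
        if (mask & (1u << (n - 1 - j))) {
            result.push_back(&items[j]);
        }
    }
    return result;
}
```

With the five-letter vector from the scenario, a mask of 5 yields pointers to "c" and "e", and 13 yields "b", "c", "e".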

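Further improvements

Time complexity? Space complexity? Concept/idea? API usability?

Example use case.

Enumerate all combinations of a, b, c, d, e, where a, b, c, d, e are wrapped in a container. (Other approaches are mentioned in the question "Generating combinations in c++".)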

#include <vector>
#include <iostream>
#include <algorithm>
#include <string>
#include <bitset>
#include <cmath>   // pow

using namespace std;


int main() {
    const int n = 5;

    vector<string> alphabets {"a", "b", "c", "d", "e"};

    // Every i in [0, 2^n) encodes one subset: bit j set means alphabets[j] is included.
    for ( int i = 0; i < pow(2, n); ++i) {
        vector<string*> result;
        bitset<n> bits(i);
        for (int j = 0; j < n; ++j) {
            if (bits[j]) {
                result.push_back(&alphabets[j]);
            }
        }

        for (auto r: result) {
            cout << *r;
        }
        cout << endl;
    }
    return 0;
}

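Comments:

What is the question?

I am primarily a bit concerned about performance, because I am trying to solve a small problem set of an NP-complete problem that requires me to brute-force all combinations. So solving this only addresses a small part of my larger problem (i.e. is there a better way to enumerate all combinations).

Related.

You can use a library function to get permutations, which is usually faster than your own implementation: std::next_permutation

I am not sure you really need the variable result for your small intermediate problem. The variable bits of type bitset<n>, together with alphabet, carries the same information as result. For example, the union of two bitsets is far more efficient than the union of two vector<string*> (it can use the bitwise or "|" operator on unsigned values).

Answer 1: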

If you prefer performance over readability, I think this is a reasonable starting point.

Attempt 1

Basically I would avoid any memory allocation.

#include <string>
#include <vector>
#include <bitset>
#include <iostream>
#include <iterator>
#include <tuple>
#include <array>


// Returns (number of matches, fixed-size array of pointers to the matches).
template<class From, std::size_t N>
auto
select(From const& from, std::bitset<N> const& bits)
{
    std::array<const std::string*, N> result { nullptr };
    auto i = std::begin(result);
    std::size_t found;
    std::size_t count = found = bits.count();
    std::size_t index = 0;
    while (count)
    {
        if (bits.test(index)) {
            *i++ = &from[index];
            --count;
        }
        ++index;
    }
    return std::make_tuple(found, result);
}

int main()
{
    std::vector<std::string> alphabet = { "a", "b", "c", "d", "e", "f", "g", "h" };

    for (unsigned x = 0 ; x < 256 ; ++x)
    {
        auto info = select(alphabet, std::bitset<8>(x));
        auto ptrs = std::get<1>(info).data();
        auto size = std::get<0>(info);
        while(size--)
        {
            std::cout << *(*ptrs++) << ", ";
        }

        std::cout << '\n';
    }
}
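Attempt 2 - runtime lookup in nanoseconds...

Here I precompute all possible alphabets at compile time.

The run time is of course blindingly fast. However, an alphabet of more than 14 characters may take a while to compile...

Update: Warning! When I set the alphabet size to 16, compilation consumed 32GB of memory, stopped every other application on the desktop, and required a reboot of my macbook before I could do anything else. You have been warned.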

#include <string>
#include <vector>
#include <bitset>
#include <iostream>
#include <iterator>
#include <tuple>
#include <array>
#include <utility>   // std::index_sequence


template<class From, std::size_t N>
auto
select(From const& from, std::bitset<N> const& bits)
{
    std::array<const std::string*, N> result { nullptr };
    auto i = std::begin(result);
    std::size_t found;
    std::size_t count = found = bits.count();
    std::size_t index = 0;
    while (count)
    {
        if (bits.test(index)) {
            *i++ = &from[index];
            --count;
        }
        ++index;
    }
    return std::make_tuple(found, result);
}

// One precomputed subset: the letters selected by mask, starting at 'a'.
template<std::size_t Limit>
struct alphabet
{
    constexpr alphabet(std::size_t mask)
    : size(0)
    , data {}
    {
        for (std::size_t i = 0 ; i < Limit ; ++i)
        {
            if (mask & (1 << i))
                data[size++] = char('a' + i);
        }
    }

    std::size_t size;
    char data[Limit];

    friend decltype(auto) operator<<(std::ostream& os, alphabet const& a)
    {
        auto sep = "";
        for (std::size_t i = 0 ; i < a.size; ++i)
        {
            os << sep << a.data[i];   // write to os, not std::cout
            sep = ", ";
        }
        return os;
    }
};

template<std::size_t Limit>
constexpr alphabet<Limit> make_iteration(std::size_t mask)
{
    alphabet<Limit> result { mask };
    return result;
}

template<std::size_t Limit, std::size_t...Is>
constexpr auto make_iterations(std::index_sequence<Is...>)
{
    constexpr auto result_space_size = sizeof...(Is);
    std::array<alphabet<Limit>, result_space_size> result
    {
        make_iteration<Limit>(Is)...
    };
    return result;
}

template<std::size_t Limit>
constexpr auto make_iterations()
{
    // 2^Limit masks, 0 through 2^Limit - 1, so the full alphabet is included.
    return make_iterations<Limit>(std::make_index_sequence<(std::size_t(1) << Limit)>());
}

int main()
{
    static constexpr auto alphabets = make_iterations<8>();
    for(const auto& alphabet : alphabets)
    {
        std::cout << alphabet << std::endl;
    }
}
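Attempt 3

A very basic fixed-capacity container of pointers to the matching elements. I have added constexpr. This will not improve the majority of cases, but it could certainly improve statically allocated selections.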

#include <vector>
#include <string>
#include <bitset>
#include <iostream>
#include <iterator>
#include <tuple>
#include <array>
#include <type_traits>
#include <utility>

namespace quick_and_dirty {

template<class T, std::size_t Capacity>
struct fixed_capacity_vector
{
    using value_type = T;

    constexpr fixed_capacity_vector()
    : store_()
    , size_(0)
    {}

    // No bounds check: the caller must not push more than Capacity items.
    constexpr auto push_back(value_type v)
    {
        store_[size_] = std::move(v);
        ++ size_;
    }

    constexpr auto begin() const { return store_.begin(); }
    constexpr auto end() const { return begin() + size_; }

private:
    std::array<T, Capacity> store_;
    std::size_t size_;
};

} // namespace quick_and_dirty

template<class From, std::size_t N>
constexpr
auto
select(From const& from, std::bitset<N> const& bits)
{
    using value_type = typename From::value_type;
    using ptr_type = std::add_pointer_t<std::add_const_t<value_type>>;

    auto result = quick_and_dirty::fixed_capacity_vector<ptr_type, N>();

    std::size_t count = bits.count();

    for (std::size_t index = 0 ; count ; ++index)
    {
        if (bits.test(index)) {
            result.push_back(&from[index]);
            --count;
        }
    }
    return result;
}

int main()
{
    std::vector<std::string> alphabet = { "a", "b", "c", "d", "e", "f", "g", "h" };

    for (unsigned x = 0 ; x < 256 ; ++x)
    {
        for(auto p : select(alphabet, std::bitset<8>(x)))
        {
            std::cout << (*p) << ", ";
        }

        std::cout << '\n';
    }
}

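Answer 2:

Well, your example can then be shorter (only a bit shorter, and it may not demonstrate what you really need) (edit: I planned to output the characters directly to cout, without using the @987654322 @vector, and then I forgot about that while rewriting the code... so it is still similar to yours):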

#include <vector>
#include <iostream>
#include <string>

int main() {
    const std::vector<std::string> alphabets {"a", "b", "c", "d", "e"};
    const unsigned N = alphabets.size();
    const unsigned FULL_N_MASK = 1 << N;

    for (unsigned mask = 0; mask < FULL_N_MASK; ++mask) {

        std::vector<const std::string*> result;
        unsigned index = 0, test_mask = 1;
        while (index < N) {
            if (mask & test_mask) result.push_back(&alphabets[index]);
            ++index, test_mask <<= 1;
        }

        for (auto r : result) std::cout << *r;
        std::cout << std::endl;
    }
    return 0;
}

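This makes me wonder a bit why you need to extract the result vector from the mask at all; perhaps you could just use the mask itself and fetch the particular strings in the inner loop (just as I do where I am building result).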
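As a rough illustration of that idea, the sketch below consumes the selected strings straight from the mask and never builds a vector of pointers. The helper name join_selected is hypothetical, bit j is assumed to select items[j], and the container is assumed to hold at most 32 elements.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenates the elements selected by mask
// (bit j selects items[j]) without an intermediate result vector.
std::string join_selected(const std::vector<std::string>& items, unsigned mask) {
    std::string out;
    // Stop as soon as no set bits remain above position j.
    for (std::size_t j = 0; j < items.size() && (mask >> j) != 0; ++j) {
        if ((mask >> j) & 1u) {
            out += items[j];
        }
    }
    return out;
}
```

The same loop body could write to std::cout or feed the real algorithm directly; returning a string here only keeps the sketch easy to check.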

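The main change in my code is that it omits the bitset<n> bits(i); initialization: you already have those bits in i, easily accessible with the C bitwise operators (<< >> & ^ | ~), so there is no need to convert them into the same thing again as a bitset.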

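Using bitset makes sense if n is too large for the mask to fit into a regular type (such as uint32_t). Even then, for a reasonably small fixed n, I would probably use a 64/128/256-bit unsigned integer (whichever is available on your target platform).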
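A minimal sketch of the 64-bit variant, assuming n stays below 64. The function names are made up for illustration. The one detail that tends to bite is the shift: a plain 1 << n is an int shift, so the literal has to be widened first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of subsets of an n-element set, valid for n < 64.
// A plain `1 << n` would be an int shift and overflow for n >= 31.
std::uint64_t subset_count(unsigned n) {
    return std::uint64_t{1} << n;
}

// Indices of the set bits of a 64-bit mask, lowest index first.
std::vector<std::size_t> selected_indices(std::uint64_t mask) {
    std::vector<std::size_t> out;
    // The loop ends once all remaining bits are zero.
    for (std::size_t j = 0; mask != 0; ++j, mask >>= 1) {
        if (mask & 1u) {
            out.push_back(j);
        }
    }
    return out;
}
```

For example, selected_indices(13) gives the indices 0, 2 and 3.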

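About speed: you certainly cannot beat ++mask. std::next_permutation will be slower than a single native machine-code instruction, even if it is implemented the same way for sizes below 32/64.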
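For comparison, here is one way std::next_permutation is commonly used for this kind of task: permuting a vector of selection flags to enumerate the combinations of a fixed size k. This is a sketch with a hypothetical helper name, not code from the thread. It does more work per step than ++mask, but it visits only the k-element subsets.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: every k-element combination of items, each returned
// as the concatenation of its elements.
std::vector<std::string> combinations_of_size(
        const std::vector<std::string>& items, std::size_t k) {
    std::vector<std::string> out;
    if (k > items.size()) return out;
    // Flags in ascending order: (n - k) zeros followed by k ones.
    std::vector<int> pick(items.size(), 0);
    std::fill(pick.end() - static_cast<std::ptrdiff_t>(k), pick.end(), 1);
    do {
        std::string combo;
        for (std::size_t j = 0; j < items.size(); ++j) {
            if (pick[j]) combo += items[j];
        }
        out.push_back(combo);
    } while (std::next_permutation(pick.begin(), pick.end()));
    return out;
}
```

For items a, b, c and k = 2 this produces "bc", "ac", "ab", in that order.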

The question, though, is whether you can build your algorithm around that bitmask so that it actually benefits from it.

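Many chess engines encode various board values as bitmasks, so that they can easily check whether a square is occupied, or do a preliminary move-availability check in a single step, such as finding all pawns that can capture an opponent's piece in the next turn: @ 987654337@ - 5 native CPU bit operations, and if the result is 0 you know your pawns cannot attack anything. Otherwise you have a bitmask of the opponent's pieces that can be taken. Compare that with naively looping over all pieces, checking the [+-1, +1] squares for opponent pieces, and adding them to some dynamically allocated memory such as a vector.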
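The expression in the paragraph above did not survive, so the following is not the answer's original code. It is a hypothetical sketch of the same kind of check under an assumed board layout: bit index = rank * 8 + file, with a1 at bit 0, and white pawns moving toward higher ranks.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Assumed layout: bit (rank * 8 + file), a1 = bit 0, h8 = bit 63.
constexpr Bitboard FILE_A = 0x0101010101010101ULL;
constexpr Bitboard FILE_H = 0x8080808080808080ULL;

// Opponent pieces that white pawns could capture on the next move.
constexpr Bitboard white_pawn_captures(Bitboard pawns, Bitboard opponent) {
    // Mask off the edge files first so a shift cannot wrap around the board.
    const Bitboard up_left  = (pawns & ~FILE_A) << 7;
    const Bitboard up_right = (pawns & ~FILE_H) << 9;
    return (up_left | up_right) & opponent;
}
```

A zero result means no pawn can capture anything; a non-zero result is itself the set of capturable squares, with no loop and no allocation.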

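Or the question is whether there is an algorithm that can prune the combination tree and exit early, skipping some significant subset of the permutations entirely, which would most likely be faster than a full ++mask scan.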
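The asker's actual problem is not stated, so as a toy stand-in assume the task is to count the subsets of non-negative weights whose sum stays within a limit. The sketch below (hypothetical function, assumed problem) decides one element at a time and abandons a branch as soon as the partial sum exceeds the limit, so every subset extending that branch is skipped without being visited.

```cpp
#include <cstddef>
#include <vector>

// Counts subsets of non-negative weights whose sum is <= limit.
// i is the next element to decide; sum is the weight chosen so far.
std::size_t count_within_limit(const std::vector<unsigned>& weights,
                               unsigned limit, std::size_t i = 0,
                               unsigned sum = 0) {
    if (sum > limit) return 0;          // prune: adding more cannot reduce the sum
    if (i == weights.size()) return 1;  // one complete subset within the limit
    return count_within_limit(weights, limit, i + 1, sum)                // skip element i
         + count_within_limit(weights, limit, i + 1, sum + weights[i]);  // take element i
}
```

A ++mask scan would always examine all 2^n masks; here the amount of work depends on how early branches exceed the limit.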

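Comments:

OK, I finally understand your technique. Our algorithms are still much the same, except that you apply the bitmask directly, and test_mask replaces my inner for loop. But yes, I agree this is better. In terms of time complexity the two are still the same, but yours is faster.

@Yeo: Yes, I could not improve the algorithm itself, since it is unknown. If you want the full set of permutations, this is close to optimal (the questionable bit is putting the results into a vector, not computing the permutations). It is probably faster than precomputing all permutations and storing them in memory, because the bitmask and the alphabet use only very limited memory and will not thrash the cache much. But more than 90% of the CPU time will be spent in your real algorithm, the one that processes those permutations. So you are optimizing a bit prematurely, and probably in the wrong place.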
