C++ reference inconsistency

Posted 2023-02-16

【Title】: C++ reference inconsistency 【Posted】: 2020-12-18 13:00:56 【Question】:

I am using the yaml-cpp library to parse YAML. Abbreviated sample:

YAML::Node def = YAML::LoadFile(defFile);
for (auto itemPair = def.begin(); itemPair != def.end(); ++itemPair) {
    // Grab a reference so `itemPair->second` doesn't need to be copied all over the place
    auto& item = itemPair->second;

    // A few instances of the below in series
    if (item["key"].IsDefined()) { doSomething(item["key"].as<std::string>()); }

    // Problem happens here
    if (item["issue"].IsDefined()) {
        if (!item["issue"].IsMap()) { continue; }
        for (auto x = item["issue"].begin(); x != item["issue"].end(); ++x) {
            LOG(INFO) << "Type before: " << item.Type() << " : " << itemPair->second.Type();
            auto test = x->first.as<std::string>();
            LOG(INFO) << "Type after: " << item.Type() << " : " << itemPair->second.Type();
            // Using item as a map fails because it no longer is one!
            // Next loop attempt also crashes when it attempts to use [] on item.
        }
    }
}

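The problem occurs in the nested loop: the reference obtained at the start of the snippet suddenly changes, while the variable it refers to appears to be unaffected: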

I1218 12:44:04.697798 296012 main.cpp:123] Type before: 4 : 4
I1218 12:44:04.697813 296012 main.cpp:125] Type after: 2 : 4

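My understanding of references is that they act as an alias for another variable. I realize the YAML library may be doing some magic behind the scenes that changes the underlying data, but I cannot see why the reference appears to be updated while the original value stays the same.

EDIT: There is some seriously strange behavior going on here. The reference gets "reset" back to the correct value after any call to itemPair->second.Type(). So if I add another log call: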

LOG(INFO) << "Type after: " << item.Type() << " : " << itemPair->second.Type();
LOG(INFO) << "Type afterer: " << item.Type() << " : " << itemPair->second.Type();

Results in:

I1218 12:58:59.965732 297648 main.cpp:123] Type before: 4 : 4
I1218 12:58:59.965752 297648 main.cpp:125] Type after: 2 : 4
I1218 12:58:59.965766 297648 main.cpp:126] Type afterer: 4 : 4

Reproducible example:

test.yaml:

---
one:
    key: x
    issue:
        first: 1
two:
    key: y
    issue:
        first: 1
        second: 2

main.cpp, the same as above but with test.yaml hardcoded, LOG replaced by std::cout, and a mock function:

#include <iostream>
#include <string>
#include <yaml-cpp/yaml.h>

void doSomething(std::string x) { std::cout << "Got key: " << x << std::endl; }

int main() {
    YAML::Node def = YAML::LoadFile("test.yaml");
    for (auto itemPair = def.begin(); itemPair != def.end(); ++itemPair) {
        // Grab a reference so `itemPair->second` doesn't need to be copied all over the place
        auto& item = itemPair->second;

        // A few instances of the below in series
        if (item["key"].IsDefined()) { doSomething(item["key"].as<std::string>()); }

        // Problem happens here
        if (item["issue"].IsDefined()) {
            if (!item["issue"].IsMap()) { continue; }
            for (auto x = item["issue"].begin(); x != item["issue"].end(); ++x) {
                std::cout << "Type before: " << item.Type() << " : " << itemPair->second.Type() << std::endl;
                auto test = x->first.as<std::string>();
                std::cout << "Type after: " << item.Type() << " : " << itemPair->second.Type() << std::endl;
                std::cout << "Type afterer: " << item.Type() << " : " << itemPair->second.Type() << std::endl;
                // Using item as a map fails because it no longer is one!
                // Next loop attempt also crashes when it attempts to use [] on item.
            }
        }
    }
}

Result:

$ ./build/out
Got key: x
Type before: 4 : 4
Type after: 2 : 4
Type afterer: 4 : 4
Got key: y
Type before: 4 : 4
Type after: 2 : 4
Type afterer: 4 : 4
Type before: 4 : 4
Type after: 2 : 4
Type afterer: 4 : 4

【Comments】:

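I don't know YAML, so I can't help much, but this smells like UB and needs a minimal reproducible example.

Do you get that output with only this code?

It seems to be the most popular C++ YAML parser. I would be happy to use something else, but I think the choices are very limited.

The sanitizer reports AddressSanitizer: stack-use-after-scope at item["key"]. That is strange.

YAML::Node has reference semantics, so using auto item instead of auto& item will not make needless copies, and will probably fix the problem.

I am looking at the library source. Applying operator-> to an iterator seems to create a temporary object of type proxy (copying its value), and second then refers to a member variable of that object (inherited from std::pair). Storing a reference to this member variable does not prevent the temporary from being destroyed. Note that the iterator's operator* also creates a copy of the stored value.

【Answer 1】: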

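Node is designed as a handle with reference semantics. The iterator behaves like a pointer to std::pair<Node, Node>, but dereferencing it yields a temporary Node. If you bind a reference to that Node, you end up bound to a Node that has already been destroyed. So a copy is needed here: changing auto& to auto fixes the problem.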
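To make "reference semantics" concrete, here is a minimal sketch of a handle type whose copies share one underlying value. The class Handle and its members are hypothetical and only model the idea; this is not yaml-cpp's actual Node implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a handle with reference semantics; NOT yaml-cpp's Node.
class Handle {
public:
    explicit Handle(std::string value)
        : data_(std::make_shared<std::string>(std::move(value))) {}

    const std::string& get() const { return *data_; }
    void set(const std::string& value) { *data_ = value; }
    bool same_data(const Handle& other) const { return data_ == other.data_; }

private:
    // Copying a Handle copies only this pointer, never the string itself.
    std::shared_ptr<std::string> data_;
};

// A by-value copy still observes writes made through the original handle.
bool copy_sees_update() {
    Handle original("before");
    Handle copy = original;            // cheap: shares the underlying string
    assert(copy.same_data(original));
    original.set("after");
    return copy.get() == "after";
}
```

Under this model, taking the handle by value costs one pointer copy, which is why auto instead of auto& is not an expensive choice.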

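It is designed this way because the library does not want you to touch the underlying memory. Otherwise a dangling reference could arise whenever that memory is reallocated.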

Dangling reference example:

std::vector<int> v{1};   // needs at least one element so that v[0] is valid
auto &ref1 = v[0];

v.reserve(100); // reallocates the storage, leaving ref1 a dangling reference
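The reallocation hazard can be observed without touching the dangling reference at all. The following is a small sketch with hypothetical function names: it records the storage address as an integer before growing the vector, then compares, and shows that an index remains usable where a reference would not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if growing the vector moved its element storage.
bool reserve_moves_storage() {
    std::vector<int> v{1, 2, 3};
    // Keep the old address as an integer so no stale pointer is ever used.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    v.reserve(v.capacity() + 100);   // capacity must grow, so storage is reallocated
    return reinterpret_cast<std::uintptr_t>(v.data()) != before;
}

// An index stays valid across reallocation, unlike a reference or pointer.
int read_by_index_after_growth() {
    std::vector<int> v{7};
    const std::size_t first = 0;
    v.reserve(v.capacity() + 100);
    assert(first < v.size());
    return v[first];
}
```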

I also wrote up why it is designed this way. See here: https://github.com/jbeder/yaml-cpp/issues/977#issuecomment-771041297 I will copy it here.

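Why binding a reference here is UB.

When -> is used, the iterator iter creates a temporary dereference result on the stack, returns a pointer to it, and destroys that object as soon as the enclosing full expression ends.

This is done so that iter->second behaves like (*iter).second.

If the dereference result were placed on the heap, it would be hard to decide when to destroy it.

The intended behavior is the same as (*iter).second. But (*iter).second is an rvalue, so the compiler rejects auto&. This is not the case for iter->second, because the compiler treats iter->second as an lvalue.

The C++ standard makes the built-in member access on a pointer expression, p->m, an lvalue. So there is no way to prohibit binding it to a reference.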
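The mechanism can be modeled with a few lines of standard C++. This is a sketch only: Proxy, Iter, and Pair are hypothetical stand-ins written for illustration, not yaml-cpp's real types. The static_asserts check the value categories described above, and the counter shows that the proxy no longer exists once the statement has finished.

```cpp
#include <type_traits>
#include <utility>

using Pair = std::pair<int, int>;

// Hypothetical proxy: owns a copy of the pair and counts live instances.
struct Proxy {
    static int alive;
    Pair value;
    explicit Proxy(const Pair& p) : value(p) { ++alive; }
    Proxy(const Proxy& other) : value(other.value) { ++alive; }
    ~Proxy() { --alive; }
    Pair* operator->() { return &value; }   // pointer into the temporary proxy
};
int Proxy::alive = 0;

// Hypothetical iterator that returns its results by value.
struct Iter {
    Pair stored;
    Pair operator*() const { return stored; }            // prvalue copy
    Proxy operator->() const { return Proxy(stored); }   // temporary proxy
};

// `it->second` ends in a built-in `p->m`, so it is an lvalue and `auto&` binds.
static_assert(std::is_lvalue_reference<decltype((std::declval<Iter&>()->second))>::value,
              "it->second is an lvalue");
// `(*it).second` is a member of a prvalue, so it is not an lvalue.
static_assert(!std::is_lvalue_reference<decltype(((*std::declval<Iter&>()).second))>::value,
              "(*it).second is not an lvalue");

// Number of proxies still alive once a statement using `it->second` has finished.
int proxies_alive_after_statement() {
    Iter it{Pair(1, 2)};
    int copy = it->second;   // safe: the value is copied while the proxy still exists
    (void)copy;
    return Proxy::alive;     // the proxy was destroyed at the end of the previous statement
}
```

A reference bound to it->second in this model would refer into a Proxy that is already gone by the next statement, which matches the stack-use-after-scope report from the sanitizer.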

In summary, these are the forms that behave correctly and the ones that do not:

V list = iter->second;   // correct
V &list = iter->second;  // wrong
V &&list = iter->second; // COMPILE TIME ERROR
V &&list = std::move(iter->second); // still wrong

auto list = iter -> second;   // correct, list is V
auto &list = iter -> second;  // wrong,   list is V&
auto &&list = iter -> second; // wrong,   list is V&

V list = (*iter).second;   // correct
V &list = (*iter).second;  // COMPILE TIME ERROR
V &&list = (*iter).second; // correct

auto list = (*iter).second;   // correct, list is V
auto &list = (*iter).second;  // COMPILE TIME ERROR
auto &&list = (*iter).second; // correct, list is V&&
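The "correct" verdict for auto &&list = (*iter).second relies on lifetime extension: binding a reference to a member of a temporary keeps the whole temporary alive for as long as the reference lives. A minimal sketch, where Tracked and deref are hypothetical names standing in for the value type and a by-value operator*:

```cpp
#include <cassert>

// Hypothetical value type that counts live instances.
struct Tracked {
    static int alive;
    int second;
    explicit Tracked(int v) : second(v) { ++alive; }
    Tracked(const Tracked& other) : second(other.second) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Stands in for an operator* that returns by value.
Tracked deref() { return Tracked(42); }

// `auto&&` bound to a member of the temporary keeps the whole temporary alive.
int alive_while_rvalue_ref_bound() {
    auto&& second = deref().second;
    assert(second == 42);    // safe: the temporary still exists here
    return Tracked::alive;
}

// A plain copy lets the temporary die at the end of the statement.
int alive_after_copy() {
    int second = deref().second;
    assert(second == 42);
    return Tracked::alive;
}
```

No such extension applies to iter->second, because there the reference is bound through a pointer returned by a function call rather than directly to a member of the temporary.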

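Here are some possible modifications for the library author:

1. detail::iterator_value

2. operator->()

3. auto

Approach 1 would cause a lot of trouble. I think approaches 2 and 3 are good solutions.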

Why a copy works like a reference here.

Node

auto& list = iter->second

This could be done with some effort. It would look something like this:

auto& list = iter->second.data_as_ref<std::string>();

But it would still not be very convenient to use.

In the current design, the data can be obtained with

auto list = iter->second.as<std::string>();

You cannot bind to it. It only lets you copy the data, not write to it.

Node

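If the new data is one of the following types, it encodes it: std::pair, std::array, std::list, std::vector, std::map, bool, Binary.
It assigns the data.
It assigns the type, a member of the enum class NodeType.
It assigns the status, a boolean isDefined.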

On reading, data that was encoded also has to be decoded. So it should not give you direct read/write access.
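The point about encoding can be sketched as follows. TextCell is a hypothetical class invented for illustration and is not yaml-cpp's API: it assumes every scalar is kept as text, so a read has to build a fresh value and there is no stored int or bool object that a reference could bind to.

```cpp
#include <cassert>
#include <string>

// Hypothetical cell that keeps every scalar as encoded text; NOT yaml-cpp's Node.
class TextCell {
public:
    // Writing encodes the value into the stored text and marks the cell defined.
    void set(bool value) { text_ = value ? "true" : "false"; defined_ = true; }
    void set(int value) { text_ = std::to_string(value); defined_ = true; }

    // Reading decodes a new value each time, so it must be returned by value.
    bool as_bool() const { assert(defined_); return text_ == "true"; }
    int as_int() const { assert(defined_); return std::stoi(text_); }

    bool is_defined() const { return defined_; }
    const std::string& raw() const { return text_; }

private:
    std::string text_;
    bool defined_ = false;
};

// Writes an int and reads it back through the encode/decode path.
int round_trip(int value) {
    TextCell cell;
    cell.set(value);
    return cell.as_int();
}
```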

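In addition, your reference could end up dangling, because the memory may be reallocated.

In the current design, a copy has to work like a reference.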

Conclusion

Use auto first = iter->first; or use (*iter).first.
