Accessing a string out of bounds does not trigger any valgrind/ASAN/UBSAN warnings

[Posted]: 2022-01-15 22:50:54

[Question]:

I have this code:

static int main(string[] args) {
        info(escape_latex(args[1]));
        return 0;
}

string escape_latex(string input) {
        var builder = new StringBuilder.sized(input.length + 20);
        var map = new Gee.HashMap<string, string>();
        // ...<Snip>...
        // Fix for some weird unicode bugs
        map["\xff\xbf\xbf\xbf\xbf\xbf"] = "";
        info("Len: %d", input.char_count());
        for(var i = 0; i < input.char_count(); i++) {
                var ic = input.get_char(i);
                var as_string = ic.to_string();
                info("%d %s", i, as_string);
                if(map.has_key(as_string)) {
                        builder.append(map[as_string]);
                } else {
                        builder.append_unichar(ic);
                }
        }
        return builder.str;
}

If I pass "foo123", I get the expected output "foo123". But if I pass "Geldbeutel+Schlüsselanhänger", I get "Geldbeutel+Schl?sselanh?ng" (the last two characters are missing).

Now I change the for loop to for(var i = 0; i <= input.char_count(); i++)

For "foo123" I get the expected output, and for "Geldbeutel+Schlüsselanhänger" I get "Geldbeutel+Schl?sselanh?nge". (Valgrind, ASAN and UBSAN report nothing.)

Now I change the for loop to for(var i = 0; i <= input.char_count() + 1; i++)

"foo123" now becomes "foo123G", because I run into other memory, but "Geldbeutel+Schlüsselanhänger" gives the correct output "Geldbeutel+Schl?sselanh?nger".

Sample output for the last example input:

** INFO: 19:41:57.903: a.vala:23: Len: 28
** INFO: 19:41:57.903: a.vala:29: 0 G
** INFO: 19:41:57.903: a.vala:29: 1 e
** INFO: 19:41:57.903: a.vala:29: 2 l
** INFO: 19:41:57.903: a.vala:29: 3 d
** INFO: 19:41:57.903: a.vala:29: 4 b
** INFO: 19:41:57.903: a.vala:29: 5 e
** INFO: 19:41:57.903: a.vala:29: 6 u
** INFO: 19:41:57.903: a.vala:29: 7 t
** INFO: 19:41:57.903: a.vala:29: 8 e
** INFO: 19:41:57.903: a.vala:29: 9 l
** INFO: 19:41:57.903: a.vala:29: 10 +
** INFO: 19:41:57.903: a.vala:29: 11 S
** INFO: 19:41:57.903: a.vala:29: 12 c
** INFO: 19:41:57.903: a.vala:29: 13 h
** INFO: 19:41:57.903: a.vala:29: 14 l
** INFO: 19:41:57.903: a.vala:29: 15 ?
** INFO: 19:41:57.903: a.vala:29: 17 s
** INFO: 19:41:57.903: a.vala:29: 18 s
** INFO: 19:41:57.903: a.vala:29: 19 e
** INFO: 19:41:57.903: a.vala:29: 20 l
** INFO: 19:41:57.903: a.vala:29: 21 a
** INFO: 19:41:57.903: a.vala:29: 22 n
** INFO: 19:41:57.903: a.vala:29: 23 h
** INFO: 19:41:57.903: a.vala:29: 24 ?
** INFO: 19:41:57.903: a.vala:29: 26 n
** INFO: 19:41:57.903: a.vala:29: 27 g
** INFO: 19:41:57.903: a.vala:29: 28 e
** INFO: 19:41:57.903: a.vala:29: 29 r            // <- Here, I access an invalid index, but it works
** INFO: 19:41:57.903: a.vala:2: Geldbeutel+Schl?sselanh?nger

It seems to be related to Unicode, but I can't find a way to make this function work.

[Comments]:

Did you leave out any important information?

[Answer 1]:

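This has to do with the locale: the default for the C runtime environment is US ASCII. You can switch to the user's preferred locale by passing an empty string to Intl.setlocale() together with LocaleCategory.ALL. Both are also the default parameter values, so a plain Intl.setlocale(); will work: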

static int main(string[] args) {
        // Use the user's preferred locale instead of the default "C" locale
        Intl.setlocale();
        print(escape_latex(args[1]) + "\n");
        return 0;
}

string escape_latex(string input) {
        var builder = new StringBuilder.sized(input.length + 20);
        var map = new Gee.HashMap<string, string>();
        // ...<Snip>...
        // Fix for some weird unicode bugs
        map["\xff\xbf\xbf\xbf\xbf\xbf"] = "";
        info("Len: %d", input.char_count());
        // Note: get_char() takes a byte offset, while char_count() counts characters
        for(var i = 0; i < input.char_count(); i++) {
                var ic = input.get_char(i);
                var as_string = ic.to_string();
                info("%d %s", i, as_string);
                if(map.has_key(as_string)) {
                        builder.append(map[as_string]);
                } else {
                        builder.append_unichar(ic);
                }
        }
        return builder.str;
}
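
A possible explanation for the missing trailing characters, separate from the locale: in Vala, string.get_char() takes a byte offset, while char_count() returns the number of Unicode characters. "Geldbeutel+Schlüsselanhänger" has 28 characters but 30 bytes in UTF-8 (ü and ä take two bytes each), so a loop bounded by char_count() stops two bytes early, and index 29 is still inside the string, which would explain why the sanitizers stay silent for that input. The following is only a minimal sketch of iterating by byte offset with get_next_char(); copy_chars is a hypothetical helper name, and the replacement map from the question is left out.

```vala
// Sketch under stated assumptions: copy_chars is an illustrative helper,
// not code from the original post. Compile with: valac sketch.vala

// Copies the input one Unicode character at a time.
string copy_chars(string input) {
        // input.length is the size in bytes, not in characters
        var builder = new StringBuilder.sized(input.length);
        int offset = 0;   // byte offset, advanced by get_next_char()
        unichar c;
        // get_next_char() decodes the character at the byte offset and
        // moves the offset past it; it returns false at the terminating NUL
        while (input.get_next_char(ref offset, out c)) {
                builder.append_unichar(c);
        }
        return builder.str;
}

int main(string[] args) {
        Intl.setlocale();
        var s = "Geldbeutel+Schlüsselanhänger";
        // Byte length and character count differ for non-ASCII input
        print("bytes: %d, chars: %d\n", s.length, s.char_count());
        print("%s\n", copy_chars(s));
        return 0;
}
```

For the example string this reports 30 bytes and 28 characters, which matches the "Len: 28" line and the highest index (29) in the log from the question. If a character index is really needed, index_of_nth_char() converts it to a byte offset, at the cost of rescanning the string on each call.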
