Is it possible to generate a pseudo-type so that I can fake out gdb's pretty printing system?

Posted: 2022-01-05 00:14:27

[Question]:

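I'm writing a pretty-printer for gdb in Python and am slowly getting the hang of the approach. Trying to find actual documentation on how this system works, with examples of what the methods are expected to return, is like pulling teeth. I've found bits and pieces here and there, but nothing all-inclusive. Some of what I know I worked out by trial and error, which is slow going.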

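So far, it looks like a pretty-printer's to_string() is only allowed to return a string (of course), but children() can return a string or a pair of string and value, where value is either a Python value or a value object as described here, which is a wrapper around the C/C++ object being printed. I had actually hoped that I could return a pretty-printer object and have it called, but alas, that is not the case. I could return a string, but I want the payload elements to be collapsible in an IDE like VSCode, and for that I need to return a value object. The equivalent of this is a Synthetic Item in Natvis.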
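As a rough illustration of the protocol described above, here is a minimal sketch of a printer with to_string() and children(). The names PairPrinter and lookup are hypothetical, and a plain dict stands in for a gdb value so the sketch can run outside gdb; inside a gdb session the lookup function would be registered with gdb.pretty_printers.append(lookup) and would inspect val.type instead.

```python
class PairPrinter:
    """Hypothetical printer for an object with `first` and `second` fields."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # Summary text shown before the children.
        return "pair"

    def children(self):
        # Each child is a (name, value) tuple. In gdb the value would
        # usually be a gdb.Value so that a front end can expand it.
        yield "first", self.val["first"]
        yield "second", self.val["second"]


def lookup(val):
    # A lookup function returns a printer object, or None when the value
    # is not one it handles. The dict check is a stand-in for a type check.
    if isinstance(val, dict) and {"first", "second"} <= val.keys():
        return PairPrinter(val)
    return None


printer = lookup({"first": 1, "second": 2})
for name, value in printer.children():
    print(name, "=", value)
```

The point of the sketch is only the shape of the interface: one summary string, plus an iterable of named children.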

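I have a C++ class that is a buffer. In its raw form it contains a vector of bytes, which I need processed so that it is readable.

Given the constraints I've gleaned, if I could wrap a pointer in a proxy value object using a pseudo-type, I might be able to break the bytes down into usable units. Here's a hard-coded example of what I mean: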

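#include <cstdint>
#include <iterator>  // std::end
struct alignas(std::uint16_t) buffer {
  enum id : char { id1, id2 };
  // structure is: payload_size, id, payload[]
  // NOTE: the first packet declares a payload size of 2 although four payload
  // bytes follow; the answers below use 4 for the same packet.
  char buf[11] = { 2, id1, 1, 0, 2, 3
                 , 0, id1
                 , 1, id2, 1
                 };
  char* end = std::end(buf);
};

int main() {
  buffer b;
  return 0;
}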

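Setting a breakpoint on the return 0; on a little-endian machine (so that the bytes 1, 0 read as the 16-bit value 1), I would like something like the following to be shown: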

(gdb) p b
$1 = buffer @ 0xaddre55 = { id1[2] = {1, 2, 3}, id1[0] = {}, id2 = {1} }
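The packet layout in the listing (payload_size, id, payload[]) can be walked in plain Python before any gdb API is involved. The sketch below is only an illustration: split_packets is a hypothetical helper, and it assumes the first packet's size byte is 4 (as in the answers' version of the buffer) rather than the 2 in the listing, so that the sizes match the bytes that follow.

```python
def split_packets(buf):
    """Yield (offset, payload_size, cmd_id, payload) for each packet in buf."""
    offset = 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError(f"truncated header at offset {offset}")
        payload_size, cmd_id = buf[offset], buf[offset + 1]
        end = offset + 2 + payload_size
        if end > len(buf):
            raise ValueError(
                f"packet at offset {offset} runs {end - len(buf)} bytes past the end")
        yield offset, payload_size, cmd_id, bytes(buf[offset + 2:end])
        offset = end


ID1, ID2 = 0, 1  # enum values as in the listing: enum id : char { id1, id2 }
# Assumption: first size byte is 4, not 2, so the packet sizes are consistent.
data = bytes([4, ID1, 1, 0, 2, 3,
              0, ID1,
              1, ID2, 1])

for offset, size, cmd_id, payload in split_packets(data):
    print(offset, size, cmd_id, list(payload))
```

Stepping by 2 + payload_size is the same advance the printer's children() loop has to make for every packet.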

This is the pretty-printer Python code I have so far:

import gdb

class bufferPacketPrinter:
  def __init__(self, p_begin, p_end) -> None:
    self.p_begin = p_begin  # beginning of packet
    self.p_end = p_end      # end of packet
    # Value.cast() takes a gdb.Type, so look the types up by name first.
    self.cmd_id       = self.p_begin[1].cast(gdb.lookup_type('buffer::id'))
    self.payload_size = self.p_begin[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))

  def to_string(self):
    return 'packet {}[{}]' \
      .format(self.cmd_id, self.payload_size)

  def children(self):
    payload = self.p_begin + 2
    if self.cmd_id == 'id1':
      if self.payload_size == 0:
        return ''
      elif self.payload_size == 3:
        yield payload.cast(gdb.lookup_type('std::uint16_t').pointer())
        payload += 2
        yield payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
        payload += 1
        return payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
    elif self.cmd_id == 'id2':
      if self.payload_size == 1:
        return payload[0]
    return 'Invalid payload size of ' + str(self.payload_size)

class bufferPrinter:
  def __init__(self, val) -> None:
    self.val = val
    self.begin = self.val['buf'].cast(gdb.lookup_type('char').pointer())
    self.end = self.val['end']

  def to_string(self):
    return 'buffer @ {}'.format(self.val.address)
    
  def children(self):
    payload_size = self.begin[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
    while self.begin != self.end:
      yield ??? # <=== Here is where the magic that I need is to happen
      self.begin += 2 + payload_size

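(I'm still learning Python and this API, so if there are any errors, please let me know.)

The second-to-last line, yield ???, is where I'm stuck. Any ideas? If this isn't the way to do it, please tell me another way.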

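[Comments]:

Why don't you return string/string pairs from children()?

@ssbssa, because I want the children to be collapsible in an IDE like VSCode.

I once needed something similar, so I extended gdb so that you can return another pretty-printer from children, but I never tested it outside of gdb itself.

@ssbssa, great! I guess I could try rebuilding gdb, but I've had very limited success compiling things like compilers. There always seems to be some outstanding bug that keeps the system from compiling. :( :D I'll take a look.

Instead of a pseudo-type, you can also create a real type. See Can we define a new data type in a GDB session - Stack Overflow (but I'm not sure how well it works with Visual Studio).

[Answer 1]: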

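Well, I found an answer, but it comes with some caveats.

Basically, I'm reusing the type: the object is output as its own type, but when I do that I change the type's view and tell it to display again. I used this trick with Natvis for MS Visual Studio. Unfortunately, because gdb doesn't really have a "view" system, I cobbled one together by keeping a state shared by all bufferPrinters.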

import gdb

class bufferPrinter:
  # Which view of the bufferPrinter to see.
  view = 0
  # Parameters passed to the child view.
  stack = []
  
  def __init__(self, val) -> None:
    self.val = val
    self.begin = self.val['buf'].cast(gdb.lookup_type('char').pointer())
    self.end = self.val['end']

  def payload_size(self, packet_it):
    return packet_it[0] \
      .cast(gdb.lookup_type('unsigned char')) \
      .cast(gdb.lookup_type('int'))

  def cmd_id(self, packet_it):
      return packet_it[1].cast(gdb.lookup_type('buffer::id_e'))

  def payload(self, packet_it):
      return packet_it + 2

  def to_string(self):
    if bufferPrinter.view == 0:
      return 'buffer @ {}'.format(self.val.address)
  
  def children(self):
    packet_it = self.begin
    if bufferPrinter.view == 0:
      packet_counter = 0
      while packet_it < self.end:
        # Setting the view should be done before viewing self.val
        bufferPrinter.view = 1
        # A stack is a bit overkill in this situation as it is not
        # a recursive structure, but is here as a generic solution
        # to pass parameters to the next level.
        bufferPrinter.stack.append(packet_it)
        yield '{}[{}]'.format(self.cmd_id(packet_it), packet_counter), self.val
        packet_counter += 1
        packet_it += 2 + self.payload_size(packet_it)
      if packet_it != self.end:
        yield 'ERROR', 'Jumped {} bytes past end.'.format(packet_it - self.end)
    else:
      # Setting the view immediately and popping the stack to ensure
      # that they're not forgotten before leaving.
      bufferPrinter.view = 0
      packet_it    = bufferPrinter.stack.pop()
      payload_size = self.payload_size(packet_it)
      payload      = self.payload(packet_it)
      cmd_id       = self.cmd_id(packet_it)
      if str(cmd_id) == 'buffer::id1':
        if payload_size == 0:
          yield '[0]', ''
        elif payload_size == 4:
          yield '[0]', payload.cast(gdb.lookup_type('uint16_t').pointer())[0]
          payload += 2
          yield '[1]', payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
          payload += 1
          yield '[2]', payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
        else:
          yield 'ERROR', 'Invalid payload size of {} for {}.'.format(payload_size, cmd_id)
      elif str(cmd_id) == 'buffer::id2':
        if payload_size == 1:
          yield '[0]', payload[0].cast(gdb.lookup_type('unsigned char')).cast(gdb.lookup_type('int'))
        else:
          yield 'ERROR', 'Invalid payload size of {} for {}.'.format(payload_size, cmd_id)
      else:
        yield 'ERROR', 'id {} invalid.'.format(cmd_id)
    return 

def my_pp_fn(val):
  if str(val.type) == 'buffer': return bufferPrinter(val)

gdb.pretty_printers.append(my_pp_fn)

Which, for the following code:

#include <cstdint>
#include <vector>
struct alignas(std::uint16_t) buffer {
  enum id_e : char { id1 = 6, id2 };
  struct packet_header_t { unsigned char payload_size; id_e id; };
  // structure is: payload_size, id, payload[]
  char buf[13] = { 4, id1, 1, 0, 2, 3
                 , 0, id1
                 , 0, id1
                 , 1, id2, 1
                 };
  char* end = std::end(buf);
};

int main() {
  buffer b;
  // Have to use types buffer::packet_header_t and buffer::id_e or they aren't
  // saved in the symbol table.
  buffer::packet_header_t y = {};
  buffer::id_e x = buffer::id1;
  return 0; // <== set breakpoint here
}

gives me:

(gdb) p b
$1 = buffer @ 0xc5f5bffad0 = {buffer::id1[0] = {[0] = 1, [1] = 2, [2] = 3}, buffer::id1[1] = {[0] = }, buffer::id1[2] = {[0] = }, buffer::id2[3] = {[0] = 1}}

VS Code will show this in the Watch window (the screenshot is not preserved here).

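Although this works on the command line, it does not work properly inside the VSCode IDE. The level-0 view and the level-1 view of bufferPrinter get mixed up, because VSCode queries the child elements directly, and even when it doesn't, the children shown may not be the ones we want. If gdb's pretty-printing had a view system, this could probably be avoided.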
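To see why shared view state is fragile, the trick can be modelled in plain Python. This is only a model under stated assumptions: ViewPrinter, expand_depth_first and expand_lazily are hypothetical names, and the "lazy" consumer simply assumes a front end that lists all children first and expands them later, which is not necessarily how VS Code drives gdb.

```python
class ViewPrinter:
    view = 0    # shared by every instance, like bufferPrinter.view
    stack = []  # parameters handed to the next instantiation

    def __init__(self, packets):
        self.packets = packets

    def children(self):
        if ViewPrinter.view == 0:
            for i, packet in enumerate(self.packets):
                ViewPrinter.view = 1
                ViewPrinter.stack.append(packet)
                # Yield the same object again; the consumer is expected to
                # print this child before asking for the next one.
                yield f"packet[{i}]", self.packets
        else:
            ViewPrinter.view = 0
            packet = ViewPrinter.stack.pop()
            for i, byte in enumerate(packet):
                yield f"[{i}]", byte


def expand_depth_first(packets):
    """CLI-style consumer: expands each child as soon as it is produced."""
    out = {}
    for name, child in ViewPrinter(packets).children():
        out[name] = [v for _, v in ViewPrinter(child).children()]
    return out


def expand_lazily(packets):
    """Assumed IDE-style consumer: lists all children, expands them later."""
    listed = list(ViewPrinter(packets).children())
    return {name: [v for _, v in ViewPrinter(child).children()]
            for name, child in listed}


packets = [[1, 2, 3], [], [9]]
good = expand_depth_first(packets)
ViewPrinter.view, ViewPrinter.stack = 0, []  # reset the shared state
bad = expand_lazily(packets)
print(good)
print(bad["packet[0]"])
```

With depth-first expansion every packet gets its own bytes. With the lazy consumer, the first child pops the last pushed packet, so the state and the children no longer line up.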

Although I've posted this as an answer, I'm still looking for a way to generate a pseudo-type so that this side effect doesn't occur.


[Answer 2]:

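Well, I've been able to do what I wanted, except that it doesn't use a pseudo-type; it uses an actual type that I had to put in myself. So, given the following C++ code: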

#include <cstdint>
#include <vector>
struct alignas(std::uint16_t) buffer {
  enum id_e : char { id1 = 6, id2 };
  struct packet_header_t { unsigned char payload_size; id_e id; };
  // structure is: payload_size, id, payload[]
  char buf[13] = { 4, id1, 1, 0, 2, 3
                 , 0, id1
                 , 0, id1
                 , 1, id2, 1
                 };
  char* end = std::end(buf);
};

int main() {
  buffer b;
  // Have to use types buffer::packet_header_t and buffer::id_e or they aren't
  // saved in the symbol table.
  buffer::packet_header_t y = {};
  buffer::id_e x = buffer::id1;
  return 0;
}

And this pretty-printer:

import gdb

class bufferPrinterSuper:
  # Shared code between pretty-printers
  meaning = {
    #              +-- packet info
    #              |     +-- payload info
    #              |     | +-- element info
    #              v     v v
    'buffer::id1': { 0 : [                            ]
                   , 4 : [ [2, 'uint16_t*'            ]
                         , [1, 'unsigned char*', 'int']
                         , [1, 'unsigned char*', 'int']
                         ]
                   }
  , 'buffer::id2': { 1 : [ [1, 'unsigned char*', 'int']
                         ]
                   }
  }

  def payload_size(self, packet_it):
    return int(packet_it[0] \
      .cast(gdb.lookup_type('unsigned char')) \
      .cast(gdb.lookup_type('int')))

  def cmd_id(self, packet_it):
    return packet_it[1].cast(gdb.lookup_type('buffer::id_e'))

  def payload(self, packet_it):
    return packet_it + 2

  def get_value(self, it, element_info):
    for i in range(1, len(element_info)):
      if element_info[i][-1] == '*':
        pointer = gdb.lookup_type(element_info[i][0:-1]).pointer()
        it = it.cast(pointer).dereference()
      else:
        assert it.type.strip_typedefs().code != gdb.TYPE_CODE_PTR
        value = gdb.lookup_type(element_info[i])
        it = it.cast(value)
    return it

class bufferHeaderPrinter(bufferPrinterSuper):
  def __init__(self, val):
    self.val = val
    self.begin = self.val['payload_size'].address
    self.end = self.begin + self.payload_size(self.val['payload_size'].address)

  def to_string(self):
    return 'packet @ {}'.format(self.val.address)

  def children(self):
    packet_it = self.begin
    cmd_id = self.cmd_id(packet_it)
    if str(cmd_id) in self.meaning:
      payload_info = self.meaning[str(cmd_id)]
      payload_size = self.payload_size(packet_it)
      if payload_size in payload_info:
        payload_it = packet_it + 2
        payload_info = payload_info[payload_size]
        payload_counter = 0
        for element_info in payload_info:
          yield '[{}]' \
            .format(payload_counter), self.get_value(payload_it, element_info)
          payload_it += element_info[0]
          payload_counter += 1

        # Error handling
        if payload_it > packet_it + 2 + payload_size:
          yield 'ERROR: payload_info {} exceeds payload size {}' \
            .format(payload_info, payload_size), 0
        elif packet_it + 2 + payload_size > payload_it:
          # Convert to a Python int so it can be used with range() below.
          bytes_unaccounted_for = int(packet_it - payload_it + 2 + payload_size)
          # A warning because they could be padding
          yield "WARNING: payload_info doesn't account for {} bytes: {}" \
            .format(bytes_unaccounted_for
                , '['
                + ', '.join('{:02x}'.format(int(payload_it[i]))
                            for i in range(0, bytes_unaccounted_for))
                + ']'), 0
      else:
        yield 'ERROR: Size {} for id {} not recognized.'.format(payload_size, cmd_id), 0
    else:
      yield 'ERROR: Command {} not recognized.'.format(cmd_id), 0


class bufferPrinter(bufferPrinterSuper):
  def __init__(self, val) -> None:
    self.val = val
    self.begin = self.val['buf'].cast(gdb.lookup_type('char').pointer())
    self.end = self.val['end']

  def to_string(self):
    return 'buffer @ {}'.format(self.val.address)

  def children(self):
    packet_it = self.begin
    packet_counter = 0
    while packet_it < self.end:
      cmd_id = self.cmd_id(packet_it)
      yield '[{}] {}({})' \
        .format(packet_counter, self.cmd_id(packet_it), self.payload_size(packet_it)) \
        , packet_it.cast(gdb.lookup_type('buffer::packet_header_t').pointer()).dereference()
      packet_counter += 1
      packet_it += 2 + self.payload_size(packet_it)

    if packet_it != self.end:
      yield 'ERROR', 'Jumped {} bytes past end.'.format(packet_it - self.end)
    return 

def my_pp_fn(val):
  if str(val.type) == 'buffer': return bufferPrinter(val)
  if str(val.type) == 'buffer::packet_header_t': return bufferHeaderPrinter(val)

gdb.pretty_printers.append(my_pp_fn)

I get the following output:

(gdb) p b
$1 = buffer @ 0x8c47fffa20 = {[0] buffer::id1(4) = packet @ 0x8c47fffa20 = {[0] = 1, [1] = 2, [2] = 3}, [1] buffer::id1(0) = packet @ 0x8c47fffa26, [2] buffer::id1(0) = packet @ 0x8c47fffa28, [3] buffer::id2(1) = packet @ 0x8c47fffa2a = {[0] = 1}}
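The table-driven decoding can also be exercised outside gdb, which makes it easier to check the table against known bytes. The sketch below is not the printer itself: LAYOUT and decode are hypothetical names, the table uses struct format strings in place of the type-name lists in meaning, and little-endian byte order is an assumption.

```python
import struct

ID1, ID2 = 6, 7  # enum id_e : char { id1 = 6, id2 }

# Hypothetical mirror of the `meaning` table:
# command id -> payload size -> struct format of the payload.
LAYOUT = {
    ID1: {0: "", 4: "<HBB"},  # uint16 (little-endian assumed), then two bytes
    ID2: {1: "<B"},
}


def decode(buf):
    """Return a list of (cmd_id, fields), one entry per packet in buf."""
    out, offset = [], 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError(f"truncated header at offset {offset}")
        size, cmd_id = buf[offset], buf[offset + 1]
        if cmd_id not in LAYOUT:
            raise ValueError(f"command {cmd_id} not recognized")
        fmt = LAYOUT[cmd_id].get(size)
        if fmt is None:
            raise ValueError(f"size {size} for id {cmd_id} not recognized")
        if offset + 2 + size > len(buf):
            raise ValueError(f"packet at offset {offset} runs past the end")
        # An empty format means an empty payload; skip the unpack call.
        fields = struct.unpack_from(fmt, buf, offset + 2) if fmt else ()
        out.append((cmd_id, list(fields)))
        offset += 2 + size
    return out


# Same 13 bytes as buf in the C++ listing above.
data = bytes([4, ID1, 1, 0, 2, 3,
              0, ID1,
              0, ID1,
              1, ID2, 1])
print(decode(data))
```

Keeping the layout in a table means a new command id or payload size is one more entry rather than another branch in children().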

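One problem with this is that I have to make sure the header type is actually used so that it stays in the symbol table. That can be tricky, because the optimizer may remove it when it deems it unnecessary, and indeed it is not needed for the operation of the program; it exists only for debugging.

Unless someone can tell me how to generate a pseudo-type, or some other way to produce collapsible children, I think I'll have to mark this as the answer, which I'd rather not do. *sigh*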
