Live debugging a stack overflow

Posted 2023-04-15


【Title】Live debugging a stack overflow 【Posted】2010-10-20 15:44:25 【Question】:

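I have a managed-code Windows service application that occasionally crashes in production due to a managed StackOverflowException. I know this because I ran adplus in crash mode and analyzed the crash dump post mortem using SoS. I have even attached the windbg debugger and set it to "go unhandled exception".

My problem is that I can't see any of the managed stacks or switch to any of the threads. They have all been torn down by the time the debugger breaks.

I'm not a Windbg expert. Short of installing Visual Studio on the live system, or using remote debugging with that tool, does anyone have suggestions for how I can get a stack trace out of the offending thread?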

Here's what I'm doing.

!threads

...

XXXX 11 27c 000000001b2175f0 b220 Disabled 00000000072c9058:00000000072cad80 0000000019bdd3f0 0 Ukn System.StackOverflowException (0000000000c010d0)

...

At this point you see the XXXX ID, which indicates that the thread is already dead.

【Comments】:

【Answer 1】:

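Once you've hit a stack overflow, you're pretty much out of luck for debugging the problem. Blowing your stack space leaves your program in a non-deterministic state, so you can't rely on any of the information in it at that point. Any stack trace you try to get may be corrupted and can easily point you in the wrong direction. In other words, once the StackOverflowException occurs, it's too late.

Also, according to the documentation, you can't catch a StackOverflowException from .NET 2.0 onwards, so the other suggestions to surround your code with a try/catch probably won't work. This makes perfect sense given the side effects of a stack overflow (I'm surprised .NET ever allowed you to catch it).

Your only real option is the tedious one: analyze the code, look for anything that could potentially cause a stack overflow, and put in some sort of markers so you can find out where overflows happen before they happen. Recursive methods are the obvious first place to start. Give them a depth counter and throw your own exception if they reach some "unreasonable" value that you define. That way you actually get a valid stack trace.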
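The depth-counter idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the application in question: `TreeNode`, `TreeWalker`, and `RecursionDepthExceededException` are hypothetical names, and the limit of 1000 is an arbitrary value that would need tuning per method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type, thrown long before the real stack is exhausted.
public class RecursionDepthExceededException : Exception
{
    public RecursionDepthExceededException(int depth)
        : base("Recursion depth " + depth + " exceeded the configured limit") { }
}

// Hypothetical recursive data structure, used only for illustration.
public class TreeNode
{
    public List<TreeNode> Children { get; } = new List<TreeNode>();
}

public static class TreeWalker
{
    // Assumed "unreasonable" depth; pick a value that fits the real code.
    public const int MaxDepth = 1000;

    // Counts the node and all of its descendants, tracking recursion depth.
    public static int CountNodes(TreeNode node, int depth = 0)
    {
        if (depth > MaxDepth)
        {
            // An ordinary exception: it can be caught and carries a valid stack trace.
            throw new RecursionDepthExceededException(depth);
        }
        int count = 1;
        foreach (TreeNode child in node.Children)
        {
            count += CountNodes(child, depth + 1);
        }
        return count;
    }
}
```

With a guard like this, a cyclic structure (for example, a node that lists itself as a child) fails with a catchable exception instead of taking the process down.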

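【Comments】:

That's interesting. I hadn't even noticed that it had changed. The last time I ran into one of these was when I mistyped a property/member getter and ended up with an endless recursive call (back then I was able to catch and debug it). +1 for actually reading the latest documentation.

【Answer 2】: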

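Is it an option to wrap your code in a try-catch that writes to the EventLog (or a file, or whatever) and run this as a one-off debugging build?

try { ... } catch (SOE) { EventLog.Write(...); throw; }

You won't be able to debug, but you will get the stack trace.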

【Comments】:

【Answer 3】:

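One option is to use a try/catch block at a high level and then print or log the stack trace provided by the exception. Every exception has a StackTrace property that tells you where it was thrown from. This won't let you do any interactive debugging, but it should give you a place to start.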
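A minimal sketch of such a high-level handler is shown below. `TopLevelGuard` and its `log` callback are hypothetical names introduced for illustration. Keep in mind the point made in Answer 1: a StackOverflowException itself cannot be caught from .NET 2.0 onwards, so this only helps for catchable exceptions, such as one you throw yourself from a depth check.

```csharp
using System;

public static class TopLevelGuard
{
    // Runs the work, logs the details of any catchable exception, then rethrows it.
    public static void Run(Action work, Action<string> log)
    {
        try
        {
            work();
        }
        catch (Exception ex)
        {
            // StackTrace describes where the exception was thrown from.
            log(ex.GetType().FullName + ": " + ex.Message
                + Environment.NewLine + ex.StackTrace);
            throw; // rethrow without resetting the original stack trace
        }
    }
}
```

In a service, `log` could write to the event log or to a file. The sketch takes a delegate so that the choice of sink stays outside the example.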

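【Comments】:

I just had a weird feeling of déjà vu... :) Heh, I just re-read your answer and I see your point :P. Oh well, it's probably worth pointing out explicitly that exceptions carry the stack they were thrown from, in case that isn't obvious.

【Answer 4】: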

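For what it's worth, as of .NET 4.0, Visual Studio (and any debugger that relies on the ICorDebug API) gained the ability to debug minidumps. This means you will be able to load the crash dump into the VS debugger on a different machine and see the managed stacks, much as if you had attached a debugger at the time of the crash. See the PDC talk or Rick Byers' blog for details. Unfortunately this won't help with the problem at hand, but it may help the next time you run into this issue.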

【Comments】:

【Answer 5】:

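Take a look at your ADPLUS crash mode debug log. See whether any access violations or true native stack overflow exceptions occur before the managed StackOverflowException is thrown.

My guess is that there is an exception on the thread's stack that you could catch before the thread exits.

You can also use DebugDiag from www.iis.net, set up a crash rule, and create a full dump file for access violations (sxe av) and native stack overflow exceptions (sxe sov).

Thanks, Aaron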

【Comments】:

【Answer 6】:

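The RecursionChecker class below complains if it finds itself checking the same target object too often. It is not a cure-all. Loops, for instance, can cause false positives. You could avoid that by making another call after the risky code, telling the checker that it can decrement its recursion count for the target object. Even then it would not be bulletproof.

To use it, I just call it like this: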

public void DangerousMethod() {
  RecursionChecker.Check(someTargetObjectThatWillBeTheSameIfWeReturnHereViaRecursion);
  // recursion-risky code here.
}

Here is the RecursionChecker class:

using System;
using System.Collections.Generic;
using System.Linq;

/// <summary>If you use this class frequently from multiple threads, expect a lot of blocking. In that case,
/// you might want to make this a non-static class and have an instance per thread.</summary>
public static class RecursionChecker
{
  #if DEBUG
  private static HashSet<ReentrancyInfo> ReentrancyNotes = new HashSet<ReentrancyInfo>();
  private static readonly object LockObject = new object();
  // Drops notes whose target has been collected or that have not been hit recently.
  private static void CleanUp(HashSet<ReentrancyInfo> notes) {
    List<ReentrancyInfo> deadOrStale = notes.Where(info => info.IsDeadOrStale()).ToList();
    foreach (ReentrancyInfo killMe in deadOrStale) {
      notes.Remove(killMe);
    }
  }
  #endif
  public static void Check(object target, int maxOK = 10, int staleMilliseconds = 1000)
  {
    #if DEBUG
    lock (LockObject) {
      HashSet<ReentrancyInfo> notes = RecursionChecker.ReentrancyNotes;
      // Let an existing note for this target count the call, if there is one.
      bool handled = false;
      foreach (ReentrancyInfo note in notes) {
        if (note.HandlePotentiallyRentrantCall(target, maxOK)) {
          handled = true;
          break;
        }
      }
      RecursionChecker.CleanUp(notes);
      // Start tracking the target only when no existing note claimed it.
      if (!handled) {
        ReentrancyInfo newNote = new ReentrancyInfo(target, staleMilliseconds);
        newNote.HandlePotentiallyRentrantCall(target, maxOK);
        notes.Add(newNote);
      }
    }
    #endif
  }
}

The helper classes are as follows:

internal class ReentrancyInfo
{
  public WeakReference<object> ReentrantObject { get; set; }
  public object GetReentrantObject() {
    return this.ReentrantObject?.TryGetTarget();
  }
  public DateTime LastCall { get; set; }
  public int StaleMilliseconds { get; set; }
  public int ReentrancyCount { get; set; }
  public bool IsDeadOrStale() {
    bool r = false;
    if (this.LastCall.MillisecondsBeforeNow() > this.StaleMilliseconds) {
      r = true;
    } else if (this.GetReentrantObject() == null) {
      r = true;
    }
    return r;
  }
  public ReentrancyInfo(object reentrantObject, int staleMilliseconds = 1000)
  {
    this.ReentrantObject = new WeakReference<object>(reentrantObject);
    this.StaleMilliseconds = staleMilliseconds;
    this.LastCall = DateTime.Now;
  }
  /// <summary>Returns true if this note tracks the given target. Throws once the
  /// target has been hit more than maxOK times without going stale.</summary>
  public bool HandlePotentiallyRentrantCall(object target, int maxOK) {
    bool r = false;
    object myTarget = this.GetReentrantObject();
    // A collected target (null) never matches.
    if (myTarget != null && object.Equals(target, myTarget)) {
      r = true;
      int ms = this.LastCall.MillisecondsBeforeNow();
      if (ms > this.StaleMilliseconds) {
        // Too long since the last hit: treat this as a fresh sequence of calls.
        this.ReentrancyCount = 1;
      }
      else {
        if (this.ReentrancyCount == maxOK) {
          throw new Exception("Probable infinite recursion");
        }
        this.ReentrancyCount++;
      }
      this.LastCall = DateTime.Now;
    }
    return r;
  }
}

public static class DateTimeAdditions
{
  // Milliseconds elapsed since the given time, clamped to int.MaxValue.
  public static int MillisecondsBeforeNow(this DateTime time) {
    DateTime now = DateTime.Now;
    TimeSpan elapsed = now.Subtract(time);
    int r;
    double totalMS = elapsed.TotalMilliseconds;
    if (totalMS > int.MaxValue) {
      r = int.MaxValue;
    } else {
      r = (int)totalMS;
    }
    return r;
  }
}

public static class WeakReferenceAdditions {
  /// <summary> returns null if target is not available. </summary>
  public static TTarget TryGetTarget<TTarget> (this WeakReference<TTarget> reference) where TTarget: class
  {
    TTarget r = null;
    if (reference != null) {
      reference.TryGetTarget(out r);
    }
    return r;
  }
}
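The false positives from loops arise because the checker counts calls rather than nesting depth. One way to sketch the "call again after the risky code" idea is a scope object that increments a per-target depth on entry and decrements it on exit. This is a hypothetical variant, not part of the class above: `RecursionScope` and `Enter` are made-up names, the table is per thread, and targets are compared with their own `Equals`/`GetHashCode`, which is an assumption that may not suit every type.

```csharp
using System;
using System.Collections.Generic;

public sealed class RecursionScope : IDisposable
{
    // One table per thread, so no locking is needed.
    [ThreadStatic]
    private static Dictionary<object, int> depths;

    private readonly object target;
    private bool disposed;

    private RecursionScope(object target)
    {
        this.target = target;
    }

    // Records one more nested entry for the target; throws if nesting is too deep.
    public static RecursionScope Enter(object target, int maxDepth = 10)
    {
        if (depths == null)
        {
            depths = new Dictionary<object, int>();
        }
        int depth;
        depths.TryGetValue(target, out depth); // depth stays 0 if the target is absent
        if (depth >= maxDepth)
        {
            throw new InvalidOperationException("Probable infinite recursion");
        }
        depths[target] = depth + 1;
        return new RecursionScope(target);
    }

    // Must run on the thread that called Enter; undoes that one entry.
    public void Dispose()
    {
        if (disposed) return;
        disposed = true;
        int depth = depths[target] - 1;
        if (depth == 0) depths.Remove(target);
        else depths[target] = depth;
    }
}
```

Usage would be `using (RecursionScope.Enter(this)) { /* recursion-risky code */ }`. Because the depth drops back when each call returns, calling the method repeatedly from a loop no longer trips the check, while genuinely nested re-entry still does.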
【Comments】:
