(De)serialization of parsed Groovy scripts

Posted 2023-03-15


[Title]: (de)serialization of parsed Groovy scripts [Posted]: 2021-03-19 17:49:11 [Question]:

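We want to let our customers customize certain aspects of their request processing by writing something themselves (we are currently looking at Groovy scripts). The scripts would be saved in a database and applied when necessary, so that we do not have to maintain every tiny processing detail that may apply to only a few customers.

With Groovy, a naive implementation would look like this: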

GroovyShell shell = new GroovyShell(); // prepare execution engine - probably once per thread

Script script = shell.parse(scriptBody); // parse/compile execution unit

Binding binding = prepareBinding(..); script.setBinding(binding); // provide script instance with execution context

script.run(); doSomething(binding);

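When run one after another, step 1 takes approx. 800 ms, step 3 takes approx. 2000 ms, and step 5 takes approx. 150 ms. The absolute numbers will vary, but the relative ones are fairly stable. Assuming that step 1 is not executed per request and that the execution time of step 5 is tolerable, I am mostly concerned about step 3: parsing the Groovy script instance from source code. I have read through the documentation and the code and done some googling, but have not found a solution so far, so the question is:

Can we somehow precompile the Groovy code once, persist it in a database, and rehydrate it when necessary to obtain an executable Script instance (which we could also cache if needed)?

Or (as I am thinking right now) could we just compile Java code to bytecode and persist that in the DB? In any case, I am not too concerned about the particular language used for the scripts, but sub-second execution time is a must. Thanks for any hints!

Note: I am aware that GroovyShellEngine may cache compiled scripts; that still leaves the risk of a too-long delay on the first execution, as well as the risk of excessive memory consumption...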
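The Java-bytecode idea from the question can be sketched with the JDK's own `javax.tools` compiler API. This is only an illustration under stated assumptions: a full JDK (not a bare JRE) is available at runtime, and the names `BytecodeRoundTrip`, `compile`, `instantiate`, and the sample `Greeting` class are hypothetical, not part of the original post. The returned map stands in for whatever database table would hold the bytes.

```java
import javax.tools.*;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class BytecodeRoundTrip {

    /** Java source held in memory so the compiler can read it without a file. */
    static final class SourceFile extends SimpleJavaFileObject {
        private final String code;
        SourceFile(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) { return code; }
    }

    /** Receives the bytes of one compiled class instead of writing a .class file. */
    static final class ClassFile extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ClassFile(String className) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.CLASS.extension), Kind.CLASS);
        }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    /** Compiles one source file; returns binary class name -> bytecode (the part to persist). */
    static Map<String, byte[]> compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no JDK compiler is present
        if (compiler == null) throw new IllegalStateException("no system Java compiler available");
        Map<String, ClassFile> outputs = new HashMap<>();
        try (StandardJavaFileManager std = compiler.getStandardFileManager(null, null, null)) {
            JavaFileManager manager = new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
                @Override
                public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location, String name,
                                                           JavaFileObject.Kind kind, FileObject sibling) {
                    ClassFile file = new ClassFile(name);
                    outputs.put(name, file); // one entry per generated class, inner classes included
                    return file;
                }
            };
            Boolean ok = compiler.getTask(null, manager, null, null, null,
                    Collections.singletonList(new SourceFile(className, source))).call();
            if (!ok) throw new IllegalArgumentException("compilation failed for " + className);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, byte[]> result = new HashMap<>();
        outputs.forEach((name, file) -> result.put(name, file.bytes.toByteArray()));
        return result;
    }

    /** Class loader that defines classes from previously stored bytecode. */
    static final class StoredBytecodeLoader extends ClassLoader {
        private final Map<String, byte[]> store;
        StoredBytecodeLoader(Map<String, byte[]> store) {
            super(StoredBytecodeLoader.class.getClassLoader());
            this.store = store;
        }
        @Override protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] bytes = store.get(name);
            if (bytes == null) throw new ClassNotFoundException(name);
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Loads a stored class in a fresh loader and creates an instance via its no-arg constructor. */
    static Object instantiate(Map<String, byte[]> store, String className) {
        try {
            return new StoredBytecodeLoader(store).loadClass(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String source = "public class Greeting implements java.util.function.Supplier<String> {"
                + " public String get() { return \"hello\"; } }";
        Map<String, byte[]> stored = compile("Greeting", source); // compile once, persist these bytes
        Object greeting = instantiate(stored, "Greeting");        // later: load without recompiling
        System.out.println(((java.util.function.Supplier<?>) greeting).get());
    }
}
```

The compile step and the load step share nothing but the byte arrays, so they could run in different processes, with the expensive compilation done when a script is saved rather than when a request arrives.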
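The memory concern in the note above can be bounded with a small least-recently-used cache. The sketch below is an assumption-laden illustration, not something from the original post: `BoundedScriptCache` and its `get` method are hypothetical names, the cache is keyed by the raw script text, and the `compiler` function stands in for whatever produces the compiled artifact (for example a parsed script class).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class BoundedScriptCache<V> {
    private final int maxEntries;
    private final Map<String, V> entries;

    BoundedScriptCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true keeps the least recently used entry first in iteration order
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // evict the least recently used entry once the limit is exceeded
                return size() > BoundedScriptCache.this.maxEntries;
            }
        };
    }

    /** Returns the cached value for this script text, compiling it only on a miss. */
    synchronized V get(String scriptBody, Function<String, V> compiler) {
        V cached = entries.get(scriptBody);
        if (cached == null) {
            cached = compiler.apply(scriptBody);
            entries.put(scriptBody, cached);
        }
        return cached;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A cache like this caps memory use but does nothing for the first-execution delay, which is why the question goes on to look for a way to persist the compiled form.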

UPD1: Following @daggett's excellent suggestion, I changed the solution to look like this:

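GroovyShell shell = new GroovyShell();
// getTheClass() returns the compiled script class itself, not a MetaClass subtype
final Class<?> theClass = shell.parse(scriptBody).getMetaClass().getTheClass();

Script script = InvokerHelper.createScript(theClass, binding);
script.run();

This all works fine! Now we need to decouple the creation of the metaclass from its use; for that, I created a helper method: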

    private Class<?> dehydrateClass(Class<?> theClass) throws IOException, ClassNotFoundException {
        final ByteArrayOutputStream stream = new ByteArrayOutputStream();
        ObjectOutputStream outputStream = new ObjectOutputStream(stream);
        outputStream.writeObject(theClass);
        outputStream.flush();
        InputStream in = new ByteArrayInputStream(stream.toByteArray());
        final ObjectInputStream inputStream = new ObjectInputStream(in);
        return (Class<?>) inputStream.readObject();
    }

I tested it as follows:

    @Test
    void testDehydratedClass() throws IOException, ClassNotFoundException, IllegalAccessException, InstantiationException {
        RandomClass instance = (RandomClass) dehydrateClass(RandomClass.class).newInstance();
        assertThat(instance.getName()).isEqualTo("Test");
    }

    public static class RandomClass {
        private final String name;

        public RandomClass() {
            this("Test");
        }

        public RandomClass(String name) {
            this.name = name;
        }

        public String getName() {
            return this.name;
        }
    }

The test passes, which suggests that, in general, the approach is OK.

However, when I apply this dehydrateClass method to theClass returned by the compile phase, I get this exception:

java.lang.ClassNotFoundException: Script1

    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:686)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1866)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
    at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1714)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1554)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)

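So my impression is that this deserialization trick cannot work if the ClassLoader in question does not already know what constitutes Script1. It seems the only way to make this approach work is to save those precompiled classes somewhere, somehow, or perhaps to learn to serialize them differently.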
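The exception is consistent with how Java serialization treats `Class` objects: the stream carries a class descriptor (essentially the class name and serial version information), not the bytecode, so the reading side has to resolve the name through its own class loader. The probe below is a minimal sketch that makes this visible; `ClassSerializationProbe` and its method names are hypothetical helpers introduced only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ClassSerializationProbe {

    /** Serializes a Class object the same way dehydrateClass does and returns the raw stream. */
    static byte[] serialize(Class<?> type) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(type);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** True if the stream mentions the class by its fully qualified name. */
    static boolean mentionsName(byte[] stream, Class<?> type) {
        return new String(stream, StandardCharsets.ISO_8859_1).contains(type.getName());
    }

    /** True if the stream contains the class-file magic number 0xCAFEBABE, i.e. embedded bytecode. */
    static boolean containsClassFileMagic(byte[] stream) {
        for (int i = 0; i + 3 < stream.length; i++) {
            if ((stream[i] & 0xFF) == 0xCA && (stream[i + 1] & 0xFF) == 0xFE
                    && (stream[i + 2] & 0xFF) == 0xBA && (stream[i + 3] & 0xFF) == 0xBE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] stream = serialize(java.util.ArrayList.class);
        System.out.println("bytes written:      " + stream.length);
        System.out.println("mentions the name:  " + mentionsName(stream, java.util.ArrayList.class));
        System.out.println("contains bytecode:  " + containsClassFileMagic(stream));
    }
}
```

This also explains why the RandomClass test passed: that class was already on the test's classpath, so resolving it by name succeeded, whereas Script1 exists only inside the class loader that compiled it.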

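[Comments]:

The Groovy shell engine does not cache anything. It is the class loader inside it that holds the compiled classes. You can compile each script and store its compiled version in a database or on the file system. Compiling on change is also a good way to validate the script itself. Here is a question that is very close to yours, IMHO: ***.com/questions/58373661/reload-class-in-groovy/…

That does help. So now the question is how to "persist" the parsed script in a way that can later be picked up by loadClass. Probably "Script#metaclass" is what I need.

@daggett I have extended the question; could you take another look?

I had an answer ready for you, and you extended your question )))

Cool! I'll look into it first thing in the morning!

[Answer 1]: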

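You can parse/compile the scripts/classes at editing time and store the compiled version somewhere: a database, the file system, memory, ...

Here is a Groovy code snippet that compiles a script/class to bytecode and then defines/loads the classes from that bytecode.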

import org.codehaus.groovy.control.BytecodeProcessor
import org.codehaus.groovy.control.CompilerConfiguration

//bytecode processor that could be used to store bytecode to cache(file,db,...)
@groovy.transform.CompileStatic
class BCP implements BytecodeProcessor{
    Map<String,byte[]> bytecodeMap = [:]
    byte[] processBytecode(String name, byte[] original){
        println "$name >> $original.length"
        bytecodeMap[name]=original //here we could store bytecode to a database or file system instead of memory map...
        return original
    }
}

def bcp = new BCP()
//------ COMPILE PHASE
def cc1 = new CompilerConfiguration()
cc1.setBytecodePostprocessor(bcp)
def gs1 = new GroovyShell(new GroovyClassLoader(), cc1)
//the next line will define 2 classes: MyConst and MyAdd (extends Script) named after the filename
gs1.parse("class MyConst{static int cnt=0} \n x+y+(++MyConst.cnt)", "MyAdd.groovy")

//------ RUN PHASE
//   let's create another classloader that has no information about classes MyAdd and MyConst
def cl2 = new GroovyClassLoader()

//this try-catch just to test that MyAdd fails to load at this point
// because unknown for 2-nd class loader
try {
    cl2.loadClass("MyAdd")
    assert 1==0: "this should not happen because previous line should throw exception"
}catch(ClassNotFoundException e){}

//now define previously compiled classes from the bytecode
//you can load bytecode from filesystem or from database
//for test purpose let's take them from map
bcp.bytecodeMap.each{String name, byte[] bytes->
    cl2.defineClass(name, bytes)
}

def myAdd = cl2.loadClass("MyAdd").newInstance()
assert myAdd instanceof groovy.lang.Script //it's a script

myAdd.setBinding([x: 1000, y: 2000] as Binding)
assert myAdd.run() == 3001 // +1 because we have x+y+(++MyConst.cnt)

myAdd.setBinding([x: 1100, y: 2200] as Binding)
assert myAdd.run() == 3302

println "OK"
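Since one script can produce several classes (MyAdd and MyConst above), it may be convenient to persist the whole name-to-bytecode map as a single value, for example in one BLOB column. The following is a minimal sketch of such a container format, assuming a simple length-prefixed layout; `BytecodeBlob`, `pack`, and `unpack` are hypothetical names and not part of Groovy or of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class BytecodeBlob {

    /** Packs class name -> bytecode entries into one byte array: count, then (name, length, bytes)*. */
    static byte[] pack(Map<String, byte[]> classes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(classes.size());
            for (Map.Entry<String, byte[]> entry : classes.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().length);
                out.write(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores the map written by pack, preserving the original entry order. */
    static Map<String, byte[]> unpack(byte[] blob) {
        Map<String, byte[]> classes = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] bytes = new byte[in.readInt()];
                in.readFully(bytes);
                classes.put(name, bytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return classes;
    }
}
```

On the run side, each restored entry would be handed to the class loader's defineClass call, exactly as the loop over bytecodeMap does in the snippet above.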

[Comments]:

This is exactly what I was looking for, thanks a lot!

Thank you. I am compiling thousands of such scripts, so this is very helpful.
