A Brief Introduction to taint: Taint Tracking

Posted 2022-11-23 Recar


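PHP extension lifecycle

First comes the module initialization phase (MINIT), which performs one-time setup such as registering constants and classes.
Then comes the module activation phase (RINIT), which happens during request handling: when a page is requested by URL, for example, every extension module that PHP has registered is activated (RINIT, request start) before each request is processed.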
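
The difference between the two phases can be sketched in plain C. This is a toy model rather than the Zend API: `lifecycle_counters`, `module_minit`, `module_rinit`, and `serve_requests` are hypothetical names, used only to show that MINIT runs once per process while RINIT runs once per request.

```c
#include <assert.h>

/* Toy model of the extension lifecycle; none of these names are Zend API. */
typedef struct {
    int minit_calls; /* one-time module initialization */
    int rinit_calls; /* per-request activation */
} lifecycle_counters;

/* MINIT stand-in: register constants, classes, hooks. Runs once. */
static void module_minit(lifecycle_counters *c) { c->minit_calls++; }

/* RINIT stand-in: per-request setup, e.g. marking request input. */
static void module_rinit(lifecycle_counters *c) { c->rinit_calls++; }

/* Simulate a process that starts up and then serves n requests. */
static lifecycle_counters serve_requests(int n)
{
    lifecycle_counters c = {0, 0};
    int i;

    module_minit(&c);
    for (i = 0; i < n; i++) {
        module_rinit(&c);
        /* ... the script for request i would execute here ... */
    }
    return c;
}

/* Usage: three requests mean one MINIT and three RINITs. */
static void lifecycle_demo(void)
{
    lifecycle_counters c = serve_requests(3);
    assert(c.minit_calls == 1);
    assert(c.rinit_calls == 3);
}
```

This is why taint installs its hooks in MINIT (they only need to be set up once) and marks request input in RINIT (the input is different for every request).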

taint

taint consists of three main parts: taint marking, taint propagation, and taint sinks.

Taint macros

// Define the mark rules
#define TAINT_MARK(str)        (GC_FLAGS((str)) |= IS_STR_TAINT_POSSIBLE)
#define TAINT_POSSIBLE(str) (GC_FLAGS((str)) & IS_STR_TAINT_POSSIBLE)
#define TAINT_CLEAN(str)      (GC_FLAGS((str)) &= ~IS_STR_TAINT_POSSIBLE)

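Looking at the taint macros, we can see that the taint flag is set differently depending on the PHP version.
The definitions are in php_taint.h: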

# if PHP_VERSION_ID >=70300
#  define EX_CONSTANT(op) RT_CONSTANT(EX(opline), op)
#  undef IS_STR_TAINT_POSSIBLE
#  define IS_STR_TAINT_POSSIBLE (1<<5) /* GC_PROTECTED */
#  define TAINT_MARK(str)     GC_ADD_FLAGS(str, IS_STR_TAINT_POSSIBLE)
#  define TAINT_POSSIBLE(str) (GC_FLAGS((str)) & IS_STR_TAINT_POSSIBLE)
#  define TAINT_CLEAN(str)    GC_DEL_FLAGS(str, IS_STR_TAINT_POSSIBLE)
# else
#  define TAINT_MARK(str)     (GC_FLAGS((str)) |= IS_STR_TAINT_POSSIBLE)
#  define TAINT_POSSIBLE(str) (GC_FLAGS((str)) & IS_STR_TAINT_POSSIBLE)
#  define TAINT_CLEAN(str)    (GC_FLAGS((str)) &= ~IS_STR_TAINT_POSSIBLE)
# endif
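
The effect of the three macros can be tried out without the PHP headers. The sketch below is a stand-in only: `mock_string` and the `MOCK_*` macros are hypothetical names, and the bit position simply mirrors the `(1<<5)` shown above. It illustrates that marking, testing, and cleaning each touch a single bit and leave the other flag bits alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a string header: a flags word plus the data. */
typedef struct {
    uint32_t flags;
    const char *val;
} mock_string;

/* Assumed bit position, mirroring the (1<<5) in the header above. */
#define MOCK_TAINT_FLAG (1u << 5)

#define MOCK_TAINT_MARK(s)     ((s)->flags |= MOCK_TAINT_FLAG)
#define MOCK_TAINT_POSSIBLE(s) (((s)->flags & MOCK_TAINT_FLAG) != 0)
#define MOCK_TAINT_CLEAN(s)    ((s)->flags &= ~MOCK_TAINT_FLAG)

/* Usage: mark, test, and clean a string that already carries another flag. */
static void mock_flag_demo(void)
{
    mock_string s = { 1u << 2, "id=1" }; /* an unrelated flag is already set */

    assert(!MOCK_TAINT_POSSIBLE(&s));
    MOCK_TAINT_MARK(&s);
    assert(MOCK_TAINT_POSSIBLE(&s));
    MOCK_TAINT_CLEAN(&s);
    assert(!MOCK_TAINT_POSSIBLE(&s));
    assert(s.flags == (1u << 2)); /* the other bit is untouched */
}
```

Because the mark lives on the string itself, every variable that shares that string shares the mark, and a newly allocated string starts without it.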

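The C file then uses these three macros.

PHP_RINIT_FUNCTION

This is where the input sources are marked as tainted.
The function invoked when the module is activated is PHP_RINIT_FUNCTION.
As the code shows, only three input sources are covered. That is certainly not enough for webshell detection, because data can also arrive through request headers, environment variables, and so on.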

	// Taint POST data
	if (Z_TYPE(PG(http_globals)[TRACK_VARS_POST]) == IS_ARRAY) {
		php_taint_mark_strings(Z_ARRVAL(PG(http_globals)[TRACK_VARS_POST]));
	}

	// Taint GET data
	if (Z_TYPE(PG(http_globals)[TRACK_VARS_GET]) == IS_ARRAY) {
		php_taint_mark_strings(Z_ARRVAL(PG(http_globals)[TRACK_VARS_GET]));
	}

	// Taint COOKIE data
	if (Z_TYPE(PG(http_globals)[TRACK_VARS_COOKIE]) == IS_ARRAY) {
		php_taint_mark_strings(Z_ARRVAL(PG(http_globals)[TRACK_VARS_COOKIE]));
	}

The tainting itself is done by php_taint_mark_strings.

TRACK_VARS_COOKIE, for example, is one of the indexes into PG(http_globals). The full set is:

TRACK_VARS_POST
TRACK_VARS_GET
TRACK_VARS_FILES
TRACK_VARS_COOKIE
TRACK_VARS_ENV
TRACK_VARS_SERVER
TRACK_VARS_REQUEST

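These are the entry points for external data. Any of these arrays can be passed to php_taint_mark_strings, although the RINIT code above does so only for GET, POST, and COOKIE.

php_taint_mark_strings walks such an array recursively and marks every non-empty string value in it as tainted: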

static void php_taint_mark_strings(zend_array *symbol_table) /* {{{ */ {
	zval *val;
	ZEND_HASH_FOREACH_VAL(symbol_table, val) {
		ZVAL_DEREF(val);
		if (Z_TYPE_P(val) == IS_ARRAY) {
			php_taint_mark_strings(Z_ARRVAL_P(val));
		} else if (IS_STRING == Z_TYPE_P(val) && Z_STRLEN_P(val)) {
			TAINT_MARK(Z_STR_P(val));
		}
	} ZEND_HASH_FOREACH_END();
} /* }}} */
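
The recursion is easy to follow on a small model. The sketch below is an illustration under stated assumptions, not the extension's code: `mock_value`, `mock_kind`, and `mock_mark_strings` are hypothetical, and a plain `tainted` field stands in for the flag bit. It shows nested arrays being descended into, empty strings being skipped, and non-string values being ignored.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MOCK_STRING, MOCK_ARRAY, MOCK_OTHER } mock_kind;

/* Hypothetical stand-in for a zval that is a string, an array, or anything else. */
typedef struct mock_value {
    mock_kind kind;
    const char *str;          /* used when kind == MOCK_STRING */
    int tainted;              /* stands in for the flag bit */
    struct mock_value *items; /* used when kind == MOCK_ARRAY */
    size_t count;
} mock_value;

/* Mark every non-empty string reachable from arr; return how many were marked. */
static size_t mock_mark_strings(mock_value *arr)
{
    size_t i, marked = 0;

    for (i = 0; i < arr->count; i++) {
        mock_value *v = &arr->items[i];
        if (v->kind == MOCK_ARRAY) {
            marked += mock_mark_strings(v); /* descend into nested arrays */
        } else if (v->kind == MOCK_STRING && v->str[0] != '\0') {
            v->tainted = 1;
            marked++;
        }
    }
    return marked;
}

/* Usage: models $_GET = ['a' => 'x', 'b' => ['y', ''], 'n' => 7]. */
static size_t mock_mark_demo(void)
{
    mock_value inner[2] = {
        { MOCK_STRING, "y", 0, NULL, 0 },
        { MOCK_STRING, "",  0, NULL, 0 }
    };
    mock_value top[3] = {
        { MOCK_STRING, "x",  0, NULL,  0 },
        { MOCK_ARRAY,  NULL, 0, inner, 2 },
        { MOCK_OTHER,  NULL, 0, NULL,  0 }
    };
    mock_value get = { MOCK_ARRAY, NULL, 0, top, 3 };
    size_t n = mock_mark_strings(&get);

    assert(top[0].tainted && inner[0].tainted);
    assert(!inner[1].tainted); /* empty strings are skipped */
    return n;
}
```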

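PHP_MINIT_FUNCTION: data propagation

	php_taint_register_handlers(); // hook the key execution points through opcode handlers
	php_taint_override_functions(); // hook the data-passing functions by hijacking their handlers

Data that passes through these functions is still considered tainted.
Which functions are treated as propagating taint? They are listed in
php_taint_override_functions: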

static void php_taint_override_functions() /* {{{ */ {
	const char *f_join         = "join";
	const char *f_trim         = "trim";
	const char *f_split        = "split";
	const char *f_rtrim        = "rtrim";
	const char *f_ltrim        = "ltrim";
	const char *f_strval       = "strval";
	const char *f_strstr       = "strstr";
	const char *f_substr       = "substr";
	const char *f_sprintf      = "sprintf";
	const char *f_explode      = "explode";
	const char *f_implode      = "implode";
	const char *f_str_pad      = "str_pad";
	const char *f_vsprintf     = "vsprintf";
	const char *f_str_replace  = "str_replace";
	const char *f_str_ireplace = "str_ireplace";
	const char *f_strtolower   = "strtolower";
	const char *f_strtoupper   = "strtoupper";
	const char *f_dirname      = "dirname";
	const char *f_basename     = "basename";
	const char *f_pathinfo     = "pathinfo";

	php_taint_override_func(f_strval, PHP_FN(taint_strval), &TAINT_O_FUNC(strval));
	php_taint_override_func(f_sprintf, PHP_FN(taint_sprintf), &TAINT_O_FUNC(sprintf));
	php_taint_override_func(f_vsprintf, PHP_FN(taint_vsprintf), &TAINT_O_FUNC(vsprintf));
	php_taint_override_func(f_explode, PHP_FN(taint_explode), &TAINT_O_FUNC(explode));
	php_taint_override_func(f_split, PHP_FN(taint_explode), NULL);
	php_taint_override_func(f_implode, PHP_FN(taint_implode), &TAINT_O_FUNC(implode));
	php_taint_override_func(f_join, PHP_FN(taint_implode), NULL);
	php_taint_override_func(f_trim, PHP_FN(taint_trim), &TAINT_O_FUNC(trim));
	php_taint_override_func(f_rtrim, PHP_FN(taint_rtrim), &TAINT_O_FUNC(rtrim));
	php_taint_override_func(f_ltrim, PHP_FN(taint_ltrim), &TAINT_O_FUNC(ltrim));
	php_taint_override_func(f_str_replace, PHP_FN(taint_str_replace), &TAINT_O_FUNC(str_replace));
	php_taint_override_func(f_str_ireplace, PHP_FN(taint_str_ireplace), &TAINT_O_FUNC(str_ireplace));
	php_taint_override_func(f_str_pad, PHP_FN(taint_str_pad), &TAINT_O_FUNC(str_pad));
	php_taint_override_func(f_strstr, PHP_FN(taint_strstr), &TAINT_O_FUNC(strstr));
	php_taint_override_func(f_strtolower, PHP_FN(taint_strtolower), &TAINT_O_FUNC(strtolower));
	php_taint_override_func(f_strtoupper, PHP_FN(taint_strtoupper), &TAINT_O_FUNC(strtoupper));
	php_taint_override_func(f_substr, PHP_FN(taint_substr), &TAINT_O_FUNC(substr));
	php_taint_override_func(f_dirname, PHP_FN(taint_dirname), &TAINT_O_FUNC(dirname));
	php_taint_override_func(f_basename, PHP_FN(taint_basename), &TAINT_O_FUNC(basename));
	php_taint_override_func(f_pathinfo, PHP_FN(taint_pathinfo), &TAINT_O_FUNC(pathinfo));

} /* }}} */

How does the hook work?

By hijacking the handler.

The original function's handler is saved, and the handler is then replaced with our hook function.
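
The save-then-replace step can be modeled with an ordinary table of function pointers. Everything in this sketch is a hypothetical stand-in (`mock_str`, `function_table`, `mock_override_func`, `taint_ltrim`); the real extension works on the engine's function table, which is not reproduced here. The point is only the pattern: look the function up by name, keep its old handler, install a wrapper, and have the wrapper call the old handler before copying the taint mark onto the result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { char buf[64]; int tainted; } mock_str;
typedef void (*mock_handler)(const mock_str *in, mock_str *out);
typedef struct { const char *name; mock_handler handler; } mock_function;

/* "Built-in" ltrim: strips leading spaces and knows nothing about taint. */
static void builtin_ltrim(const mock_str *in, mock_str *out)
{
    const char *p = in->buf;
    while (*p == ' ') p++;
    strcpy(out->buf, p);
    out->tainted = 0; /* a freshly built result starts clean */
}

static mock_function function_table[] = { { "ltrim", builtin_ltrim } };
static mock_handler original_ltrim; /* the saved original handler */

/* Wrapper: remember the input's taint, run the original, re-mark the result. */
static void taint_ltrim(const mock_str *in, mock_str *out)
{
    int tainted = in->tainted;
    original_ltrim(in, out);
    if (tainted && out->buf[0] != '\0') out->tainted = 1;
}

/* Find name, store its old handler in *saved, install replacement. 0 on success. */
static int mock_override_func(const char *name, mock_handler replacement,
                              mock_handler *saved)
{
    size_t i;
    for (i = 0; i < sizeof(function_table) / sizeof(function_table[0]); i++) {
        if (strcmp(function_table[i].name, name) == 0) {
            if (saved) *saved = function_table[i].handler;
            function_table[i].handler = replacement;
            return 0;
        }
    }
    return -1;
}

/* Usage: after the override, calling "ltrim" through the table keeps the mark. */
static int mock_hijack_demo(void)
{
    mock_str in = { "  1 OR 1=1", 1 }, out;

    if (original_ltrim == NULL &&
        mock_override_func("ltrim", taint_ltrim, &original_ltrim) != 0) {
        return -1;
    }
    function_table[0].handler(&in, &out);
    assert(strcmp(out.buf, "1 OR 1=1") == 0);
    return out.tainted;
}
```

Callers never notice the swap: they still look up "ltrim" and get the trimmed string, but the result now carries the taint of its input.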

Take taint_strval as an example. It wraps strval, which returns the string value of its argument.

PHP_FUNCTION(taint_strval) {
	zval *num;
	int tainted = 0;

	if (zend_parse_parameters(ZEND_NUM_ARGS(), "z", &num) == FAILURE) {
		return;
	}

	// Check whether the argument is tainted; if so, set tainted to 1
	if (Z_TYPE_P(num) == IS_STRING && TAINT_POSSIBLE(Z_STR_P(num))) {
		tainted = 1;
	}

	// Call the real function
	TAINT_O_FUNC(strval)(INTERNAL_FUNCTION_PARAM_PASSTHRU);

	// If the argument was tainted and the return value is a non-empty string
	// that is not the argument's own string, taint the return value
	if (tainted && IS_STRING == Z_TYPE_P(return_value)
			&& Z_STR_P(return_value) != Z_STR_P(num) && Z_STRLEN_P(return_value)) {
		TAINT_MARK(Z_STR_P(return_value));
	}
}

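The other functions are written in much the same way.

The list is clearly incomplete. For example, the base64 encoding functions are not hooked.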

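PHP_MINIT_FUNCTION: data execution

The opcodes of the functions that finally execute the incoming data are hooked.

This happens in PHP_MINIT_FUNCTION (which registers constants, classes, and performs other initialization):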

	php_taint_register_handlers();
	php_taint_override_functions();

php_taint_register_handlers hooks the opcodes:

static void php_taint_register_handlers() /* {{{ */ {
	int idx;
	for (idx = 0; idx < sizeof(override_opcode_handlers)/sizeof(taint_custom_handler); idx++) {
		origin_opcode_handler[idx] = (void*)zend_get_user_opcode_handler(override_opcode_handlers[idx].opcode);
	}

	for (idx = 0; idx < sizeof(override_opcode_handlers)/sizeof(taint_custom_handler); idx++) {
		zend_set_user_opcode_handler(override_opcode_handlers[idx].opcode, (user_opcode_handler_t)override_opcode_handlers[idx].handler);
	}

	return;
} /* }}} */
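
The two loops above follow a save-then-install pattern that can be sketched in isolation. The model below assumes a tiny opcode set and a plain array as the handler registry; `user_handlers`, `overrides`, `saved_handlers`, and the `OP_*` values are all hypothetical and do not correspond to real Zend opcodes or structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode numbers for the model. */
enum { OP_ECHO, OP_CONCAT, OP_ADD, OP_COUNT };

typedef int (*mock_op_handler)(void);

/* Registry of user handlers; NULL means "use the engine default". */
static mock_op_handler user_handlers[OP_COUNT];

static int hook_echo(void)   { return 1; }
static int hook_concat(void) { return 2; }

typedef struct { int opcode; mock_op_handler handler; } mock_custom_handler;

/* The opcodes to hook and the handler for each. */
static const mock_custom_handler overrides[] = {
    { OP_ECHO,   hook_echo },
    { OP_CONCAT, hook_concat }
};
#define N_OVERRIDES (sizeof(overrides) / sizeof(overrides[0]))

/* Whatever was installed before us, kept so it can still be called. */
static mock_op_handler saved_handlers[N_OVERRIDES];

static void mock_register_handlers(void)
{
    size_t i;

    /* Pass 1: remember the handlers that are currently installed. */
    for (i = 0; i < N_OVERRIDES; i++) {
        saved_handlers[i] = user_handlers[overrides[i].opcode];
    }
    /* Pass 2: install our own. */
    for (i = 0; i < N_OVERRIDES; i++) {
        user_handlers[overrides[i].opcode] = overrides[i].handler;
    }
}

/* Usage: only the listed opcodes are replaced. */
static void mock_register_demo(void)
{
    mock_register_handlers();
    assert(user_handlers[OP_ECHO] == hook_echo);
    assert(user_handlers[OP_ADD] == NULL); /* not in the table, left alone */
    assert(saved_handlers[0] == NULL);     /* nothing was installed before */
}
```

Saving in a separate pass before installing matters when one handler serves several opcodes: each saved slot then holds what another extension installed, never one of our own handlers.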

All of the hooked execution opcodes:

static const taint_custom_handler override_opcode_handlers[] = {
	{ ZEND_ECHO, php_taint_echo_handler },
	{ ZEND_EXIT, php_taint_exit_handler },
	{ ZEND_INIT_USER_CALL, php_taint_init_dynamic_fcall_handler },
	{ ZEND_INIT_DYNAMIC_CALL, php_taint_init_dynamic_fcall_handler },
	{ ZEND_INCLUDE_OR_EVAL, php_taint_include_or_eval_handler },
	{ ZEND_CONCAT, php_taint_concat_handler },
	{ ZEND_FAST_CONCAT, php_taint_concat_handler },
#if PHP_VERSION_ID < 70400
	{ ZEND_ASSIGN_CONCAT, php_taint_assign_concat_handler },
#else
	{ ZEND_ASSIGN_OP, php_taint_assign_op_handler },
	{ ZEND_ASSIGN_DIM_OP, php_taint_assign_dim_op_handler },
	{ ZEND_ASSIGN_OBJ_OP, php_taint_assign_obj_op_handler },
#endif
	{ ZEND_ROPE_END, php_taint_rope_handler },
	{ ZEND_DO_FCALL, php_taint_fcall_handler },
	{ ZEND_DO_ICALL, php_taint_fcall_handler },
	{ ZEND_DO_FCALL_BY_NAME, php_taint_fcall_handler }
};

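All of these are hooked with zend_set_user_opcode_handler.

Finally, the decision logic.
The key execution points are hooked by opcode.

Take echo and php_taint_echo_handler as an example.
If an argument passed to a key execution point is tainted, a warning is emitted.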

static int php_taint_echo_handler(zend_execute_data *execute_data) /* {{{ */ {
	const zend_op *opline = execute_data->opline;
	taint_free_op free_op1;
	zval *op1;

	op1 = php_taint_get_zval_ptr(execute_data, opline->op1_type, opline->op1, &free_op1, BP_VAR_R, 0);

	if (op1 && IS_STRING == Z_TYPE_P(op1) && TAINT_POSSIBLE(Z_STR_P(op1))) {
		if (opline->extended_value) {
			php_taint_error("print", "Attempt to print a string that might be tainted");
		} else {
			php_taint_error("echo", "Attempt to echo a string that might be tainted");
		}
	}

	CALL_ORIGIN_HANDLER();
	return ZEND_USER_OPCODE_DISPATCH;
} /* }}} */
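
The shape of this sink check (inspect the operand, report if it is tainted, then let the original operation run anyway) can be sketched as follows. The names `sink_str`, `original_echo`, and `mock_echo_handler` are hypothetical, and the wording of the message printed to stderr is an assumed format, not the extension's actual output.

```c
#include <assert.h>
#include <stdio.h>

typedef struct { const char *val; int tainted; } sink_str;

static int warnings_emitted; /* how many tainted operands were reported */
static int echo_calls;       /* how many times the real echo ran */

/* Stand-in for the engine's own echo implementation. */
static void original_echo(const sink_str *s)
{
    (void)s;
    echo_calls++;
}

/* Sink hook: warn when the operand is tainted, then always fall through. */
static void mock_echo_handler(const sink_str *s, int is_print)
{
    if (s != NULL && s->tainted) {
        const char *fn = is_print ? "print" : "echo";
        /* Assumed message format, for illustration only. */
        fprintf(stderr, "Warning: %s(): Attempt to %s a string that might be tainted\n",
                fn, fn);
        warnings_emitted++;
    }
    original_echo(s);
}

/* Usage: one clean and one tainted operand. */
static int mock_sink_demo(void)
{
    sink_str clean = { "hello", 0 };
    sink_str dirty = { "<script>", 1 };

    mock_echo_handler(&clean, 0);
    mock_echo_handler(&dirty, 0);
    assert(echo_calls == 2); /* the hook reports; it does not block */
    return warnings_emitted;
}
```

As in the handler above, the check is purely advisory: the original operation is dispatched whether or not a warning was raised.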

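Detecting a webshell

We start by writing a simple one-line webshell and adding some debug output to taint.

The output shows that MINIT runs first: it hooks the key execution points through opcode handlers, then hooks the data-passing functions.
RINIT runs next and marks the input data as tainted. Finally, if the data that reaches a key execution point is tainted, it is reported as a problem.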

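How to remove taint

When data passes through a function that is not in the hooked list and the return value is a newly created variable, the taint is effectively removed.
For specific values known to be under our control, the taint can be removed manually with TAINT_CLEAN.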
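
Both ways of losing the mark can be shown on a small model. In the sketch below, `t_str`, `unhooked_reverse`, and `mock_untaint` are hypothetical stand-ins: the first function plays the role of any string function without a taint wrapper, and the second plays the role of an explicit TAINT_CLEAN.

```c
#include <assert.h>
#include <string.h>

typedef struct { char buf[64]; int tainted; } t_str;

/* An unhooked transform (string reversal): it builds a new string and,
 * having no taint wrapper, never copies the mark across. */
static void unhooked_reverse(const t_str *in, t_str *out)
{
    size_t n = strlen(in->buf), i;

    for (i = 0; i < n; i++) {
        out->buf[i] = in->buf[n - 1 - i];
    }
    out->buf[n] = '\0';
    out->tainted = 0;
}

/* Explicit removal, the analogue of TAINT_CLEAN. */
static void mock_untaint(t_str *s) { s->tainted = 0; }

/* Usage: the mark is lost implicitly, then removed explicitly. */
static int untaint_demo(void)
{
    t_str input = { "cmd", 1 }, out;

    unhooked_reverse(&input, &out);
    assert(strcmp(out.buf, "dmc") == 0);
    assert(input.tainted == 1 && out.tainted == 0); /* mark lost in transit */

    mock_untaint(&input);
    return input.tainted;
}
```

The first case is the same mechanism an attacker relies on to slip past the check: any transformation that is not wrapped silently yields a clean string.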

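Some issues

If the webshell is prefixed with @, taint prints nothing useful, at least by default, so the check is bypassed. Any propagation function or data input point that is not hooked also allows a bypass.

I had always assumed there would be a tracked taint chain. It turns out that taint simply marks all external input, and the data returned by some dangerous functions, as tainted, and then checks at the key execution points. The chain itself is something we can add ourselves.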

