When securing a comment form and related API endpoint, should input be sanitized, validated and encoded in the browser, on the server, or both?

Posted 2023-03-08

【Published】: 2020-12-27 16:21:40 【Question】:

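I am trying to make a comment form as secure as possible in a non-CMS environment with no user authentication.

The form should be secure against both browser requests and curl/Postman-type requests.

Environment

Backend - Node.js, MongoDB Atlas and an Azure web app. Frontend - jQuery.

Below is a detailed, but hopefully not overwhelming, overview of my current working implementation.

My questions about the implementation follow after that.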

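Relevant libraries used

- Helmet - helps secure Express apps by setting various HTTP headers, including Content Security Policy
- reCaptcha v3 - protects against spam and other types of automated abuse
- DOMPurify - an XSS sanitizer
- validator.js - a library of string validators and sanitizers
- he - an HTML entity encoder/decoder

The general flow of data is: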

/*
on click event:  
- get sanitized data
- perform some validations
- html encode the values
- get recaptcha v3 token from google
- send all data, including token, to server
- send token to google to verify
- if the response 'score' is above 0.5, add the submission to the database  
- return the entry to the client and populate the DOM with the submission   
*/

POST request - browser

// test input:
// <script>alert("hi!")</script><h1>hello there!</h1> <a href="">link</a>

// sanitize the input
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });

// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });

// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true) {

    /*
    encode the sanitized input
    not sure if i should encode BEFORE adding to MongoDB
    or just add to database "as is" and encode BEFORE displaying in the DOM with $("#output").html(html_content);
    */
    var sanitized_encoded_input_1_text = he.encode(sanitized_input_1_text);
    var sanitized_encoded_input_2_text = he.encode(sanitized_input_2_text);

    // define parameters to send to database
    var parameters = {};
    parameters.input_1_text = sanitized_encoded_input_1_text;
    parameters.input_2_text = sanitized_encoded_input_2_text;

    // get token from google and send token and input to database
    // see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
    grecaptcha.ready(function() {
        grecaptcha.execute('site-key-here', { action: 'submit' }).then(function(token) {
            parameters.token = token;
            jquery_ajax_call_to_my_api(parameters);
        });
    });
}

POST request - server

var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;

// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};

/*
if google's response 'score' is at least 0.5,
add submission to the database and populate client DOM with $("#output").prepend(html);
see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score
*/
if (score >= 0.5) {

    // add submission to database
    // return submission to client to update the DOM
    // DOM will just display this text:  <h1>hello there!</h1> <a href="">link</a>
}

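GET request on page load

Logic/assumptions:

Get all submissions, return them to the client and add them to the DOM with $("#output").html(html_content);. The values do not need to be encoded before populating the DOM because they are already encoded in the database?

POST requests from curl, Postman, etc.

Logic/assumptions:

They don't have a Google token, so the server cannot verify it, and therefore they cannot add entries to the database?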
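
That assumption only holds if the server rejects every request whose token is missing or fails verification. A minimal sketch of that decision, assuming the JSON body returned by siteverify has already been parsed into an object. The helper name `isTrustedCaptcha` and its parameters are hypothetical; `success`, `score`, `action` and `error-codes` are fields of the reCAPTCHA v3 verification response.

```javascript
// Hypothetical helper: decide whether a parsed siteverify response is acceptable.
function isTrustedCaptcha(verification, expectedAction, minScore) {
    // A missing or invalid token yields success: false and no score.
    if (!verification || verification.success !== true) {
        return false;
    }
    // The action should match the one passed to grecaptcha.execute().
    if (verification.action !== expectedAction) {
        return false;
    }
    return typeof verification.score === "number" && verification.score >= minScore;
}

// A request sent without a token: verification fails, so it is rejected.
console.log(isTrustedCaptcha({ success: false, "error-codes": ["missing-input-response"] }, "submit", 0.5)); // false
// A browser request with a good score is accepted.
console.log(isTrustedCaptcha({ success: true, score: 0.9, action: "submit" }, "submit", 0.5)); // true
```

Note that a script driving a real browser can still obtain a token, so this check is likely to reduce automated submissions rather than rule them out.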

Helmet configuration on the server

app.use(
    helmet({
        contentSecurityPolicy: {
            directives: {
                defaultSrc: ["'self'"],
                scriptSrc: ["'self'", "https://somedomain.io", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                fontSrc: ["'self'", "fonts.gstatic.com"],
                imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:"],
                frameSrc: ["'self'", "https://www.google.com"]
            }
        },
    })
);

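Questions

Should I add values to the MongoDB database as HTML-encoded entities, or store them "as is" and encode them before populating the DOM with them?

If values are saved as HTML entities in MongoDB, would this make searching the database for content difficult? A search for, say, <h1>hello there!</h1> <a href="">link</a> would return no results, because the value in the database is &#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;

In my reading about securing web forms, much has been said about client-side practices being fairly redundant, since anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint with curl or Postman, bypassing any client-side approach.

That said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?

To be thorough, here is another related question:

Do any of the following components do any automatic escaping or HTML encoding when data is sent from the client to the server? I ask because if they do, it may make some manual escaping or encoding unnecessary.

- jQuery ajax() request
- Node.js
- Express
- Helmet
- bodyParser (node package)
- MongoDB native driver
- MongoDB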
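
One way to probe this question is to round-trip a value through the transport encodings involved. The sketch below uses the built-in `URLSearchParams` and `JSON` as stand-ins for a URL-encoded form body and a JSON body; it does not exercise jQuery, Express, bodyParser or the MongoDB driver themselves, and the variable names are illustrative. It suggests that these encodings are reversed on receipt and do not HTML-encode anything.

```javascript
var comment = '<h1>hello there!</h1> <a href="">link</a>';

// Form encoding: roughly what a URL-encoded POST body looks like on the wire.
var wire = new URLSearchParams({ input_1_text: comment }).toString();
// Decoding: roughly what a URL-encoded body parser hands to the route handler.
var received = new URLSearchParams(wire).get("input_1_text");

// JSON round trip, as used for JSON request bodies.
var fromJson = JSON.parse(JSON.stringify({ input_1_text: comment })).input_1_text;

console.log(received === comment); // true
console.log(fromJson === comment); // true
console.log(received.includes("&lt;")); // false
```

If the real stack behaves the same way, the server receives the angle brackets unchanged, so escaping for HTML still has to be done explicitly.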

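【Comments】:

【Answer 1】:

You should always make sure that every piece of data you use has been sanitized on the backend before you use it!

See https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html

【Comments】:

【Answer 2】:

After reading more on the subject, this is the approach I came up with:

On click event: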

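- sanitize the data (DOMPurify)
- validate the data (validator.js)
- get a recaptcha v3 token from Google (reCaptcha v3)
- send all data, including the token, to the server
- the server is using Helmet
- the server is using Express Rate Limit and Rate Limit Mongo to limit POST requests on a certain route to X requests per X milliseconds (by IP address)
- the server sits behind a Cloudflare proxy, which provides some security and caching features (this requires setting app.set('trust proxy', true) in the node server file so that the rate limiter picks up the user's actual IP address - see Express behind proxies)
- send the token from the server to Google for verification (reCaptcha v3)
- if the response 'score' is above 0.5, perform the same sanitization and validation again
- if the validations pass, add the entry to the database with a moderated flag value of false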
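
The rate-limiting step can be illustrated with a small in-memory fixed-window counter. This is a simplified stand-in written for this sketch, not the API of Express Rate Limit or Rate Limit Mongo; `createLimiter`, `windowMs`, `max` and the injected clock `now` are hypothetical names.

```javascript
// Hypothetical fixed-window limiter: at most `max` hits per `windowMs` per key (e.g. an IP address).
function createLimiter(windowMs, max, now) {
    var hits = new Map(); // key -> { start: window start time, count: hits in this window }
    now = now || Date.now;
    return function allow(key) {
        var t = now();
        var entry = hits.get(key);
        // Start a new window if none exists or the old one has expired.
        if (!entry || t - entry.start >= windowMs) {
            hits.set(key, { start: t, count: 1 });
            return true;
        }
        entry.count += 1;
        return entry.count <= max;
    };
}

// Usage with a fake clock so the behaviour is easy to follow.
var clock = 0;
var allow = createLimiter(1000, 2, function () { return clock; });
console.log(allow("203.0.113.7")); // true  (1st request)
console.log(allow("203.0.113.7")); // true  (2nd request)
console.log(allow("203.0.113.7")); // false (3rd request inside the window)
clock = 1000;
console.log(allow("203.0.113.7")); // true  (new window)
```

Because the counter is keyed by IP address, it only works as intended when the server sees the client's real address, which is why the proxy setting in the list above matters.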

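Instead of returning the entry to the browser immediately, I decided to require a manual moderation process, which involves changing the entry's moderated value to true. While this removes the immediacy of the response for the user, it makes the form less attractive to spammers and the like if responses are not published immediately.

The GET request on page load then:

- returns all entries with moderated: true
- HTML-encodes the values before displaying them (he)
- populates the DOM with the HTML-encoded entries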
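
The GET flow can be sketched as follows. `escapeHtml` is a small hand-written stand-in for `he.encode` so that the example has no dependencies, and `renderEntries` and the entry fields `text` and `moderated` are assumed names.

```javascript
// Stand-in for he.encode: escape the characters that are significant in HTML.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&#x27;");
}

// Keep only moderated entries and build markup from the encoded values.
function renderEntries(entries) {
    return entries
        .filter(function (entry) { return entry.moderated === true; })
        .map(function (entry) { return "<li>" + escapeHtml(entry.text) + "</li>"; })
        .join("");
}

var html = renderEntries([
    { text: "<h1>hello there!</h1>", moderated: true },
    { text: "awaiting review", moderated: false }
]);
console.log(html); // <li>&lt;h1&gt;hello there!&lt;/h1&gt;</li>
```

The resulting string could then be passed to something like $("#output").html(html), where the browser shows the tags as text instead of interpreting them.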

The code looks like this:

POST request - browser

// sanitize the input
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });

// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });

// validation - regex to only allow certain characters
// for pattern, see:  https://***.com/q/63895992
var pattern = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;
var input_1_text_valid_characters = validator.matches(sanitized_input_1_text, pattern, "gm");
var input_2_text_valid_characters = validator.matches(sanitized_input_2_text, pattern, "gm");

// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true && input_1_text_valid_characters === true && input_2_text_valid_characters === true) {

    // define parameters to send to database
    var parameters = {};
    parameters.input_1_text = sanitized_input_1_text;
    parameters.input_2_text = sanitized_input_2_text;

    // get token from google and send token and input to database
    // see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
    grecaptcha.ready(function() {
        grecaptcha.execute('site-key-here', { action: 'submit_entry' }).then(function(token) {
            parameters.token = token;
            jquery_ajax_call_to_my_api(parameters);
        });
    });
}

POST request - server

var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;

// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};

// if google's response 'score' is at least 0.5,
// see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score

if (score >= 0.5) {

    // perform all the same sanitizations and validations to protect against
    // POST requests direct to the API via curl or postman etc
    // if validations pass, add entry to the database with `moderated: false` property

}
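
A minimal sketch of repeating the checks on the server without any library, assuming the same 1 to 140 character limit and the same pattern as in the browser code. `validateEntry` is a hypothetical name. Sanitizing with DOMPurify under Node would additionally need a DOM implementation, which is left out here.

```javascript
// Same pattern as in the browser code: words of letters separated by runs of
// space, comma, apostrophe or hyphen, with no separator character doubled.
var pattern = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;

// Hypothetical helper: returns true only for strings that pass every check.
function validateEntry(value) {
    // Anything that is not a string is rejected outright.
    if (typeof value !== "string") {
        return false;
    }
    var text = value.trim();
    if (text.length < 1 || text.length > 140) {
        return false;
    }
    return pattern.test(text);
}

console.log(validateEntry("hello there")); // true
console.log(validateEntry('<script>alert("hi!")</script>')); // false
console.log(validateEntry("")); // false
```

Running this inside the `if (score >= 0.5)` branch for each field means a request that skipped the browser code is still held to the same rules.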

GET request - browser

Logic:

- get all entries that have the moderated: true property
- HTML-encode the values before populating the DOM

Helmet configuration on the server

app.use(
    helmet({
        contentSecurityPolicy: {
            directives: {
                defaultSrc: ["'self'"],
                scriptSrc: ["'self'", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                connectSrc: ["'self'", "https://some-domain.com", "https://some.other.domain.com"],
                styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                fontSrc: ["'self'", "fonts.gstatic.com"],
                imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:", "https://another-domain.com"],
                frameSrc: ["'self'", "https://www.google.com"]
            }
        },
    })
);

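Answers to my questions in the OP:

Should I add values to the MongoDB database as HTML-encoded entities, or store them "as is" and encode them before populating the DOM with them?

As long as the input has been sanitized and validated on both the client and the server, you should only need to HTML-encode it just before populating the DOM.

If values are saved as HTML entities in MongoDB, would this make searching the database for content difficult? A search for, say, <h1>hello there!</h1> <a href="">link</a> would return no results, because the value in the database is &#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;

I thought the database entries would look messy if they were populated with HTML-encoded values, so I store the sanitized, validated entries "as is".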
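
The search concern from the question can be seen in a small sketch. The two arrays stand in for hypothetical collections, one holding the raw text and one holding the entity-encoded form quoted in the question; `search` is an illustrative helper, not a MongoDB call.

```javascript
var query = "<h1>hello there!</h1>";

// Stored "as is".
var rawCollection = ['<h1>hello there!</h1> <a href="">link</a>'];
// Stored as HTML entities, as quoted in the question.
var encodedCollection = ["&#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;"];

// Illustrative substring search over an array of stored values.
function search(collection, term) {
    return collection.filter(function (value) { return value.includes(term); });
}

console.log(search(rawCollection, query).length); // 1
console.log(search(encodedCollection, query).length); // 0
```

With encoded storage, every search term would have to be encoded in exactly the same way before querying, which is one more reason to store the raw value and encode only on output.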

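In my reading about securing web forms, much has been said about client-side practices being fairly redundant, since anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint with curl or Postman, bypassing any client-side approach.

That said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?

Option 2: sanitize and validate the input on both the client and the server.

【Comments】: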
