Ruby on Rails "invalid byte sequence in UTF-8" due to bot

A Chinese bot is triggering errors while crawling my website: http://www.easou.com/search/spider.html

All of my applications run Ruby 1.9.3 and Rails 3.2.X.

Here is a stack trace:

An ArgumentError occurred in listings#show:

  invalid byte sequence in UTF-8
  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'


-------------------------------
Request:
-------------------------------

  * URL       : http://www.my-website.com
  * IP address: X.X.X.X
  * Parameters: "action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"
  * Rails root: /.../releases/20140708150222
  * Timestamp : 2014-07-09 02:57:43 +0200

-------------------------------
Backtrace:
-------------------------------

  rack (1.4.5) lib/rack/utils.rb:104:in `normalize_params'
  rack (1.4.5) lib/rack/utils.rb:96:in `block in parse_nested_query'
  rack (1.4.5) lib/rack/utils.rb:93:in `each'
  rack (1.4.5) lib/rack/utils.rb:93:in `parse_nested_query'
  rack (1.4.5) lib/rack/request.rb:332:in `parse_query'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:275:in `parse_query'
  rack (1.4.5) lib/rack/request.rb:209:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/request.rb:237:in `POST'
  actionpack (3.2.18) lib/action_dispatch/http/parameters.rb:10:in `parameters'

-------------------------------
Session:
-------------------------------

  * session id: nil
  * data: 

-------------------------------
Environment:
-------------------------------

  * CONTENT_LENGTH                                 : 514
  * CONTENT_TYPE                                   : application/x-www-form-urlencoded
  * HTTP_ACCEPT                                    : text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
  * HTTP_ACCEPT_ENCODING                           : gzip, deflate
  * HTTP_ACCEPT_LANGUAGE                           : zh;q=0.9,en;q=0.8
  * HTTP_CONNECTION                                : close
  * HTTP_HOST                                      : www.my-website.com
  * HTTP_REFER                                     : http://www.my-website.com/
  * HTTP_USER_AGENT                                : Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
  * ORIGINAL_FULLPATH                              : /
  * PASSENGER_APP_SPAWNER_IDLE_TIME                : -1
  * PASSENGER_APP_TYPE                             : rack
  * PASSENGER_CONNECT_PASSWORD                     : [FILTERED]
  * PASSENGER_DEBUGGER                             : false
  * PASSENGER_ENVIRONMENT                          : production
  * PASSENGER_FRAMEWORK_SPAWNER_IDLE_TIME          : -1
  * PASSENGER_FRIENDLY_ERROR_PAGES                 : true
  * PASSENGER_GROUP                                :
  * PASSENGER_MAX_REQUESTS                         : 0
  * PASSENGER_MIN_INSTANCES                        : 1
  * PASSENGER_SHOW_VERSION_IN_HEADER               : true
  * PASSENGER_SPAWN_METHOD                         : smart-lv2
  * PASSENGER_USER                                 :
  * PASSENGER_USE_GLOBAL_QUEUE                     : true
  * PATH_INFO                                      : /
  * QUERY_STRING                                   :
  * REMOTE_ADDR                                    : 183.60.212.153
  * REMOTE_PORT                                    : 52997
  * REQUEST_METHOD                                 : GET
  * REQUEST_URI                                    : /
  * SCGI                                           : 1
  * SCRIPT_NAME                                    :
  * SERVER_PORT                                    : 80
  * SERVER_PROTOCOL                                : HTTP/1.1
  * SERVER_SOFTWARE                                : nginx/1.2.6
  * UNION_STATION_SUPPORT                          : false
  * _                                              : _
  * action_controller.instance                     : listings#show
  * action_dispatch.backtrace_cleaner              : #<Rails::BacktraceCleaner:0x000000056e8660>
  * action_dispatch.cookies                        : #<ActionDispatch::Cookies::CookieJar:0x00000006564e28>
  * action_dispatch.logger                         : #<ActiveSupport::TaggedLogging:0x0000000318aff8>
  * action_dispatch.parameter_filter               : [:password, /RAW_POST_DATA/, /RAW_POST_DATA/, /RAW_POST_DATA/]
  * action_dispatch.remote_ip                      : 183.60.212.153
  * action_dispatch.request.content_type           : application/x-www-form-urlencoded
  * action_dispatch.request.parameters             : "action"=>"show", "controller"=>"listings", "id"=>"location-t7-villeurbanne--58"
  * action_dispatch.request.path_parameters        : :action=>"show", :controller=>"listings", :id=>"location-t7-villeurbanne--58"
  * action_dispatch.request.query_parameters       : 
  * action_dispatch.request.request_parameters     : 
  * action_dispatch.request.unsigned_session_cookie: 
  * action_dispatch.request_id                     : 9f8afbc8ff142f91ddbd9cabee3629f3
  * action_dispatch.routes                         : #<ActionDispatch::Routing::RouteSet:0x0000000339f370>
  * action_dispatch.show_detailed_exceptions       : false
  * action_dispatch.show_exceptions                : true
  * rack-cache.allow_reload                        : false
  * rack-cache.allow_revalidate                    : false
  * rack-cache.cache_key                           : Rack::Cache::Key
  * rack-cache.default_ttl                         : 0
  * rack-cache.entitystore                         : rails:/
  * rack-cache.ignore_headers                      : ["Set-Cookie"]
  * rack-cache.metastore                           : rails:/
  * rack-cache.private_headers                     : ["Authorization", "Cookie"]
  * rack-cache.storage                             : #<Rack::Cache::Storage:0x000000039c5768>
  * rack-cache.use_native_ttl                      : false
  * rack-cache.verbose                             : false
  * rack.errors                                    : #<IO:0x000000006592a8>
  * rack.input                                     : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.multiprocess                              : true
  * rack.multithread                               : false
  * rack.request.cookie_hash                       : 
  * rack.request.form_hash                         :
  * rack.request.form_input                        : #<PhusionPassenger::Utils::RewindableInput:0x0000000655b3a0>
  * rack.request.form_vars                         : ���W�"��陷q�B��)���
�F��P   Z� 8�� &   G\y�P��u�T ed �.�%�mxEAẳ\�d*�Hg�     �C賳�lj��� � U 1��]pgt�P�
  Ɗ    ��c"� ��LX��D���HR�y��p`6�l���lN�P �l�S����`V4y��c����X2�        &JO!��*p �l��-�гU��w g�ԍk�� (� F J��  q�:�5G�Jh�pί����ࡃ]                                                                                                                                                                                                                                                                           �z�h���� d �
  * rack.request.query_hash                        : 
  * rack.request.query_string                      :
  * rack.run_once                                  : false
  * rack.session                                   : 
  * rack.session.options                           : :path=>"/", :domain=>nil, :expire_after=>nil, :secure=>false, :httponly=>true, :defer=>false, :renew=>false, :coder=>#<Rack::Session::Cookie::Base64::Marshal:0x000000034d4ad8>, :id=>nil
  * rack.url_scheme                                : http
  * rack.version                                   : [1, 0]

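As you can see, there is no invalid UTF-8 in the URL, only in rack.request.form_vars. I get about a hundred of these errors a day, all similar to this one.

So I tried to force UTF-8 on rack.request.form_vars with something like this: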

class RackFormVarsSanitizer
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["rack.request.form_vars"] 
      env["rack.request.form_vars"] = env["rack.request.form_vars"].force_encoding('UTF-8')
    end
    @app.call(env)
  end
end

I register it in application.rb:

config.middleware.use "RackFormVarsSanitizer"

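It doesn't seem to work, since I still get the errors. The problem is that I can't test it in development mode, because I don't know how to set rack.request.form_vars.

I also installed the utf8-cleaner gem, but it doesn't fix the problem.

Does anyone have an idea how to fix this, or how to trigger it in development?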
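A note on the attempt above: force_encoding only changes the encoding label on a string and leaves its bytes untouched, so an invalid sequence stays invalid. The following is a minimal plain-Ruby sketch of that behaviour, not code from the original post. The sample body is made up, the regex is only a stand-in for the kind of match a parameter parser performs, and String#scrub assumes Ruby 2.1 or later (it does not exist on the Ruby 1.9.3 used here).

```ruby
# Hypothetical stand-in for the bot's request body: form-style data
# containing bytes that can never appear in valid UTF-8.
raw = "name\xFF\xFE=value".b # binary (ASCII-8BIT) copy of the bytes

# force_encoding relabels the string; the bytes themselves do not change.
relabelled = raw.dup.force_encoding("UTF-8")
relabelled_valid = relabelled.valid_encoding?

# A regex match against the relabelled string raises the same kind of
# error that appears in the stack trace.
error_message =
  begin
    relabelled =~ /\A\w+/
    nil
  rescue ArgumentError => e
    e.message
  end

# Replacing the invalid bytes gives a string that is safe to match.
# (Assumption: Ruby 2.1+ for String#scrub.)
cleaned = relabelled.scrub("?")
cleaned_valid = cleaned.valid_encoding?

puts relabelled_valid
puts error_message
puts cleaned_valid
```

Sending a body like this to a local server with Content-Type application/x-www-form-urlencoded may be one way to trigger the error in development.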

Answer

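So that you don't have to piece together comments from the other answers, this is what I'm doing now. I haven't seen any errors in 24 hours, so it looks promising:

Add rack-utf8_sanitizer to your Gemfile: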

gem 'rack-utf8_sanitizer'

and run

bundle

Put this middleware into app/middleware/handle_invalid_percent_encoding.rb and rename the class to HandleInvalidPercentEncoding (since ExceptionApp is a bit too generic).

In the config block of config/application.rb, do:

require "#{Rails.root}/app/middleware/handle_invalid_percent_encoding.rb"


# NOTE: These must be in this order relative to each other.
# HandleInvalidPercentEncoding just raises for encoding errors it doesn't cover,
# so it must run after (= be inserted before) Rack::UTF8Sanitizer.
config.middleware.insert 0, HandleInvalidPercentEncoding
config.middleware.insert 0, Rack::UTF8Sanitizer  # from a gem
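The ordering comment can be checked with a plain array standing in for the middleware stack. This is only an illustration: the existing entry name is made up and Rails' real stack is not an array of strings, but inserting at index 0 puts an entry at the front in the same way, and the front of the stack sees the request first.

```ruby
# Hypothetical stand-in for the existing middleware stack.
stack = ["SomeExistingMiddleware"]

# Same order of calls as in config/application.rb.
stack.insert(0, "HandleInvalidPercentEncoding")
stack.insert(0, "Rack::UTF8Sanitizer")

# The entry inserted last ends up first, so it runs first.
puts stack.inspect
```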

Deploy. Done.

(app just happens to be where middleware lives in the project I'm working on, though I would probably prefer lib. Either should work.)
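For readers who want a feel for what a middleware of this kind does, here is a minimal sketch. It is not the middleware linked in this answer: the class name RejectInvalidByteSequences and the choice of a 400 response are assumptions made for illustration. It wraps the downstream app, turns encoding-related ArgumentErrors into a "Bad request" response, and re-raises everything else.

```ruby
# Hypothetical Rack-style middleware; NOT the one linked in the answer.
class RejectInvalidByteSequences
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue ArgumentError => e
    # Only handle encoding problems; re-raise any other ArgumentError.
    raise unless e.message.include?("invalid byte sequence") ||
                 e.message.include?("invalid %-encoding")

    body = "Bad request\n"
    headers = {
      "Content-Type" => "text/plain",
      "Content-Length" => body.bytesize.to_s
    }
    [400, headers, [body]]
  end
end

# Stand-in downstream app that fails the way the parameter parser does:
# a regex match on a string holding a byte that is invalid in UTF-8.
failing_app = lambda do |_env|
  "\xFF" =~ /x/
  [200, {}, ["ok"]]
end

status, _headers, _body = RejectInvalidByteSequences.new(failing_app).call({})
puts status
```

Returning a 400 keeps the bot's malformed requests out of the exception notifier while real bugs that raise other errors still surface.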

Another answer

Add this line to your Gemfile, then run bundle in your terminal:

gem "handle_invalid_percent_encoding_requests"

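This solution is based on Henrik's answer, converted into a Rails Engine gem.

Another answer

There is an issue in the gem's repository with a link to someone's possible solution. They say it works for them, but they are not sure whether it is a good solution.

I haven't tried it yet, but I think I will.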
