Tornado Study Notes, Part 4


Structure of a Tornado web application

A Tornado web application generally consists of one or more RequestHandler subclasses, an Application object which routes incoming requests to handlers, and a main() function to start the server.

In short: a Tornado application usually has at least one RequestHandler subclass, an Application object that routes requests to handlers, and a main function that starts the server.

hello world

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

 

The Application object

The Application object is responsible for global configuration, including the routing table that maps requests to handlers.

In short: the Application object holds the global configuration, above all the routing table that forwards requests to handlers.

The routing table is a list of URLSpec objects (or tuples), each of which contains (at least) a regular expression and a handler class. Order matters; the first matching rule is used. If the regular expression contains capturing groups, these groups are the path arguments and will be passed to the handler’s HTTP method. If a dictionary is passed as the third element of the URLSpec, it supplies the initialization arguments which will be passed to RequestHandler.initialize. Finally, the URLSpec may have a name, which will allow it to be used with RequestHandler.reverse_url.

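The routing table is a list of URLSpec objects or tuples, each containing at least a regular expression and a handler class. Their order matters, a bit like the girls one meets in life: the order in which they show up is important, and meeting the same person at a different time would give a different ending. Regular expressions work the same way: the first one that matches is the one used. If the regular expression contains capturing groups, those groups are the path arguments and are passed to the handler's HTTP method. If the URLSpec has a dict as its third element, the dict supplies the initialization arguments that are passed to RequestHandler.initialize. Finally, a URLSpec may have a name, which RequestHandler.reverse_url uses.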

For example, in this fragment the root URL / is mapped to MainHandler and URLs of the form /story/ followed by a number are mapped to StoryHandler. That number is passed (as a string) to StoryHandler.get.

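In the example, the root URL / routes to MainHandler and URLs matching /story/([0-9]+) route to StoryHandler; the number is passed to StoryHandler.get as a string.

Put simply: visiting / shows a link, "link to story 1", and clicking it requests /story/1. Note that self.reverse_url is called with two arguments, the route name and the value for the capturing group.

url(r"/story/([0-9]+)", StoryHandler, dict(db=db), name="story")

Because the pattern has a capturing group, the matched text is passed as an argument to the handler's HTTP method (get, post and so on); here it arrives as story_id.

Because of dict(db=db), db is passed to RequestHandler.initialize. Note that initialize is an instance method, not a class method.

Because of name="story", reverse_url takes "story" as its first argument.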

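from tornado.web import Application, RequestHandler, url

db = {}  # stand-in for a real database handle (assumption for this example)

class MainHandler(RequestHandler):
    def get(self):
        # reverse_url("story", "1") builds "/story/1" from the route named "story"
        self.write('<a href="%s">link to story 1</a>' %
                   self.reverse_url("story", "1"))

class StoryHandler(RequestHandler):
    def initialize(self, db):
        # receives the dict(db=db) given in the routing table
        self.db = db

    def get(self, story_id):
        self.write("this is story %s" % story_id)

app = Application([
    url(r"/", MainHandler),
    url(r"/story/([0-9]+)", StoryHandler, dict(db=db), name="story")
    ])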
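To see for myself that "the first matching rule is used", here is a minimal sketch. first_match is a hypothetical helper written only for this note (Tornado does its own matching internally), and CatchAllHandler and the route names are made up. It relies on the regex and name attributes of URLSpec and on Application.reverse_url.

```python
from tornado.web import Application, RequestHandler, url


class StoryHandler(RequestHandler):
    def get(self, story_id):
        self.write("this is story %s" % story_id)


class CatchAllHandler(RequestHandler):
    def get(self, path):
        self.write("catch-all for %s" % path)


# The specific rule is deliberately listed before the catch-all rule.
routes = [
    url(r"/story/([0-9]+)", StoryHandler, name="story"),
    url(r"/(.*)", CatchAllHandler, name="catch_all"),
]
app = Application(routes)


def first_match(specs, path):
    """Hypothetical helper: name of the first spec whose regex matches path."""
    for spec in specs:
        if spec.regex.match(path):
            return spec.name
    return None
```

With the list reversed, the catch-all rule would swallow /story/1 as well, which is why specific patterns should come first.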

 

The Application constructor takes many keyword arguments that can be used to customize the behavior of the application and enable optional features; see Application.settings for the complete list.

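In short: the Application constructor accepts many keyword arguments that customize the application's behavior and switch on optional features; the complete list is under Application.settings.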
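A small sketch of passing settings, as I understand it. compress_response is one of the built-in settings; site_name is a made-up, application-specific key, which works because the constructor keeps every keyword argument in the settings dict.

```python
import tornado.web


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # Handlers see the same dict through self.settings.
        self.write("site: %s" % self.settings["site_name"])


app = tornado.web.Application(
    [(r"/", MainHandler)],
    compress_response=True,  # built-in setting: compress textual responses
    site_name="demo",        # hypothetical application-specific setting
)
```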

 

Subclassing RequestHandler

Most of the work of a Tornado web application is done in subclasses of RequestHandler. The main entry point for a handler subclass is a method named after the HTTP method being handled: get(), post(), etc. Each handler may define one or more of these methods to handle different HTTP actions. As described above, these methods will be called with arguments corresponding to the capturing groups of the routing rule that matched.

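In short: most of the work in a Tornado web application is done by RequestHandler subclasses. The entry points are methods named after the HTTP methods: get(), post() and so on. A handler may define several of them to serve different HTTP actions. These methods receive arguments, namely whatever the capturing groups of the matching route captured.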

Within a handler, call methods such as RequestHandler.render or RequestHandler.write to produce a response. render() loads a Template by name and renders it with the given arguments. write() is used for non-template-based output; it accepts strings, bytes, and dictionaries (dicts will be encoded as JSON).

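Inside a handler, call RequestHandler.render or RequestHandler.write to produce the response. render() loads a Template and renders it with the given arguments. write() sends output that needs no template: strings, bytes or dicts, with dicts encoded as JSON. That reminds me of Python's annoying string encoding issues again.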
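A minimal sketch of the dict case. build_payload and StoryJsonHandler are names I made up; the last line uses tornado.escape.json_encode to show roughly what the response body looks like when write() is given a dict.

```python
import tornado.escape
import tornado.web


def build_payload(story_id):
    """Hypothetical helper returning the data for one story."""
    return {"id": story_id, "title": "story %s" % story_id}


class StoryJsonHandler(tornado.web.RequestHandler):
    def get(self, story_id):
        # Given a dict, write() serializes it as JSON and sets the
        # Content-Type header to application/json.
        self.write(build_payload(story_id))


# Roughly what the client receives as the body for story "1":
body = tornado.escape.json_encode(build_payload("1"))
```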

Many methods in RequestHandler are designed to be overridden in subclasses and be used throughout the application. It is common to define a BaseHandler class that overrides methods such as write_error and get_current_user and then subclass your own BaseHandler instead of RequestHandler for all your specific handlers.

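Many RequestHandler methods are meant to be overridden in subclasses. A common pattern is to define a BaseHandler that overrides methods such as write_error and get_current_user, and then derive all concrete handlers from that BaseHandler instead of from RequestHandler.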
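A sketch of the BaseHandler pattern under some assumptions of mine: the user name is kept in a signed cookie called "user", a /login page exists somewhere, and the extra header is only an illustration.

```python
import tornado.web


class BaseHandler(tornado.web.RequestHandler):
    def get_current_user(self):
        # Assumption: a login handler stored the user name in a signed
        # cookie named "user". Returns None when nobody is logged in.
        return self.get_secure_cookie("user")

    def set_default_headers(self):
        # Applied at the start of every request, for every subclass.
        self.set_header("X-Frame-Options", "DENY")


class ProfileHandler(BaseHandler):
    def get(self):
        if not self.current_user:
            self.redirect("/login")  # hypothetical login URL
            return
        # get_secure_cookie returns bytes, so decode before concatenating.
        self.write("hello, " + self.current_user.decode("utf-8"))


app = tornado.web.Application(
    [(r"/profile", ProfileHandler)],
    cookie_secret="replace-with-a-random-secret",  # needed for signed cookies
)
```

Every other handler in the application can then inherit from BaseHandler and get the same behavior for free.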

 

Handling request input

The request handler can access the object representing the current request with self.request. See the class definition for HTTPServerRequest for a complete list of attributes.

Request data in the formats used by html forms will be parsed for you and is made available in methods like get_query_argument and get_body_argument.

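The current request is available as self.request. Form-style arguments are read with get_query_argument (values from the URL query string) and get_body_argument (values from the request body); in the form example below these correspond to GET and POST submissions respectively.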

import tornado.web

class MyFormHandler(tornado.web.RequestHandler):
    def get(self):
        self.write('<html><body><form action="/myform" method="POST">'
                   '<input type="text" name="message">'
                   '<input type="submit" value="Submit">'
                   '</form></body></html>')

    def post(self):
        self.set_header("Content-Type", "text/plain")
        self.write("You wrote " + self.get_body_argument("message"))

Since the HTML form encoding is ambiguous as to whether an argument is a single value or a list with one element, RequestHandler has distinct methods to allow the application to indicate whether or not it expects a list. For lists, use get_query_arguments and get_body_arguments instead of their singular counterparts.

RequestHandler has separate methods so the application can say whether it expects a single value or a list. For lists, use get_query_arguments and get_body_arguments.
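A short sketch of the singular and plural forms side by side. SearchHandler, describe_tags and the argument names tag and page are made up for this note.

```python
import tornado.web


def describe_tags(tags):
    """Hypothetical helper: turn the list of tag values into a message."""
    if not tags:
        return "no tags given"
    return "tags: " + ", ".join(tags)


class SearchHandler(tornado.web.RequestHandler):
    def get(self):
        # /search?tag=python&tag=web -> ["python", "web"]
        # /search                    -> [] (the plural form does not raise)
        tags = self.get_query_arguments("tag")
        # The singular form returns one value (the last one if the name is
        # repeated) and accepts a default for the missing case.
        page = self.get_query_argument("page", "1")
        self.write(describe_tags(tags) + " (page %s)" % page)
```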

Files uploaded via a form are available in self.request.files, which maps names (the name of the HTML <input type="file"> element) to a list of files. Each file is a dictionary of the form {"filename":..., "content_type":..., "body":...}. The files object is only present if the files were uploaded with a form wrapper (i.e. a multipart/form-data Content-Type); if this format was not used the raw uploaded data is available in self.request.body. By default uploaded files are fully buffered in memory; if you need to handle files that are too large to comfortably keep in memory see the stream_request_body class decorator.

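Uploaded files are found in self.request.files; each file is a dictionary {"filename": ..., "content_type": ..., "body": ...}. Whether files is populated depends on the Content-Type: if the upload was not sent as multipart/form-data, the raw data has to be read from self.request.body instead. By default uploads are buffered entirely in memory; for large files you may need streaming (the stream_request_body decorator). Streams, as an advanced feature, feel rather fancy whether in Node.js or in Tornado.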
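A minimal upload sketch, assuming a form field named "attachment" (my invention) and files small enough to sit in memory. describe_upload only reads the three keys listed above.

```python
import tornado.web


def describe_upload(fileinfo):
    """Summarize one uploaded file (a dict-like object with three keys)."""
    return "%s (%s, %d bytes)" % (
        fileinfo["filename"],
        fileinfo["content_type"],
        len(fileinfo["body"]),
    )


class UploadHandler(tornado.web.RequestHandler):
    def post(self):
        # "attachment" is a hypothetical <input type="file" name="attachment">.
        uploads = self.request.files.get("attachment", [])
        if not uploads:
            self.set_status(400)
            self.write("no file uploaded")
            return
        self.set_header("Content-Type", "text/plain")
        self.write("\n".join(describe_upload(f) for f in uploads))
```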

Due to the quirks of the HTML form encoding (e.g. the ambiguity around singular versus plural arguments), Tornado does not attempt to unify form arguments with other types of input. In particular, we do not parse JSON request bodies. Applications that wish to use JSON instead of form-encoding may override prepare to parse their requests:

Tornado does not parse JSON request bodies; applications that want JSON have to override prepare themselves.

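import json

# A method of your RequestHandler subclass:
def prepare(self):
    # .get() avoids a KeyError when the request has no Content-Type header.
    if self.request.headers.get("Content-Type", "").startswith("application/json"):
        self.json_args = json.loads(self.request.body)
    else:
        self.json_args = None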

 

Overriding RequestHandler methods

In addition to get()/post()/etc, certain other methods in RequestHandler are designed to be overridden by subclasses when necessary. On every request, the following sequence of calls takes place:

  1. A new RequestHandler object is created on each request
  2. initialize() is called with the initialization arguments from the Application configuration. initialize should typically just save the arguments passed into member variables; it may not produce any output or call methods like send_error.
  3. prepare() is called. This is most useful in a base class shared by all of your handler subclasses, as prepare is called no matter which HTTP method is used. prepare may produce output; if it calls finish (or redirect, etc), processing stops here.
  4. One of the HTTP methods is called: get(), post(), put(), etc. If the URL regular expression contains capturing groups, they are passed as arguments to this method.
  5. When the request is finished, on_finish() is called. For synchronous handlers this is immediately after get() (etc) return; for asynchronous handlers it is after the call to finish().

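Besides get and post, RequestHandler has other methods designed to be overridden by subclasses when needed. On every request, the following story unfolds:

  1. A new RequestHandler object is created for every request.
  2. initialize(self, xx) is called with the arguments from the Application configuration (the third element of the route, dict(xx=xx)). Typically initialize just stores what it is given; it must not produce output or call methods such as send_error.
  3. prepare() is called, whatever the HTTP method. prepare may produce output; if it calls finish or redirect, processing stops there.
  4. The HTTP method is called: get(), post(), put() and so on. Groups captured by the regular expression are passed to it as arguments.
  5. When the request is finished, on_finish() is called. For synchronous handlers that is right after get() (etc.) returns; for asynchronous handlers it is after finish() has been called.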
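The sequence above can be traced with a small handler that records which hooks ran. This is only a sketch: LifecycleHandler, the greeting argument and the /hello/ route are made up. When the app is served and /hello/bob is requested, it should print initialize -> prepare -> get -> on_finish.

```python
import tornado.web


class LifecycleHandler(tornado.web.RequestHandler):
    def initialize(self, greeting):
        # Step 2: only store the configuration; no output here.
        self.greeting = greeting
        self.calls = ["initialize"]

    def prepare(self):
        # Step 3: runs before get/post/put alike.
        self.calls.append("prepare")

    def get(self, name):
        # Step 4: the captured group arrives as an argument.
        self.calls.append("get")
        self.write("%s, %s" % (self.greeting, name))

    def on_finish(self):
        # Step 5: the request is finished; a place for cleanup or logging.
        self.calls.append("on_finish")
        print(" -> ".join(self.calls))


app = tornado.web.Application([
    (r"/hello/(\w+)", LifecycleHandler, dict(greeting="Hello")),
])
```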

All methods designed to be overridden are noted as such in the RequestHandler documentation. Some of the most commonly overridden methods include:

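The most commonly overridden methods:

  • write_error: outputs the HTML for error pages.
  • on_connection_close: called when the client disconnects; the application may watch for this event and stop further processing. There is no guarantee that a closed connection is detected promptly.
  • get_current_user
  • get_user_locale: returns the Locale object for the current user.
  • set_default_headers: can be used to set additional headers on the response.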

 

Error Handling

If a handler raises an exception, Tornado will call RequestHandler.write_error to generate an error page. tornado.web.HTTPError can be used to generate a specified status code; all other exceptions return a 500 status.

If a handler raises an exception, Tornado calls RequestHandler.write_error to generate the error page. tornado.web.HTTPError produces a specific status code; any other exception results in 500 Internal Server Error.

The default error page includes a stack trace in debug mode and a one-line description of the error (e.g. “500: Internal Server Error”) otherwise. To produce a custom error page, override RequestHandler.write_error (probably in a base class shared by all your handlers). This method may produce output normally via methods such as write and render. If the error was caused by an exception, an exc_info triple will be passed as a keyword argument (note that this exception is not guaranteed to be the current exception in sys.exc_info, so write_error must use e.g. traceback.format_exception instead of traceback.format_exc).

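In debug mode the default error page contains a stack trace; otherwise it is a one-line description of the error (for example "500: Internal Server Error"). Override RequestHandler.write_error to produce a custom error page; inside write_error you can call write and render to output the response. If the error was caused by an exception, the exception information is passed in the exc_info keyword argument, and write_error has to use traceback.format_exception on it. The docs give no example here, which makes this hard to follow.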
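Here is my attempt at an example, a sketch rather than the official way. error_payload, JsonErrorHandler and BrokenHandler are hypothetical names; the serve_traceback setting is what debug mode turns on, and traceback.format_exception is given the exc_info triple explicitly, as the docs require.

```python
import traceback

import tornado.web


def error_payload(status_code, exc_info=None, show_traceback=False):
    """Hypothetical helper that builds the body of a JSON error response."""
    payload = {"status": status_code}
    if show_traceback and exc_info is not None:
        # Format the triple we were handed; sys.exc_info() may be stale here.
        payload["traceback"] = "".join(traceback.format_exception(*exc_info))
    return payload


class JsonErrorHandler(tornado.web.RequestHandler):
    def write_error(self, status_code, **kwargs):
        # exc_info is only present when an exception caused the error.
        self.write(error_payload(
            status_code,
            exc_info=kwargs.get("exc_info"),
            show_traceback=self.settings.get("serve_traceback", False),
        ))


class BrokenHandler(JsonErrorHandler):
    def get(self):
        # HTTPError picks the status code; any other exception becomes a 500.
        raise tornado.web.HTTPError(503)
```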

It is also possible to generate an error page from regular handler methods instead of write_error by calling set_status, writing a response, and returning. The special exception tornado.web.Finish may be raised to terminate the handler without calling write_error in situations where simply returning is not convenient.

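An error page can also be produced from a regular handler method by calling set_status, writing the response and returning. The special exception tornado.web.Finish ends the handler without calling write_error, for cases where simply returning is inconvenient.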
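A sketch of where Finish helps: the check lives in a helper one call deep, so a plain return would not end the request. AgeHandler, require_int and the age argument are made up for this note.

```python
import tornado.web


class AgeHandler(tornado.web.RequestHandler):
    def require_int(self, name):
        """Hypothetical helper: query argument as int, or end with a 400."""
        raw = self.get_query_argument(name, "")
        try:
            return int(raw)
        except ValueError:
            self.set_status(400)
            self.write("%s must be an integer" % name)
            # "return" would only leave this helper; Finish ends the whole
            # request and skips write_error.
            raise tornado.web.Finish()

    def get(self):
        age = self.require_int("age")
        self.write("age next year: %d" % (age + 1))
```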

For 404 errors, use the default_handler_class Application setting. This handler should override prepare instead of a more specific method like get() so it works with any HTTP method. It should produce its error page as described above: either by raising a HTTPError(404) and overriding write_error, or calling self.set_status(404) and producing the response directly in prepare().

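For 404 errors, use the default_handler_class Application setting. That handler should override prepare rather than one specific HTTP method. The error page is produced as described above: either raise HTTPError(404) in prepare and override write_error, or call self.set_status(404) and generate the response directly.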
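A minimal sketch of the second variant (set_status plus a direct response in prepare). NotFoundHandler and the message text are my own choices.

```python
import tornado.web


class NotFoundHandler(tornado.web.RequestHandler):
    def prepare(self):
        # prepare() runs for GET, POST, PUT, ... so one override covers all.
        self.set_status(404)
        self.set_header("Content-Type", "text/plain")
        # finish() ends the request, so no HTTP method is called afterwards.
        self.finish("nothing here: %s" % self.request.path)


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")


app = tornado.web.Application(
    [(r"/", MainHandler)],
    default_handler_class=NotFoundHandler,  # used when no route matches
)
```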

 

Redirection

There are two main ways you can redirect requests in Tornado: RequestHandler.redirect and with the RedirectHandler.

There are two ways to redirect a request: RequestHandler.redirect and the RedirectHandler class.

You can use self.redirect() within a RequestHandler method to redirect users elsewhere. There is also an optional parameter permanent which you can use to indicate that the redirection is considered permanent. The default value of permanent is False, which generates a 302 Found HTTP response code and is appropriate for things like redirecting users after successful POST requests. If permanent is true, the 301 Moved Permanently HTTP response code is used, which is useful for e.g. redirecting to a canonical URL for a page in an SEO-friendly manner.

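self.redirect() can be used inside a RequestHandler to redirect. The optional permanent argument marks the redirect as permanent. The default, permanent=False, produces a 302 Found response and suits cases such as redirecting after a successfully handled POST. With permanent=True a 301 Moved Permanently response is sent. The benefit mentioned is SEO-friendliness, which I don't fully understand; presumably a permanent redirect tells search engines to index the target URL instead of the old one.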

RedirectHandler lets you configure redirects directly in your Application routing table. For example, to configure a single static redirect:

RedirectHandler lets you set up redirects directly in the Application routing table.

import tornado.web
from tornado.web import url

app = tornado.web.Application([
    url(r"/app", tornado.web.RedirectHandler,
        dict(url="http://itunes.apple.com/my-app-id")),
    ])

RedirectHandler also supports regular expression substitutions. The following rule redirects all requests beginning with /pictures/ to the prefix /photos/ instead:

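RedirectHandler also supports regular expression substitution. The rule below sends every request starting with /pictures/ to /photos/ instead, so a request for /pictures/1, for example, ends up at /photos/1.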

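# MyPhotoHandler is assumed to be defined elsewhere.
app = tornado.web.Application([
    url(r"/photos/(.*)", MyPhotoHandler),
    # The target is filled in with str.format (Tornado 4.5+): {0} is the
    # first capturing group. A regex-style \1 is not substituted.
    url(r"/pictures/(.*)", tornado.web.RedirectHandler,
        dict(url=r"/photos/{0}")),
    ])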

Unlike RequestHandler.redirect, RedirectHandler uses permanent redirects by default. This is because the routing table does not change at runtime and is presumed to be permanent, while redirects found in handlers are likely to be the result of other logic that may change. To send a temporary redirect with a RedirectHandler, add permanent=False to the RedirectHandler initialization arguments.

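RedirectHandler uses permanent redirects by default, because the routing table does not change at runtime, whereas the redirect method depends on other logic that may change. Adding permanent=False to the RedirectHandler initialization arguments makes it send a temporary redirect.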
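A small sketch contrasting the two cases in a routing table. The paths /downloads, /shop and /maintenance are made up.

```python
from tornado.web import Application, RedirectHandler, url

routes = [
    # Permanent (301) by default: the mapping is part of the routing table.
    url(r"/app", RedirectHandler, dict(url="/downloads")),
    # Temporary (302): a hypothetical maintenance page that will go away.
    url(r"/shop", RedirectHandler,
        dict(url="/maintenance", permanent=False)),
]
app = Application(routes)
```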

 

Asynchronous handlers

Tornado handlers are synchronous by default: when the get()/post() method returns, the request is considered finished and the response is sent. Since all other requests are blocked while one handler is running, any long-running handler should be made asynchronous so it can call its slow operations in a non-blocking way. This topic is covered in more detail in Asynchronous and non-Blocking I/O; this section is about the particulars of asynchronous techniques in RequestHandler subclasses.

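Tornado handlers are synchronous by default: once get()/post() returns, the request is considered finished and the response is sent. While one handler runs, all other requests are blocked, so a long-running handler should be asynchronous and perform its slow operations without blocking. This section covers only the asynchronous mechanisms of RequestHandler subclasses.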

The simplest way to make a handler asynchronous is to use the coroutine decorator. This allows you to perform non-blocking I/O with the yield keyword, and no response will be sent until the coroutine has returned. See Coroutines for more details.

The simplest way to make a handler asynchronous is the coroutine decorator. Non-blocking I/O is then done with the yield keyword, and no response is sent until the coroutine has finished.

In some cases, coroutines may be less convenient than a callback-oriented style, in which case the tornado.web.asynchronous decorator can be used instead. When this decorator is used the response is not automatically sent; instead the request will be kept open until some callback calls RequestHandler.finish. It is up to the application to ensure that this method is called, or else the user’s browser will simply hang.

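In some cases coroutines are less convenient than callbacks, and the tornado.web.asynchronous decorator can be used instead. With this decorator the response is not sent automatically; the request stays open until a callback calls RequestHandler.finish. The application must make sure finish is called, otherwise the user's browser simply hangs. Note that tornado.web.asynchronous was deprecated in Tornado 5.1 and removed in 6.0, so this style only applies to older versions.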

Here is an example that makes a call to the FriendFeed API using Tornado’s built-in AsyncHTTPClient:

import tornado.escape
import tornado.httpclient
import tornado.web

# Callback style: needs Tornado older than 6.0.
class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        http = tornado.httpclient.AsyncHTTPClient()
        http.fetch("http://friendfeed-api.com/v2/feed/bret",
                   callback=self.on_response)

    def on_response(self, response):
        if response.error:
            raise tornado.web.HTTPError(500)
        json = tornado.escape.json_decode(response.body)
        self.write("Fetched " + str(len(json["entries"])) + " entries "
                   "from the FriendFeed API")
        self.finish()

In get, fetch requests the URL asynchronously and, once the response arrives, invokes the callback self.on_response. on_response checks response.error, then calls write and finish.

Only get is decorated with tornado.web.asynchronous.

When get() returns, the request has not finished. When the HTTP client eventually calls on_response(), the request is still open, and the response is finally flushed to the client with the call to self.finish().

When get() returns, the request is not yet finished. When on_response() is eventually called the request is still open, and the response is sent to the client only after self.finish() is called.

For comparison, here is the same example using a coroutine:

import tornado.escape
import tornado.gen
import tornado.httpclient
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        http = tornado.httpclient.AsyncHTTPClient()
        # yield hands control back to the IOLoop until the fetch completes
        response = yield http.fetch("http://friendfeed-api.com/v2/feed/bret")
        json = tornado.escape.json_decode(response.body)
        self.write("Fetched " + str(len(json["entries"])) + " entries "
                   "from the FriendFeed API")

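This is the coroutine version of the same functionality. For the same logic, code written in a synchronous style really is easier to read.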
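On Python 3.5+ with Tornado 4.3 or later, the same handler can also be written as a native coroutine with async def and await, as far as I can tell. The sketch below reuses the URL from the example above (it may no longer resolve); FEED_URL and count_entries are names I introduced.

```python
import tornado.escape
import tornado.httpclient
import tornado.web

FEED_URL = "http://friendfeed-api.com/v2/feed/bret"  # URL from the example above


def count_entries(body):
    """Decode a JSON feed body and return how many entries it contains."""
    feed = tornado.escape.json_decode(body)
    return len(feed["entries"])


class MainHandler(tornado.web.RequestHandler):
    async def get(self):
        http = tornado.httpclient.AsyncHTTPClient()
        # await suspends only this handler; other requests keep being served.
        response = await http.fetch(FEED_URL)
        self.write("Fetched %d entries from the FriendFeed API"
                   % count_entries(response.body))
```

Pulling the parsing into count_entries keeps the part that needs no network easy to check on its own.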

For a more advanced asynchronous example, take a look at the chat example application, which implements an AJAX chat room using long polling. Users of long polling may want to override on_connection_close() to clean up after the client closes the connection (but see that method’s docstring for caveats).
