Using Celery

Posted by 李雷雷


1. Task scheduling with Celery

# -*- coding: utf-8 -*-
import threading
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from celery import Celery
from tornado import httpclient
from tornado.httpclient import HTTPClient

broker = 'redis://localhost:6379'
backend = 'redis://localhost:6379'

app = Celery('tasks', broker=broker, backend=backend)

# URLs already crawled. This dict lives in one process only, so separate
# worker processes do not share it.
visited = {}


@app.task
def get_html(url):
    """Fetch a page; return its body as text, or None on an HTTP error."""
    http_client = HTTPClient()
    try:
        response = http_client.fetch(url, follow_redirects=True)
        # Decode so the result can be serialized and concatenated with str.
        return response.body.decode('utf-8', errors='replace')
    except httpclient.HTTPError:
        return None
    finally:
        http_client.close()


def start(url):
    """Enqueue the seed URL from 20 threads and wait for them to finish."""
    threads = []
    for i in range(20):
        t = threading.Thread(target=schedule, args=(url,))
        t.daemon = True
        t.start()
        threads.append(t)

    for thread in threads:
        thread.join()


def process_html(url, html):
    print(url + ": " + html)
    _add_links_to_queue(url, html)


def schedule(url):
    print("before call _work " + url)
    _worker.delay(url)  # returns immediately; a worker runs the task later
    print("after call _work " + url)


def _add_links_to_queue(url, html):
    soup = BeautifulSoup(html, 'html.parser')
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue  # skip anchors without an href
        # Resolve relative links against the page they came from.
        _url = urljoin(url, href)
        if not _url.startswith('http'):
            continue  # skip mailto:, javascript: and similar schemes
        print(url + "==>" + _url)
        schedule(_url)


@app.task
def _worker(url):
    print(str(threading.current_thread()) + " running " + url)
    if url in visited:
        return
    visited[url] = True
    try:
        # Call the task function directly: waiting on a subtask result
        # from inside a task can deadlock the worker pool.
        html = get_html(url)
    except Exception as e:
        print(url)
        print(e)
        return
    if html is not None:
        process_html(url, html)


if __name__ == '__main__':
    start("http://www.hao123.com/")

  

2. Designing load balancing with Celery

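Celery can dispatch tasks by name with send_task, so for load balancing you can apply your own algorithm to decide where each task is sent. For reference, see: http://blog.csdn.net/vintage_1/article/details/47664187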
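
As a rough illustration of that idea, here is a minimal sketch that uses send_task with a round-robin choice of queue. Round-robin is only one possible allocation algorithm and is not taken from the linked article. The class name RoundRobinDispatcher, the helper make_dispatcher, and the queue names crawl_a, crawl_b and crawl_c are all hypothetical; the sketch assumes each queue is consumed by its own worker and that the task is registered as tasks._worker.

```python
import itertools
import threading


class RoundRobinDispatcher:
    """Spread tasks over a fixed list of queues in round-robin order.

    `app` is any object with a Celery-style
    send_task(name, args=..., queue=...) method.
    """

    def __init__(self, app, queues):
        if not queues:
            raise ValueError("at least one queue is required")
        self._app = app
        self._cycle = itertools.cycle(queues)
        # Guard the iterator so several threads can dispatch at once.
        self._lock = threading.Lock()

    def dispatch(self, task_name, *args):
        with self._lock:
            queue = next(self._cycle)
        # send_task looks the task up by name, so the caller does not
        # need to import the module that defines it.
        return self._app.send_task(task_name, args=args, queue=queue)


def make_dispatcher():
    """Wire the dispatcher to a real Celery app (needs a running Redis broker).

    The queue names are assumptions made for this example.
    """
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379',
                 backend='redis://localhost:6379')
    return RoundRobinDispatcher(app, ['crawl_a', 'crawl_b', 'crawl_c'])
```

With workers started on the matching queues (for example `celery -A tasks worker -Q crawl_a`), a call such as `make_dispatcher().dispatch('tasks._worker', 'http://www.hao123.com/')` would send successive tasks to crawl_a, crawl_b, crawl_c and then back to crawl_a. Replacing the itertools.cycle step with a weighted or least-loaded choice is where a custom algorithm would go.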
