python: from http://knarfeh.com/2016/03/11/leetcode-%E7%AC%94%E8%AE%B0%E8%AF%B4%E6%98%8E/

Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers python code from http://knarfeh.com/2016/03/11/leetcode-%E7%AC%94%E8%AE%B0%E8%AF%B4%E6%98%8E/, and we hope it is useful to you.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Generate Markdown post stubs (with blog front matter) for LeetCode problems.

The CSS selectors below were written against LeetCode's page markup at the
time of writing (2016); the site's current markup may differ.
"""

import urllib.request

from bs4 import BeautifulSoup

LEETCODE_ROOT = 'https://leetcode.com'

# Post template; the four %s slots are: title, tags, related problems, source URL.
leetcode_md = """title: "%s"
date: 2014-03-11 00:33:34
tags: [algorithms, leetcode, %s]
---

### Description
---
Description goes here
<!--more-->

### Analysis
---
Analysis goes here

### Solution 1 (C++)
---
### Solution 2 (Java)
---
### Solution 3 (Python)
---

### Related problems
---
%s
### [Problem source](%s)
"""


def get_tag_content(tag):
    """Return the inner HTML of a BeautifulSoup tag (its .contents joined)."""
    return "".join(str(x) for x in tag.contents)


def get_attr(dom, attr, default_value=""):
    """Return attribute `attr` of a BeautifulSoup tag.

    Falls back to `default_value` if the tag is None or lacks the attribute.
    """
    if dom is None:
        return default_value
    return dom.get(attr, default_value)


def fetch_soup(url):
    """Download `url` and parse it with the lxml parser."""
    with urllib.request.urlopen(url) as resp:
        return BeautifulSoup(resp.read(), 'lxml')


leetcode_problems = LEETCODE_ROOT + '/problemset/algorithms/'
soup = fetch_soup(leetcode_problems)

problem_list = soup.select('table.table tbody tr')

# Only the first 5 problems are processed; drop the slice to process every row.
for item in problem_list[:5]:
    # Each row can be queried directly; there is no need to re-parse it.
    cells = item.select('td')
    link = item.select('td a')[0]
    problem_id = get_tag_content(cells[1])
    problem_name = get_tag_content(link).replace(' ', '-')
    problem_href = LEETCODE_ROOT + get_attr(link, 'href')

    problem_name_md = 'leetcode-%s-%s' % (problem_id, problem_name)
    filename = problem_name_md + '.md'

    problem_soup = fetch_soup(problem_href)

    # The first span.hidebutton holds the topic tags, the second the similar problems.
    hide_buttons = problem_soup.select('span.hidebutton')
    problem_tag_list = hide_buttons[0].select('a') if len(hide_buttons) > 0 else []
    similar_problem_list = hide_buttons[1].select('a') if len(hide_buttons) > 1 else []

    tags = [get_tag_content(a).strip().replace(' ', '-') for a in problem_tag_list]

    similar_problem = {}
    for a in similar_problem_list:
        similar_problem_name = get_tag_content(a).strip()
        similar_problem[similar_problem_name] = LEETCODE_ROOT + get_attr(a, 'href')

    md_tags = ', '.join(tags)
    similar_problem_md = ''
    for key, value in similar_problem.items():
        # The trailing spaces force a line break in Markdown.
        similar_problem_md += '[' + key + '](' + value + ')   \n'
    now_leetcode_md = leetcode_md % (problem_name_md, md_tags, similar_problem_md, problem_href)

    print('Done: ' + filename)
    # Write as UTF-8 explicitly instead of relying on the platform default.
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(now_leetcode_md)
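The script relies on BeautifulSoup to pull the links out of the two `span.hidebutton` elements on each problem page: the first holds the topic tags, the second the similar problems. As a rough, dependency-free illustration of that step, the sketch below swaps in the standard library's `html.parser.HTMLParser` instead of BeautifulSoup. The class name `HideButtonLinks` and the sample markup are hypothetical; the markup only mirrors the structure the script's selectors expect, not LeetCode's actual HTML.

```python
from html.parser import HTMLParser


class HideButtonLinks(HTMLParser):
    """Collect (text, href) pairs for the <a> tags inside each span.hidebutton.

    Hypothetical helper; assumes no <span> is nested inside a hidebutton span.
    """

    def __init__(self):
        super().__init__()
        self.groups = []    # one list of (text, href) pairs per span.hidebutton
        self._in_span = False
        self._href = None   # href of the currently open <a>, or None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "hidebutton" in classes:
            # Start a new group for this span.
            self._in_span = True
            self.groups.append([])
        elif tag == "a" and self._in_span:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only text inside an open <a> is collected.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.groups[-1].append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "span":
            self._in_span = False


# Hypothetical sample markup mirroring what the script's selectors expect.
SAMPLE = (
    '<span class="hidebutton"><a href="/tag/array/">Array</a> '
    '<a href="/tag/hash-table/">Hash Table</a></span>'
    '<span class="hidebutton"><a href="/problems/3sum/">3Sum</a></span>'
)

parser = HideButtonLinks()
parser.feed(SAMPLE)
# Same post-processing as the script: hyphenate tag names, absolutize hrefs.
tags = [text.replace(" ", "-") for text, _ in parser.groups[0]]
similar = {text: "https://leetcode.com" + href for text, href in parser.groups[1]}
print(tags)     # ['Array', 'Hash-Table']
print(similar)  # {'3Sum': 'https://leetcode.com/problems/3sum/'}
```

BeautifulSoup remains the more convenient choice for real pages; the hand-written parser just makes the extraction logic explicit.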

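The Markdown step at the end of the loop is plain string formatting: the post slug is `leetcode-<id>-<name>` with spaces turned into hyphens, the tags are joined with commas, and each similar problem becomes a Markdown link followed by trailing spaces so it renders on its own line. Below is a minimal sketch of that step as a pure function, assuming the scraped values are already plain strings. `render_post`, the shortened `POST_TEMPLATE`, and the sample problem data are hypothetical.

```python
# Shortened, hypothetical version of the script's post template.
POST_TEMPLATE = """title: "%s"
tags: [algorithms, leetcode, %s]
---

### Related problems
---
%s
### [Problem source](%s)
"""


def render_post(problem_id, problem_name, tags, similar, problem_href):
    """Return (filename, markdown) for one problem (hypothetical helper)."""
    # Post slug: spaces in the problem name become hyphens.
    slug = "leetcode-%s-%s" % (problem_id, problem_name.replace(" ", "-"))
    # Tag names are hyphenated the same way and joined for the tags list.
    md_tags = ", ".join(t.strip().replace(" ", "-") for t in tags)
    # Each related problem becomes a link; trailing spaces force a line break.
    related = "".join("[%s](%s)   \n" % (name, href) for name, href in similar)
    return slug + ".md", POST_TEMPLATE % (slug, md_tags, related, problem_href)


# Hypothetical sample data.
filename, md = render_post(
    1,
    "Two Sum",
    ["Array", "Hash Table"],
    [("3Sum", "https://leetcode.com/problems/3sum/")],
    "https://leetcode.com/problems/two-sum/",
)
print(filename)  # leetcode-1-Two-Sum.md
```

Keeping this logic separate from the network code makes it testable without fetching any pages.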
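The above is the main content about python from http://knarfeh.com/2016/03/11/leetcode-%E7%AC%94%E8%AE%B0%E8%AF%B4%E6%98%8E/. If it did not solve your problem, see the following articles:

python: the os file system from Python

A letter from "Python"

Python sniffing from the book Black Hat Python

"The path python3 (from --python=python3) does not exist" error

python: Python decorator template (from "Head First Python ed.2")

Python data classes from nested dictionaries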