How do I fetch a Wikipedia page using a library?
I have been trying to work through the documentation for the mwapi library (a MediaWiki API client), and I can't figure out how to simply request a page based on a search query or keyword. I know I should use get(), but filling its parameters with a keyword produces an error. Does anyone know how this works for looking up something like "Earth Wind and Fire"?
The documentation is here: http://pythonhosted.org/mwapi
This is the only example of get() they give:
import mwapi
session = mwapi.Session('https://en.wikipedia.org')
print(session.get(action='query', meta='userinfo'))
# {'query': {'userinfo': {'anon': '', 'name': '75.72.203.28', 'id': 0}}, 'batchcomplete': ''}
print(session.get(action='query', prop='revisions', revids=32423425))
# {'query': {'pages': {'1429626': {'ns': 0, 'revisions': [{'user': 'Wknight94', 'parentid': 32276615, 'comment': '/* References */ Removing less-specific cat', 'revid': 32423425, 'timestamp': '2005-12-23T00:07:17Z'}], 'title': 'Grigol Ordzhonikidze', 'pageid': 1429626}}}, 'batchcomplete': ''}
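The pattern in the example above is that every keyword argument to get() is forwarded as a query-string parameter to Wikipedia's api.php endpoint. As a rough, stdlib-only illustration of what ends up on the wire (a sketch for understanding, not part of mwapi itself; `api_url` is a hypothetical helper):

```python
from urllib.parse import urlencode

def api_url(host, **params):
    # mwapi forwards get()'s keyword arguments as api.php query parameters;
    # this builds the equivalent URL by hand, purely for illustration.
    params.setdefault('format', 'json')  # mwapi asks for JSON responses
    return host + '/w/api.php?' + urlencode(sorted(params.items()))

print(api_url('https://en.wikipedia.org', action='query', meta='userinfo'))
# https://en.wikipedia.org/w/api.php?action=query&format=json&meta=userinfo
```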
Answer
Perhaps this code can help you understand the API:
import json # Used only to pretty-print dictionaries.
import mwapi
USER_AGENT = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.6) Gecko/2009011913 Firefox'
session = mwapi.Session('https://en.wikipedia.org', user_agent=USER_AGENT)
query = session.get(action='query', titles='Earth Wind and Fire')
print('query returned:')
print(json.dumps(query, indent=4))
pages = query['query']['pages']
if pages:
    print('\npages:')
    for pageid in pages:
        data = session.get(action='parse', pageid=pageid, prop='text')
        print(json.dumps(data, indent=4))
Output:
query returned:
{
"batchcomplete": "",
"query": {
"pages": {
"313370": {
"pageid": 313370,
"ns": 0,
"title": "Earth Wind and Fire"
}
}
}
}
pages:
{
"parse": {
"title": "Earth Wind and Fire",
"pageid": 313370,
"text": {
        "*": "<div class=\"redirectMsg\"><p>Redirect to:</p><ul class=\"redirectText\"><li><a href=\"/wiki/Earth,_Wind_%26_Fire\" title=\"Earth, Wind &amp; Fire\">Earth, Wind &amp; Fire</a></li></ul></div><div class=\"mw-parser-output\">
<!--
NewPP limit report
Parsed by mw1279
Cached time: 20171121014700
Cache expiry: 1900800
Dynamic content: false
CPU time usage: 0.000 seconds
Real time usage: 0.001 seconds
Preprocessor visited node count: 0/1000000
Preprocessor generated node count: 0/1500000
Post-expand include size: 0/2097152 bytes
Template argument size: 0/2097152 bytes
Highest expansion depth: 0/40
Expensive parser function count: 0/500
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00% 0.000 1 -total
-->
</div>
<!-- Saved in parser cache with key enwiki:pcache:idhash:313370-0!canonical and timestamp 20171121014700 and revision id 16182229
-->
"
}
}
}