How do I scrape all the text on a web page after id="firstHeading" in Python?
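I am trying to scrape all the text on a web page (using Python) that comes after the first heading. The tag for that heading is: <h1 id="firstHeading" class="firstHeading" lang="en">Albert Einstein</h1>
I don't need anything that appears before this heading. I want to scrape all the text written after it. Can I do this with BeautifulSoup in Python?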
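I am running the following code:
```python
import requests
from bs4 import BeautifulSoup

# The "#Publications" fragment is not sent to the server; the whole page is fetched.
urlpage = 'https://en.wikipedia.org/wiki/Albert_Einstein#Publications'
res = requests.get(urlpage)
# get_text() on the whole document also returns the <title> and inline <script> contents.
soup1 = BeautifulSoup(res.text, 'lxml').get_text()
print(soup1)
```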
For this page, it prints the following:
Albert Einstein - Wikipedia
document.documentElement.className="client-js";RLCONF={"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Albert_Einstein","wgTitle":"Albert Einstein","wgCurRevisionId":920687884,"wgRevisionId":920687884,"wgArticleId":736,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages with missing ISBNs","Webarchive template wayback links","CS1 German-language sources (de)","CS1: Julian–Gregorian uncertainty","CS1 French-language sources (fr)","CS1 errors: missing periodical","CS1: long volume value","Wikipedia indefinitely semi-protected pages","Use American English from February 2019","All Wikipedia articles written in American English","Articles with short description","Good articles","Articles containing German-language text","Biography with signature","Articles with hCards","Articles with hAudio microformats","All articles with unsourced statements",
"Articles with unsourced statements from July 2019","Commons category link from Wikidata","Articles with Wikilivres links","Articles with Curlie links","Articles with Project Gutenberg links","Articles with Internet Archive links","Articles with LibriVox links","Use dmy dates from August 2019","Wikipedia articles with BIBSYS identifiers","Wikipedia articles with BNE identifiers","Wikipedia articles with BNF identifiers","Wikipedia articles with GND identifiers","Wikipedia articles with HDS identifiers","Wikipedia articles with ISNI identifiers","Wikipedia articles with LCCN identifiers","Wikipedia articles with LNB identifiers","Wikipedia articles with MGP identifiers","Wikipedia articles with NARA identifiers","Wikipedia articles with NCL identifiers","Wikipedia articles with NDL identifiers","Wikipedia articles with NKC identifiers","Wikipedia articles with NLA identifiers","Wikipedia articles with NLA-person identifiers","Wikipedia articles with NLI identifiers",
"Wikipedia articles with NLR identifiers","Wikipedia articles with NSK identifiers","Wikipedia articles with NTA identifiers","Wikipedia articles with SBN identifiers","Wikipedia articles with SELIBR identifiers","Wikipedia articles with SNAC-ID identifiers","Wikipedia articles with SUDOC identifiers","Wikipedia articles with ULAN identifiers","Wikipedia articles with VIAF identifiers","Wikipedia articles with WorldCat-VIAF identifiers","AC with 25 elements","Wikipedia articles with suppressed authority control identifiers","Pages using authority control with parameters","Articles containing timelines","Pantheists","Spinozists","Albert Einstein","1879 births","1955 deaths","20th-century American engineers","20th-century American writers","20th-century German writers","20th-century physicists","American agnostics","American inventors","American letter writers","American pacifists","American people of German-Jewish descent","American physicists","American science writers",
"American socialists","American Zionists","Ashkenazi Jews","Charles University in Prague faculty","Corresponding Members of the Russian Academy of Sciences (1917–25)","Cosmologists","Deaths from abdominal aortic aneurysm","Einstein family","ETH Zurich alumni","ETH Zurich faculty","German agnostics","German Jews","German emigrants to Switzerland","German Nobel laureates","German inventors","German physicists","German socialists","European democratic socialists","Institute for Advanced Study faculty","Jewish agnostics","Jewish American scientists","Jewish emigrants from Nazi Germany to the United States","Jews who emigrated to escape Nazism","Jewish engineers","Jewish inventors","Jewish philosophers","Jewish physicists","Jewish socialists","Leiden University faculty","Foreign Fellows of the Indian National Science Academy","Foreign Members of the Royal Society","Members of the American Philosophical Society","Members of the Bavarian Academy of Sciences","Members of the Lincean Academy"
,"Members of the Royal Netherlands Academy of Arts and Sciences","Members of the United States National Academy of Sciences","Honorary Members of the USSR Academy of Sciences","Naturalised citizens of Austria","Naturalised citizens of Switzerland","New Jersey socialists","Nobel laureates in Physics","Patent examiners","People from Berlin","People from Bern","People from Munich","People from Princeton, New Jersey","People from Ulm","People from Zürich","People who lost German citizenship","People with acquired American citizenship","Philosophers of science","Relativity theorists","Stateless people","Swiss agnostics","Swiss emigrants to the United States","Swiss Jews","Swiss physicists","Theoretical physicists","Winners of the Max Planck Medal","World federalists","Recipients of the Pour le Mérite (civil class)","Determinists","Activists from New Jersey","Mathematicians involved with Mathematische Annalen","Intellectual Cooperation","Disease-related deaths in New Jersey"],
"wgBreakFrames":!1,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRelevantPageName":"Albert_Einstein","wgRelevantArticleId":736,"wgRequestId":"XaChjApAICIAALSsYfgAAABV","wgCSPNonce":!1,"wgIsProbablyEditable":!1,"wgRelevantPageIsProbablyEditable":!1,"wgRestrictionEdit":["autoconfirmed"],"wgRestrictionMove":["sysop"],"wgMediaViewerOnClick":!0,"wgMediaViewerEnabledByDefault":!0,"wgPopupsReferencePreviews":!1,"wgPopupsConflictsWithNavPopupGadget":!1,"wgVisualEditor":{"pageLanguageCode":"en","pageLanguageDir":"ltr","pageVariantFallbacks":"en"},"wgMFDisplayWikibaseDescriptions":{"search":!0,"nearby":!0,"watchlist":!0,"tagline":
!1},"wgWMESchemaEditAttemptStepOversample":!1,"wgULSCurrentAutonym":"English","wgNoticeProject":"wikipedia","wgWikibaseItemId":"Q937","wgCentralAuthMobileDomain":!1,"wgEditSubmitButtonLabelPublish":!0};RLSTATE={"ext.globalCssJs.user.styles":"ready","site.styles":"ready","noscript":"ready","user.styles":"ready","ext.globalCssJs.user":"ready","user":"ready","user.options":"ready","user.tokens":"loading","ext.cite.styles":"ready","ext.math.styles":"ready","mediawiki.legacy.shared":"ready","mediawiki.legacy.commonPrint":"ready","jquery.makeCollapsible.styles":"ready","mediawiki.toc.styles":"ready","wikibase.client.init":"ready","ext.visualEditor.desktopArticleTarget.noscript":"ready","ext.uls.interlanguage":"ready","ext.wikimediaBadges":"ready","ext.3d.styles":"ready","mediawiki.skinning.interface":"ready","skins.vector.styles":"ready"};RLPAGEMODULES=["ext.cite.ux-enhancements","ext.cite.tracking","ext.math.scripts","ext.scribunto.logs","site","mediawiki.page.startup",
"mediawiki.page.ready","jquery.makeCollapsible","mediawiki.toc","mediawiki.searchSuggest","ext.gadget.teahouse","ext.gadget.ReferenceTooltips","ext.gadget.watchlist-notice","ext.gadget.DRN-wizard","ext.gadget.charinsert","ext.gadget.refToolbar","ext.gadget.extra-toolbar-buttons","ext.gadget.switcher","ext.centralauth.centralautologin","mmv.head","mmv.bootstrap.autostart","ext.popups","ext.visualEditor.desktopArticleTarget.init","ext.visualEditor.targetLoader","ext.eventLogging","ext.wikimediaEvents","ext.navigationTiming","ext.uls.compactlinks","ext.uls.interface","ext.cx.eventlogging.campaigns","ext.quicksurveys.init","ext.centralNotice.geoIP","ext.centralNotice.startUp","skins.vector.js"];
(RLQ=window.RLQ||[]).push(function(){mw.loader.implement("user.tokens@tffin",function($,jQuery,require,module){/*@nomin*/mw.user.tokens.set({"patrolToken":"+\","watchToken":"+\","csrfToken":"+\"});
});});
Albert Einstein
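From Wikipedia, the free encyclopedia
Jump to navigation Jump to search "Einstein" redirects here. For other people, see Einstein (surname). For other uses, see Albert Einstein (disambiguation) and Einstein (disambiguation).
German-born physicist and developer of the theory of relativity
[Albert Einstein Einstein in 1921 Born (1879-03-14)14 March 1879 Kingdom of Württemberg, German Empire Died 18 April 1955(1955-04-18) (aged 76) Princeton, New Jersey, United States Residence Germany, Italy, Switzerland, Austria (present-day Czech Republic), Belgium, United States Citizenship Subject of the Kingdom of Württemberg during the German Empire (1879–1896)[note 1] Stateless (1896–1901) Citizen of Switzerland (1901–1955) Austrian subject of the Austro-Hungarian Empire (1911–1912) Subject of the Kingdom of Prussia during the German Empire (1914–1918)[note 1] German citizen of the Free State of Prussia (Weimar Republic, 1918–1933) Citizen of the United States (1940–1955) Education Federal polytechnic school (1896–1900; B.A., 1900) University of Zurich (Ph.D., 1905) Known for General relativity Special relativity Photoelectric effect E = mc2 (mass–energy equivalence) E = hf (Planck–Einstein relation) Theory of Brownian motion Einstein field equations Bose–Einstein statistics Bose–Einstein condensate Gravitational wave Cosmological constant Unified field theory EPR paradox Ensemble interpretation List of other concepts Spouse(s) Mileva Marić (m. 1903; div. 1919) Elsa Löwenthal (m. 1919; died[1][2] 1936) Children "Lieserl" Einstein Hans Albert Einstein Eduard "Tete" Einstein Awards Barnard Medal (1920) Nobel Prize in Physics (1921) Matteucci Medal (1921) ForMemRS (1921)[3] Copley Medal (1925)[3] Gold Medal of the Royal Astronomical Society (1926) Max Planck Medal (1929) Member of the National Academy of Sciences (1942) Time Person of the Century (1999) Scientific career Fields Physics, philosophy Institutions Swiss Patent Office (Bern) (1902–1909) University of Bern (1908–1909) University of Zurich (1909–1911) Charles University in Prague (1911–1912) ETH Zurich (1912–1914) Prussian Academy of Sciences (1914–1933) Humboldt University of Berlin (1914–1933) Kaiser Wilhelm Institute (director, 1917–1933) German Physical Society (president, 1916–1918) Leiden University (visits, 1920) Institute for Advanced Study (1933–1955) Caltech (visits, 1931–1933) University of Oxford (visits, 1931–1933) Thesis Bestimmung der Moleküldimensionen (A New Determination of Molecular Dimensions) (1905) Doctoral advisor Alfred Kleiner Other academic advisors Heinrich Friedrich Weber Influences Arthur Schopenhauer Baruch Spinoza Bernhard Riemann David Hume Ernst Mach Hendrik Lorentz Hermann Minkowski Isaac Newton James Clerk Maxwell Michele Besso Moritz Schlick Thomas Young Influenced Virtually all modern physics
[Albert Einstein Signature (/ˈaɪnstaɪn/ EYEN-styne;[4] German: [ˈalbɛʁt ˈʔaɪnʃtaɪn] (listen); 14 March 1879 – 18 April 1955) was a German-born theoretical physicist[5] who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics).[3][6]:274 His work is also known for its influence on the philosophy of science.[7][8] He is best known to the general public for his mass–energy equivalence formula . . . . .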
I only want the text that comes after the first heading, "Albert Einstein".
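First find the h1 tag (the one with id="firstHeading"), then call find_next_siblings('div') on it and print the text of each div that is returned.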
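A minimal sketch of that suggestion, not a definitive implementation. It assumes the article body sits in one or more `div` elements that are siblings of the `h1`, as the answer implies; if the page nests the heading differently, `find_next_siblings('div')` returns nothing. The helper name `text_after_heading` and the `SAMPLE_HTML` string are made up for illustration, and the built-in `html.parser` is used instead of `lxml` only so the example has no extra dependency.

```python
from bs4 import BeautifulSoup


def text_after_heading(html, heading_id="firstHeading"):
    """Return the text of every <div> sibling that follows the given <h1>.

    Hypothetical helper: the name and the default id are illustrative.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1", id=heading_id)
    if heading is None:
        # No such heading on the page: nothing to return.
        return ""
    # Collect the text of each following sibling <div>, skipping empty ones.
    parts = [
        div.get_text(" ", strip=True)
        for div in heading.find_next_siblings("div")
    ]
    return "\n".join(part for part in parts if part)


# Tiny stand-in for the real page (assumed layout, not the actual Wikipedia markup).
SAMPLE_HTML = """
<html><head><title>Albert Einstein - Wikipedia</title>
<script>document.documentElement.className="client-js";</script></head>
<body><div id="content">
<h1 id="firstHeading" class="firstHeading" lang="en">Albert Einstein</h1>
<div id="bodyContent"><p>From Wikipedia, the free encyclopedia</p>
<p>German-born physicist</p></div>
</div></body></html>
"""

print(text_after_heading(SAMPLE_HTML))
```

With the code from the question, the same helper could be called as `text_after_heading(res.text)`. Because only elements after the heading are visited, the page title and the inline script contents in the document head are left out of the result.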