Converting html figure element to sql in python
Posted: 2021-02-07 02:19:26
Question: I am trying to build a bot that reads a list of links from a notepad (text) file, scrapes each link, and retrieves the table on the page.
import requests
from bs4 import BeautifulSoup
import re

link = "https://almashhadalsudani.com/economic-news/currency-prices-sudan/29061/"


def info_grabber(link):
    try:
        source = requests.get(f'{link}')
    except requests.RequestException:
        print("Unable to connect to GET service")
        return {'Error': '101'}
    soup = BeautifulSoup(source.text, "html.parser")
    # The rates table is wrapped in <figure class="wp-block-table">
    table_data = soup.find("figure", "wp-block-table")
    prettyHTML = table_data.prettify()
    print(prettyHTML)


info_grabber(link)
#print(values)
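The returned table element should then be added to a database, and the process should repeat until the end of the list of links in the notepad file.
So far, this code outputs: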
<figure class="wp-block-table">
<table>
<tbody>
<tr>
<th>
<strong>
العملة
</strong>
</th>
<th>
سعر الصرف
</th>
</tr>
<tr>
<td>
دولار امريكي
</td>
<td>
370 جنيه
</td>
</tr>
<tr>
<td>
ريال سعودي
</td>
<td>
95 جنيه
</td>
</tr>
<tr>
<td>
درهم اماراتي
</td>
<td>
97 جنيه
</td>
</tr>
<tr>
<td>
<strong>
سعر الدولار الرسمي
</strong>
</td>
<td>
55.0000 جنيه
</td>
</tr>
<tr>
<td>
اليورو
</td>
<td>
435 جنيه
</td>
</tr>
<tr>
<td>
الجنيه الاسترليني
</td>
<td>
455 جنيه
</td>
</tr>
<tr>
<td>
ريال قطري
</td>
<td>
96 جنيه
</td>
</tr>
<tr>
<td>
الجنيه المصري
</td>
<td>
23 جنيه
</td>
</tr>
</tbody>
</table>
</figure>
Is there a way to convert this output into valid database entries, so that I can study the data or use it later?
Answer 1: Try this.
from simplified_scrapy import Spider, SimplifiedDoc, SimplifiedMain
from simplified_scrapy.core.mysql_objstore import MysqlObjStore
import json


class CustomStore(MysqlObjStore):
    def saveObj(self, data):
        conn = None
        cur = None
        try:
            conn = self.connect()
            cur = conn.cursor()
            try:
                # Store the whole extracted object as one JSON string
                cur.execute(
                    "insert into test(json) values(%s)",
                    (json.dumps(data), ))
                return conn.commit()
            except Exception as err:
                conn.rollback()
                print(err)
        except Exception as err:
            print(err)
        finally:
            if (cur): cur.close()
            if (conn): conn.close()


class DemoSpider(Spider):
    name = 'almashhadalsudani'
    start_urls = [
        'https://almashhadalsudani.com/economic-news/currency-prices-sudan/29061/'
    ]
    # refresh_urls = True
    # Storing Obj with mysql
    obj_store = CustomStore(name, {
        'host': '127.0.0.1',
        'port': 3306,
        'user': 'root',
        'pwd': 'root',
        'dbName': 'test'
    })

    def extract(self, url, html, models, modelNames):
        doc = SimplifiedDoc(html)  # Parse
        data = doc.select('figure.wp-block-table>').getTable()
        return {"Data": {"table": data}}


SimplifiedMain.startThread(DemoSpider())  # Start
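If MySQL or simplified_scrapy is not available, the same idea can be sketched with the standard library alone. The following is a minimal sketch, not the method used above: it swaps in html.parser for the table extraction and sqlite3 for the storage. The names TableRowParser, parse_rate, save_table, the rates table and its columns, and the example.com URL are all hypothetical and chosen for illustration. It assumes a two-column table whose first row is the header, as in the output shown in the question.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse whitespace left by prettify()
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_rate(text):
    """Return the first number in a string such as '370 جنيه', or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def save_table(conn, url, table_html):
    """Insert one record per currency row; return how many were stored."""
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rates ("
        "url TEXT, currency TEXT, rate REAL, raw_rate TEXT)"
    )
    # Skip the header row (the first <tr>) and any row without two cells
    records = [
        (url, row[0], parse_rate(row[1]), row[1])
        for row in parser.rows[1:]
        if len(row) == 2
    ]
    conn.executemany("INSERT INTO rates VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return len(records)


# Usage with a shortened copy of the table from the question
sample = """<figure class="wp-block-table"><table><tbody>
<tr><th><strong>العملة</strong></th><th>سعر الصرف</th></tr>
<tr><td>دولار امريكي</td><td>370 جنيه</td></tr>
<tr><td><strong>سعر الدولار الرسمي</strong></td><td>55.0000 جنيه</td></tr>
</tbody></table></figure>"""

conn = sqlite3.connect(":memory:")  # use a file path to keep the data
stored = save_table(conn, "https://example.com/page", sample)
rates = conn.execute("SELECT rate FROM rates ORDER BY rate").fetchall()
```

Keeping the raw cell text next to the parsed number means nothing is lost if a rate is written in an unexpected format. To cover the whole list of links, save_table could be called once per URL with the HTML of the figure element found on that page.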
Comments:
Traceback (most recent call last): File "main.py", line 2, in