Beautiful Soup, get_text but NOT the <span> text.. How can I get it?

Posted 2023-02-24

Asked: 2021-01-27 22:06:54

Question:

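Given this markup: [markup][1]

I need to get the number 182 in one column and 58 in another. I already have the span, but when I call div.get_text() or string, it returns 18258 (both numbers).

This is my code: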

prices= soup.find_all('div', class_='grilla-producto-precio')

cents= []
price= []
for px in prices:
    ### here i need to get the number 182 and append it to "price"
    for spn in px.find('span'):
        cents.append(spn)

How can I get the price 182 on its own, without the span? Thanks!

[1]: https://i.stack.imgur.com/ld9qo.png
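For reference, a minimal sketch of why get_text() returns both numbers. The markup below is an assumption reconstructed from the description, since the screenshot itself is not reproduced here. It shows that the div has two children, and that get_text() joins the text of all of them.

```python
from bs4 import BeautifulSoup

# Assumed markup (the original screenshot is not reproduced here).
html = '<div class="grilla-producto-precio"> $182<span>58</span></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="grilla-producto-precio")

# The div has two children: its own text node and the <span> tag.
children = div.contents

# get_text() concatenates every descendant string, so the span's text is included.
combined = div.get_text(strip=True)

# The first child is the div's own text node, without the span.
own_text = str(children[0]).strip()

print(len(children))  # 2
print(combined)       # $18258
print(own_text)       # $182
```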

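Comments:

cents.append(spn.text)?

Someone helped me, but the comment was deleted.. it was div.find_next(text=True)

Yes, that was me. See my answer.

You can get it with text=True alone; there is no need for split and replace.

If you use split and replace, then you get the number you want. Also, you can get it as an int.

Answer 1: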

The answer to your question is almost the same as the answer to this question.

from bs4 import BeautifulSoup

html = """
<div class = "grilla-producto-precio">
" $"
"182"
<span>58</span>
</div>
"""
soup = BeautifulSoup(html,'html5lib')

prices = soup.find_all('div',class_ = "grilla-producto-precio")

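price = []

for px in prices:
    # First text node inside the div: '" $"\n"182"' after stripping.
    # (Beautiful Soup 4.4.0+ also accepts string=True in place of text=True.)
    txt = px.find_next(text=True).strip()

    # Drop the literal double quotes that appear in the sample markup.
    txt = txt.replace('"', '')

    # The number sits on the last line of that text node.
    txt = int(txt.split("\n")[-1])

    price.append(txt)

print(price)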

Output:

[182]
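To fill both columns from the question, one possible sketch is to read the div's direct text nodes for the price and the span for the cents. The markup, the second product row, and the regex-based digit extraction are assumptions for illustration, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Assumed markup with two products; the real page may differ.
html = """
<div class="grilla-producto-precio"> $182<span>58</span></div>
<div class="grilla-producto-precio"> $7<span>05</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

price, cents = [], []
for px in soup.find_all("div", class_="grilla-producto-precio"):
    # recursive=False limits the search to direct children, skipping the span's text.
    own = "".join(px.find_all(string=True, recursive=False))
    digits = re.sub(r"\D", "", own)  # remove everything that is not a digit
    span = px.find("span")
    price.append(int(digits) if digits else None)
    # Cents stay as strings so a leading zero such as "05" is preserved.
    cents.append(span.get_text(strip=True) if span else None)

print(price)  # [182, 7]
print(cents)  # ['58', '05']
```

Rows without digits or without a span get None instead of raising an exception, which keeps the two lists aligned.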

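Comments:

This looks like a duplicate answer :) Could you flag the question as a duplicate with the link? +1

Sure. But before that, is this question an exact duplicate? It is only related to the link I mentioned. Anyway, if you feel it is a duplicate, then I will flag it as one. Experienced people are always right!

No, there is a problem. I have updated it in a simpler and more robust way.

int() will fail if the incoming data is not a number.

No, that is not the span he/she wants. I will edit it.

Answer 2: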

Another solution is to check each character of the string with isdigit():

from bs4 import BeautifulSoup

txt = """
<div class = "grilla-producto-precio">
" $"
"182"
<span>58</span>
</div>
"""
soup = BeautifulSoup(txt, "html.parser")

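# .next_element is the node parsed right after the <div> tag: its first text node.
data = soup.find("div", class_="grilla-producto-precio").next_element
# Keep only the digit characters of that text node and join them.
price = [int("".join(d for d in data if d.isdigit()))]

print(price) # Output: [182]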
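One comment above notes that int() fails when the data is not a number. A small sketch of a guard for that case follows; the helper name digits_to_int is hypothetical. It keeps only ASCII digits, because str.isdigit() also accepts characters such as "²" that int() rejects, and it returns None when no digit is present.

```python
from typing import Optional


def digits_to_int(text: str) -> Optional[int]:
    """Hypothetical helper: join the ASCII digits in text, or return None."""
    digits = "".join(ch for ch in text if ch in "0123456789")
    return int(digits) if digits else None


print(digits_to_int('\n" $"\n"182"\n'))  # 182
print(digits_to_int("N/A"))              # None
```

Note that joining digits ignores separators, so a string such as "$1,234" becomes 1234; that may or may not be what a given page needs.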
