Scrapy: scraping an element nested under an unknown number of elements

Question

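I want to scrape a list of websites built on Shopify. Some examples include dudesgadget and 2ubest. Each of these Shopify stores has its own design and structures its web elements and fields differently. They look like standalone websites, but they are not.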

The main problem is that I am trying to scrape the product details. Here is a summary of some of the different structures:

2ubest

<html>
    <body>
        <div id="shopify-section-announcement-bar" id="shopify-section-announcement-bar">
            <main class="wrapper main-content" role="main">
                <div class="grid">
                    <div class="grid__item">
                        <div id="shopify-section-product-template" class="shopify-section">
                            <script id="ProductJson-product-template" type="application/json">
                                //Things I am looking for
                            </script>
                        </div>
                    </div>
                </div>
            </main>
        </div>
    </body>
</html>

littleplayland

<html>
    <body id="adjustable-ergonomic-laptop-stand" class="template-product">
        <script>
            //Things I am looking for
        </script>
    </body>
</html>

There are a few other sites as well, and I found a pattern among them:

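What I am looking for is definitely inside <body>.
What I am looking for is inside a <script> tag.
The only thing I am unsure of is how many elements sit between <body> and that <script>.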

My solution is:

def parse(self, response):
    body = response.xpath("//body")
    # ".//script" searches below <body> at any nesting depth
    for script in body.xpath(".//script/text()").getall():
        # Manipulate the script with js2xml here
        pass

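This works for littleplayland, dailysteals, and many other sites where the <script> sits very close to <body>, but not for 2ubest, where many other HTML elements sit between <body> and what I am looking for. Is there a way to ignore all the HTML elements in between and look only for the <script> tag?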

I need a generic solution that, if possible, works on all Shopify sites, since they all share the characteristics I described above.

This means the solution should not filter on <div> elements, because each site has a different number of them.

Answer 1

Another answer