如何废弃cricinfo中的所有测试匹配详细信息

Question

我试图废弃所有的测试比赛详细信息，但它显示HTTP Error 504: Gateway Timeout我得到测试匹配的详细信息，但它没有显示这是我的代码我已经使用bs4从cricinfo中删除测试匹配详细信息

我需要废弃2000测试匹配的详细信息，这是我的代码

import urllib.request as req

BASE_URL = 'http://www.espncricinfo.com'

if not os.path.exists('./espncricinfo-fc'):
    os.mkdir('./espncricinfo-fc')

for i in range(0, 2000):

    soupy = BeautifulSoup(urllib2.urlopen('http://search.espncricinfo.com/ci/content/match/search.html?search=test;all=1;page=' + str(i)).read())

    time.sleep(1)
    for new_host in soupy.findAll('a', {'class' : 'srchPlyrNmTxt'}):
        try:
            new_host = new_host['href']
        except:
            continue
        odiurl =BASE_URL + urljoin(BASE_URL,new_host)
        new_host = unicodedata.normalize('NFKD', new_host).encode('ascii','ignore')
        print(new_host)
        html = req.urlopen(odiurl).read()
        if html:
            with open('espncricinfo-fc/{0!s}'.format(str.split(new_host, "/")[4]), "wb") as f:
                f.write(html)
                print(html)
        else:
            print("no html")

Answer 1

另一答案