Simple data scraping with Python


Foreword: this article was compiled by the editors of 小常识网 (cha138.com). It covers simple data scraping with Python, and we hope you find it useful.

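This one failed. Even with headers and parameters exactly identical to what Firefox showed, it didn't work: the response was a web page, but it contained no data. I tried disabling JavaScript and then a cookie showed up (with JS enabled there was no cookie). Scraping with that cookie still didn't work. I checked again some time later, and the cookie's contents hadn't changed either. It's a bit discouraging, but I'm posting it anyway as a small open task for myself.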

If any experts pass by, I'd be grateful for your advice.

The scripts for the other two sites both work. I'll post that code shortly, without going through the process.



Target site: http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml

First, convert a date to decimal years. This is a rough conversion, accurate only to the day.

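def date2dy(year, month, day):
    """Convert a calendar date to decimal years (rough, day resolution)."""
    months = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    oneyear = 365
    # Gregorian rule: leap if divisible by 4, except centuries not divisible by 400
    if (year % 4 == 0 and year % 100 != 0) or year % 400 == 0:
        months[1] = 29
        oneyear = 366

    # Days elapsed before this date: all full months before `month`
    # (list indices 0 .. month-2) plus the days already passed in this month
    days = sum(months[:month - 1]) + day - 1
    # Divide by the actual length of this year (365 or 366)
    return year + days / oneyear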
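As a cross-check, here is a minimal sketch of the same conversion using the standard library's `datetime` module, which handles leap years itself. It assumes the definition date2dy is aiming for: decimal year = year + (days elapsed since 1 January) / (days in that year). The name `decimal_year` is just for illustration.

```python
from datetime import date


def decimal_year(year: int, month: int, day: int) -> float:
    """Rough decimal-year value at day resolution (counted from the start of the day)."""
    start = date(year, 1, 1)
    days_in_year = (date(year + 1, 1, 1) - start).days  # 365 or 366
    elapsed = (date(year, month, day) - start).days       # days before this date
    return year + elapsed / days_in_year


if __name__ == "__main__":
    # The form's "date" field for 2016-12-01
    print(round(decimal_year(2016, 12, 1), 2))  # 2016.92
```

Rounded to two decimals, 2016-12-01 gives 2016.92, the value used in the form data below.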

The first small goal is to fetch the data for December 1, 2016.

Open Firefox's developer tools with F12 and switch to the Network tab.


Then submit the form and look at the request it sends.


The useful information is the request headers, the request URL, and the form parameters. Copy them into a program and try it.
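Copying headers by hand is error-prone. Below is a small sketch that turns the raw request-header text and the form body, as copied from Firefox's Network panel, into the dicts that `requests` expects. It assumes the headers come as `Name: value` lines (possibly preceded by the request line) and the body as `key=value&...`; the function names and sample text are hypothetical. It drops `Host` and `Content-Length`, which the HTTP library fills in on its own.

```python
from urllib.parse import parse_qsl

# Headers the HTTP library derives from the URL and the body itself
SKIP = {"host", "content-length"}


def parse_raw_headers(raw: str) -> dict:
    """Turn 'Name: value' lines into a dict, ignoring the request line and blank lines."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        # Skip lines with no colon, and the request line (header names contain no spaces)
        if not sep or " " in name.strip():
            continue
        name = name.strip()
        if name.lower() not in SKIP:
            headers[name] = value.strip()
    return headers


def parse_form_body(body: str) -> dict:
    """Decode an application/x-www-form-urlencoded body, keeping empty values."""
    return dict(parse_qsl(body.strip(), keep_blank_values=True))


if __name__ == "__main__":
    raw = """POST /cgi-bin/igrfsynth HTTP/1.1
Host: www.geomag.bgs.ac.uk
Content-Type: application/x-www-form-urlencoded
Content-Length: 136
Referer: http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml"""
    print(parse_raw_headers(raw))
    print(parse_form_body("name=-&coord=1&date=2016.92&place=&sv=n"))
```

`keep_blank_values=True` matters here because the form sends empty fields such as `place`, and dropping them would make the request differ from the browser's.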

I spent more than a day on this part and still couldn't get the data. I'm so bad at this.jpg

Here's the code for now. If any experts happen to pass by, I'd be grateful for advice.

#!/usr/bin/env python3

import os
import sys

import requests

web_url = r'http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml'
request_url = r'http://www.geomag.bgs.ac.uk/cgi-bin/igrfsynth'
# Save the raw HTML response in the script's directory
filepath = os.path.join(sys.path[0], 'data_igrf_raw_.html')

# Request headers copied from Firefox. Content-Length is omitted: requests
# computes it from the encoded body, so a hard-coded value (136) is not needed.
headers = {
    'Host': 'www.geomag.bgs.ac.uk',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:53.0) Gecko/20100101 Firefox/53.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': web_url,
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
}
payload = {
    'name': '-',        # your name and email address
    'coord': '1',       # '1': Geodetic, '2': Geocentric
    'date': '2016.92',  # decimal years
    'alt': '150',       # altitude
    'place': '',
    'degmin': 'y',      # position coordinates: 'y': in degrees and minutes, 'n': in decimal degrees
    'latd': '60',       # latitude degrees (negative for south)
    'latm': '0',        # latitude minutes
    'lond': '120',      # longitude degrees (negative for west)
    'lonm': '0',        # longitude minutes
    'tot': 'y',         # Total Intensity (F)
    'dec': 'y',         # Declination (D)
    'inc': 'y',         # Inclination (I)
    'hor': 'y',         # Horizontal Intensity (H)
    'nor': 'y',         # North Component (X)
    'eas': 'y',         # East Component (Y)
    'ver': 'y',         # Vertical Component (Z)
    'map': '0',         # include a map of the location: '0': no, '1': yes
    'sv': 'n',
}
# For Secular Variation (rate of change), set 'sv': 'y' instead
r = requests.post(request_url, data=payload, headers=headers)
with open(filepath, 'w', encoding='utf-8') as fid:
    fid.write(r.text)
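One thing that may be worth trying for the cookie problem described at the top (untested against this site, so treat it as an assumption): use a `requests.Session`, load the form page first so any cookie the server sets is stored, then POST from the same session. The sketch also builds the prepared request, so you can compare the exact headers and body with what Firefox sent before anything goes over the network. `build_post` and `fetch_igrf` are hypothetical names, and the payload here is trimmed. Note that `requests` does not run JavaScript, so a cookie set by a script in the page would still be missing.

```python
import requests

FORM_URL = "http://www.geomag.bgs.ac.uk/data_service/models_compass/igrf_form.shtml"
POST_URL = "http://www.geomag.bgs.ac.uk/cgi-bin/igrfsynth"


def build_post(session: requests.Session, payload: dict) -> requests.PreparedRequest:
    """Prepare (but do not send) the form POST, merging in the session's cookies and headers."""
    req = requests.Request("POST", POST_URL, data=payload, headers={"Referer": FORM_URL})
    return session.prepare_request(req)


def fetch_igrf(payload: dict) -> str:
    """GET the form page first so server-set cookies land in the session, then POST."""
    with requests.Session() as session:
        session.get(FORM_URL, timeout=30)   # cookies from this response are kept
        prepared = build_post(session, payload)
        print(prepared.headers)             # compare with Firefox's request headers
        print(prepared.body)                # compare with Firefox's form data
        resp = session.send(prepared, timeout=30)
        resp.raise_for_status()
        return resp.text


if __name__ == "__main__":
    # Trimmed payload for illustration; add the remaining fields from the script above
    payload = {"coord": "1", "date": "2016.92", "alt": "150", "degmin": "y",
               "latd": "60", "latm": "0", "lond": "120", "lonm": "0", "tot": "y"}
    print(fetch_igrf(payload)[:500])
```

Splitting preparation from sending keeps the comparison step offline: `build_post` can be called with any session (including one with a cookie set by hand) to check exactly what would be sent.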
