How to get data from an API in R?

Posted 2023-03-31


Published: 2021-08-13 17:30:34

Question:

I am new to APIs. I came across a small piece of Python code that retrieves data, and I would like to replicate it in R:

Python code:

import requests
import json
from datetime import date
import time
import smtplib, ssl

#API URL
url = 'http://cdn-api.co-vin.in/api/v2/admin/location/states'
headers = {'accept': 'application/json', 'Accept-Language': 'hi_IN', 'User-Agent': 'Mozilla/4.0'}
result = requests.get(url, headers=headers)
#Print state ID
print(result.content.decode())

Result:

{"states":[{"state_id":1,"state_name":"Andaman and Nicobar Islands"},{"state_id":2,"state_name":"Andhra Pradesh"},{"state_id":3,"state_name":"Arunachal Pradesh"},{"state_id":4,"state_name":"Assam"},{"state_id":5,"state_name":"Bihar"},{"state_id":6,"state_name":"Chandigarh"},{"state_id":7,"state_name":"Chhattisgarh"},{"state_id":8,"state_name":"Dadra and Nagar Haveli"},{"state_id":37,"state_name":"Daman and Diu"},{"state_id":9,"state_name":"Delhi"},{"state_id":10,"state_name":"Goa"},{"state_id":11,"state_name":"Gujarat"},{"state_id":12,"state_name":"Haryana"},{"state_id":13,"state_name":"Himachal Pradesh"},{"state_id":14,"state_name":"Jammu and Kashmir"}],"ttl":24}

API information:

URL: 'http://cdn-api.co-vin.in/api/v2/admin/location/states'

From: https://apisetu.gov.in/public/marketplace/api/cowin#/Metadata%20APIs/states

From: https://github.com/cowinapi/developer.cowin/issues/339

(Note: the cowin API can only be accessed from within India, so I guess many of you will not be able to use it. It would still help if you could suggest some code changes.)

R

I have searched Google and tried the code snippets below, but none of them has worked so far:

library(tidyverse)
library(rjson)
library(jsonlite)
library(RCurl)
library(httr)

states_url = 'http://cdn-api.co-vin.in/api/v2/admin/location/states'

headers = c('accept' = 'application/json',
            'Accept-Language' = 'hi_IN',
            'User-Agent' = 'Mozilla/4.0')

url(states_url, headers = headers)

GET(states_url)$content

GET(states_url)$headers

Update

I tried the following. It gives no error, but I do not know what to do next:

states_url = 'http://cdn-api.co-vin.in/api/v2/admin/location/states'

headers = c('accept' = 'application/json',
            'Accept-Language' = 'hi_IN',
            'User-Agent' = 'Mozilla/4.0')

url(states_url, headers = headers)

A connection with
description "http://cdn-api.co-vin.in/api/v2/admin/location/states"
class       "url-wininet"
mode        "r"
text        "text"
opened      "closed"
can read    "yes"
can write   "no"

GET(states_url, header = headers)$content

  [1] 3c 21 44 4f 43 54 59 50 45 20 48 54 4d 4c 20 50 55 42 4c
 [20] 49 43 20 22 2d 2f 2f 57 33 43 2f 2f 44 54 44 20 48 54 4d
 [39] 4c 20 34 2e 30 31 20 54 72 61 6e 73 69 74 69 6f 6e 61 6c
 [58] 2f 2f 45 4e 22 20 22 68 74 74 70 3a 2f 2f 77 77 77 2e 77
 [77] 33 2e 6f 72 67 2f 54 52 2f 68 74 6d 6c 34 2f 6c 6f 6f 73
 [96] 65 2e 64 74 64 22 3e 0a 3c 48 54 4d 4c 3e 3c 48 45 41 44
[115] 3e 3c 4d 45 54 41 20 48 54 54 50 2d 45 51 55 49 56 3d 22
[134] 43 6f 6e 74 65 6e 74 2d 54 79 70 65 22 20 43 4f 4e 54 45
[153] 4e 54 3d 22 74 65 78 74 2f 68 74 6d 6c 3b 20 63 68 61 7

str(GET(states_url))

List of 10
 $ url        : chr "http://cdn-api.co-vin.in/api/v2/admin/location/states"
 $ status_code: int 403
 $ headers    :List of 9
  ..$ server        : chr "CloudFront"
  ..$ date          : chr "Tue, 25 May 2021 12:39:02 GMT"
  ..$ content-type  : chr "text/html"
  ..$ content-length: chr "919"
  ..$ connection    : chr "keep-alive"
  ..$ x-cache       : chr "Error from cloudfront"
  ..$ via           : chr "1.1 85ad220378d99bdabeb6c46016f1cf16.cloudfront.net (CloudFront)"
  ..$ x-amz-cf-pop  : chr "BOM51-C1"
  ..$ x-amz-cf-id   : chr "eeJq5ZtSJHZkLGoJZBTUL2xL5PcU2gjesnY7Qmg_kMnxZxZ1JUHPWA=="
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ all_headers:List of 1
  ..$ :List of 3
  .. ..$ status : int 403
  .. ..$ version: chr "HTTP/1.1"
  .. ..$ headers:List of 9
  .. .. ..$ server        : chr "CloudFront"
  .. .. ..$ date          : chr "Tue, 25 May 2021 12:39:02 GMT"
  .. .. ..$ content-type  : chr "text/html"
  .. .. ..$ content-length: chr "919"
  .. .. ..$ connection    : chr "keep-alive"
  .. .. ..$ x-cache       : chr "Error from cloudfront"
  .. .. ..$ via           : chr "1.1 85ad220378d99bdabeb6c46016f1cf16.cloudfront.net (CloudFront)"
  .. .. ..$ x-amz-cf-pop  : chr "BOM51-C1"
  .. .. ..$ x-amz-cf-id   : chr "eeJq5ZtSJHZkLGoJZBTUL2xL5PcU2gjesnY7Qmg_kMnxZxZ1JUHPWA=="
  .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ cookies    :'data.frame':    0 obs. of  7 variables:
  ..$ domain    : logi(0) 
  ..$ flag      : logi(0) 
  ..$ path      : logi(0) 
  ..$ secure    : logi(0) 
  ..$ expiration: 'POSIXct' num(0) 
  ..$ name      : logi(0) 
  ..$ value     : logi(0) 
 $ content    : raw [1:919] 3c 21 44 4f ...
 $ date       : POSIXct[1:1], format: "2021-05-25 12:39:02"
 $ times      : Named num [1:6] 0 0.242 0.283 0.284 0.321 ...
  ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
 $ request    :List of 7
  ..$ method    : chr "GET"
  ..$ url       : chr "http://cdn-api.co-vin.in/api/v2/admin/location/states"
  ..$ headers   : Named chr "application/json, text/xml, application/xml, */*"
  .. ..- attr(*, "names")= chr "Accept"
  ..$ fields    : NULL
  ..$ options   :List of 2
  .. ..$ useragent: chr "libcurl/7.64.1 r-curl/4.3.1 httr/1.4.2"
  .. ..$ httpget  : logi TRUE
  ..$ auth_token: NULL
  ..$ output    : list()
  .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
  ..- attr(*, "class")= chr "request"
 $ handle     :Class 'curl_handle' <externalptr> 
 - attr(*, "class")= chr "response"

http_status(GET(states_url))

$category
[1] "Client error"

$reason
[1] "Forbidden"

$message
[1] "Client error: (403) Forbidden"
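
The 403 above means the body is an error page rather than the JSON that the Python script printed, so it helps to branch on the status code before trying to parse anything. Below is a minimal Python sketch of that check using only the standard library. `describe_status` is a hypothetical helper name, and its category labels only imitate the wording of the `http_status()` output above; they are not taken from httr itself.

```python
from http import HTTPStatus


def describe_status(code: int) -> dict:
    """Summarise an HTTP status code (hypothetical helper, for illustration only)."""
    # Category labels are assumptions chosen to resemble the http_status() output.
    if 200 <= code < 300:
        category = "Success"
    elif 300 <= code < 400:
        category = "Redirection"
    elif 400 <= code < 500:
        category = "Client error"
    elif 500 <= code < 600:
        category = "Server error"
    else:
        category = "Other"
    try:
        # Standard reason phrase registered for the code, e.g. "Forbidden" for 403.
        reason = HTTPStatus(code).phrase
    except ValueError:
        # Codes unknown to the standard library get a placeholder.
        reason = "Unknown"
    return {"category": category, "reason": reason}


summary = describe_status(403)
print(summary)
```

Only a result in the "Success" category is worth passing to a JSON parser. Anything else, like the 403 here, is better inspected as plain text.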

stringi::stri_enc_detect(GET(states_url, header = headers)$content)

[[1]]
     Encoding Language Confidence
1  ISO-8859-1       en       0.54
2  ISO-8859-2       ro       0.26
3       UTF-8                0.15
4    UTF-16BE                0.10
5    UTF-16LE                0.10
6   Shift_JIS       ja       0.10
7     GB18030       zh       0.10
8      EUC-JP       ja       0.10
9      EUC-KR       ko       0.10
10       Big5       zh       0.10
11 ISO-8859-9       tr       0.06
12 IBM424_rtl       he       0.02
13 IBM424_ltr       he       0.01

content(GET(states_url, header = headers), encoding = "UTF-8")

html_document
<html>
[1] <head>\n<meta http-equiv="Content-Type" content="text/htm ...
[2] <body>\n<h1>403 ERROR</h1>\n<h2>The request could not be  ...

content(GET(states_url, header = headers), encoding = "ISO-8859-1")

html_document
<html>
[1] <head>\n<meta http-equiv="Content-Type" content="text/htm ...
[2] <body>\n<h1>403 ERROR</h1>\n<h2>The request could not be  ...


Comments:

Try rawToChar(GET(..)$content) on your raw output. Doing that, I get

[1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<HTML><HEAD><META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=iso-8859-1\">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size=\"1px\">\nRequest blocked.\nWe can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.\n< ......
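
The same decoding can be checked by hand. This is a small Python sketch, not part of the original thread. It takes the first row of the raw vector from the question and decodes it, assuming the ISO-8859-1 charset that the error page itself declares:

```python
# First row of the raw vector printed by GET(...)$content in the question.
hex_row = "3c 21 44 4f 43 54 59 50 45 20 48 54 4d 4c 20 50 55 42 4c"

# bytes.fromhex skips the spaces between the two-digit byte values.
raw = bytes.fromhex(hex_row)

# ISO-8859-1 is the charset named in the page's own META tag.
text = raw.decode("iso-8859-1")
print(text)
```

The bytes spell out the start of an HTML doctype, which confirms that the response body is an HTML error page and not JSON.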

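@r2evans I got the 403 error in the status code earlier. I am not sure why this error occurs when it works fine in Python.

See my other comment about the suggested need for an OTP. The code in your other question is missing that component, and I can only imagine that the Python code works because you somehow included it before (or Python's requests has found or cached it from somewhere else).

You call url(states_url, headers = headers), which includes the headers, but that does nothing for the subsequent GET(.), where you do not use the headers. Your call to url(.) is completely useless, other than convincing you that something is there. Nothing in the url(.) call persists or is used anywhere else. Remove your call to url(.), then do GET(states_url, add_headers(headers)).

Answer 1: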

You define headers, but you never include them in your call to GET. Use them there.

GET(states_url, add_headers(headers))
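
The same principle holds in any HTTP client: headers only take effect when they are attached to the request that is actually sent. As a hedged illustration, here is a Python sketch that builds a request with the standard library's urllib and inspects it. urllib is used only so that nothing goes over the network; the question's script uses requests instead.

```python
from urllib.request import Request

states_url = "http://cdn-api.co-vin.in/api/v2/admin/location/states"
headers = {
    "accept": "application/json",
    "Accept-Language": "hi_IN",
    "User-Agent": "Mozilla/4.0",
}

# Built without headers: no custom User-Agent travels with this request.
bare = Request(states_url)

# Built with headers: they are stored on the request object itself.
with_headers = Request(states_url, headers=headers)

# urllib stores header names via str.capitalize(), hence "User-agent".
print(bare.has_header("User-agent"))
print(with_headers.get_header("User-agent"))

# Nothing has been sent yet; urllib.request.urlopen(with_headers) would send it.
```

Whether the headers alone clear the 403 depends on the server. The question notes that the API is restricted to access from India, and the comments raise a possible OTP requirement.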
