How do I get pandas.read_json to recognize this API return as a valid .json?


Posted: 2021-09-27 13:30:48

Question:

I am using the Mapbox API to get more information about an area from its coordinates. The API call returns JSON, which I can't get pandas to store as a DataFrame using pandas.read_json:

https://pandas.pydata.org/pandas-docs/version/0.25.3/reference/api/pandas.read_json.html

Here is an example of the .json that the API request returns.

{"type":"FeatureCollection","query":[-73.989,40.733],"features":[{"id":"address.5528394502635160","type":"Feature","place_type":["address"],"relevance":1,"properties":{"accuracy":"rooftop"},"text":"East 13th Street","place_name":"120 East 13th Street, New York, New York 10003, United States","center":[-73.98893045,40.73295105],"geometry":{"type":"Point","coordinates":[-73.98893045,40.73295105]},"address":"120","context":[{"id":"neighborhood.2103290","text":"Greenwich Village"},{"id":"postcode.13482670360296810","text":"10003"},{"id":"locality.12696928000137850","wikidata":"Q11299","text":"Manhattan"},{"id":"place.2618194975964500","wikidata":"Q60","text":"New York"},{"id":"district.12113562209855570","wikidata":"Q500416","text":"New York County"},{"id":"region.17349986251855570","wikidata":"Q1384","short_code":"US-NY","text":"New York"},{"id":"country.19678805456372290","wikidata":"Q30","short_code":"us","text":"United States"}]},{"id":"neighborhood.2103290","type":"Feature","place_type":["neighborhood"],"relevance":1,"properties":{},"text":"Greenwich Village","place_name":"Greenwich Village, New York, New York 10003, United States","bbox":[-74.005282,40.72586,-73.98734,40.73907],"center":[-74.0029,40.7284],"geometry":{"type":"Point","coordinates":[-74.0029,40.7284]},"context":[{"id":"postcode.13482670360296810","text":"10003"},{"id":"locality.12696928000137850","wikidata":"Q11299","text":"Manhattan"},{"id":"place.2618194975964500","wikidata":"Q60","text":"New York"},{"id":"district.12113562209855570","wikidata":"Q500416","text":"New York County"},{"id":"region.17349986251855570","wikidata":"Q1384","short_code":"US-NY","text":"New York"},{"id":"country.19678805456372290","wikidata":"Q30","short_code":"us","text":"United States"}]},{"id":"postcode.13482670360296810","type":"Feature","place_type":["postcode"],"relevance":1,"properties":{},"text":"10003","place_name":"New York, New York 10003, United States","bbox":[-73.9996058238451,40.7229310019,-73.9798620096375,40.7396749960342],"center":[-73.99,40.73],"geometry":{"type":"Point","coordinates":[-73.99,40.73]},"context":[{"id":"locality.12696928000137850","wikidata":"Q11299","text":"Manhattan"},{"id":"place.2618194975964500","wikidata":"Q60","text":"New York"},{"id":"district.12113562209855570","wikidata":"Q500416","text":"New York County"},{"id":"region.17349986251855570","wikidata":"Q1384","short_code":"US-NY","text":"New York"},{"id":"country.19678805456372290","wikidata":"Q30","short_code":"us","text":"United States"}]},{"id":"locality.12696928000137850","type":"Feature","place_type":["locality"],"relevance":1,"properties":{"wikidata":"Q11299"},"text":"Manhattan","place_name":"Manhattan, New York, United States","bbox":[-74.047313153061,40.679573,-73.907,40.8820749648427],"center":[-73.9597,40.7903],"geometry":{"type":"Point","coordinates":[-73.9597,40.7903]},"context":[{"id":"place.2618194975964500","wikidata":"Q60","text":"New York"},{"id":"district.12113562209855570","wikidata":"Q500416","text":"New York County"},{"id":"region.17349986251855570","wikidata":"Q1384","short_code":"US-NY","text":"New York"},{"id":"country.19678805456372290","wikidata":"Q30","short_code":"us","text":"United States"}]},{"id":"place.2618194975964500","type":"Feature","place_type":["place"],"relevance":1,"properties":{"wikidata":"Q60"},"text":"New York","place_name":"New York, New York, United States","bbox":[-74.25909,40.477399,-73.700272,40.917577],"center":[-73.9866,40.7306],"geometry":{"type":"Point","coordinates":[-73.9866,40.7306]},"context":[{"id":"district.12113562209855570","wikidata":"Q500416","text":"New York County"},{"id":"region.17349986251855570","wikidata":"Q1384","short_code":"US-NY","text":"New York"},{"id":"country.19678805456372290","wikidata":"Q30","short_code":"us","text":"United States"}]},{"id":"district.12113562209855570","type":"Feature","place_type":["district"],"relevance":1,"properties":{"wikidata":"Q500416"},"text":"New York County","place_name":"New York County, New York, United States","bbox":[-74.047227,40.682932,-73.907,40.879278],"center":[-74,40.7167],"geometry":{"type":"Point","coordinates":[-74,40.7167]},"context":[{"id":"region.17349986251855570","wikidata":"Q1384","short_code":"US-NY","text":"New York"},{"id":"country.19678805456372290","wikidata":"Q30","short_code":"us","text":"United States"}]},{"id":"region.17349986251855570","type":"Feature","place_type":["region"],"relevance":1,"properties":{"wikidata":"Q1384","short_code":"US-NY"},"text":"New York","place_name":"New York, United States","bbox":[-79.8578350999901,40.4771391062446,-71.7564918092633,45.0239286969073],"center":[-75.4652471468304,42.751210955],"geometry":{"type":"Point","coordinates":[-75.4652471468304,42.751210955]},"context":[{"id":"country.19678805456372290","wikidata":"Q30","short_code":"us","text":"United States"}]},{"id":"country.19678805456372290","type":"Feature","place_type":["country"],"relevance":1,"properties":{"wikidata":"Q30","short_code":"us"},"text":"United States","place_name":"United States","bbox":[-179.9,18.8163608007951,-66.8847646185949,71.4202919997506],"center":[-97.9222112121185,39.3812661305678],"geometry":{"type":"Point","coordinates":[-97.9222112121185,39.3812661305678]}}],"attribution":"NOTICE: © 2021 Mapbox and its suppliers. All rights reserved. Use of this data is subject to the Mapbox Terms of Service (https://www.mapbox.com/about/maps/). This response and the information it contains may not be retained. POI(s) provided by Foursquare."}

Here is my code:

url = "https://api.mapbox.com/geocoding/v5/mapbox.places/-73.989,40.733.json?access_token=MY_KEY_HERE"

df = pd.read_json(url, orient='split')

return df

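I have tried orient = 'split', 'index', 'records', 'columns', and 'values', but most of the time it returns "ValueError: arrays must all be same length". What do I need to do to get pandas.read_json to recognize this API return as a valid .json?

Output: ValueError: arrays must all be same length

Expected: the returned .json is read and stored in a pandas DataFrame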

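Comments:

Pandas can't accept just any random JSON input. The data has to be representable as a table, which is obviously not possible for a JSON blob with nested fields. You will have to build the DataFrame with your own code.

Hi Mikael, do you have more information or an example of what you mean? If I'm trying to import the .json, I'm not quite clear on what you mean by having to build the DataFrame with my own code.

You have to decide exactly which columns you want in the DataFrame. Then you need to write custom code that extracts that information from the result. You'll want to do that part like ***.com/questions/6386308/… and then write your own loop to extract the information you want and add it to your DataFrame.

That link is a helpful resource, let me try it. Thanks.

Answer 1: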

Pandas provides a utility function, pd.json_normalize(), that normalizes semi-structured data into a flat table. For your JSON response, this should work:

import pandas as pd

data = <your JSON response>

df = pd.json_normalize(data, 'features')

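You can find examples in the latest version of the pandas user guide. For some reason, I couldn't find a dedicated page for it in the latest version of the pandas API reference. It is mentioned in older versions, though, where you can also find the list of parameters.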
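To make the effect of the 'features' argument concrete, here is a minimal sketch that runs without an API key. The `data` dictionary is a hand-trimmed stand-in for the response in the question (two features, a few keys each), not real API output; with the live API you would presumably obtain it with something like `requests.get(url).json()`, which is an assumption and is not shown in the original post.

```python
import pandas as pd

# Trimmed-down sample shaped like the Mapbox response in the question.
# Only two features and a handful of keys are kept for brevity.
data = {
    "type": "FeatureCollection",
    "query": [-73.989, 40.733],
    "features": [
        {
            "id": "address.5528394502635160",
            "place_type": ["address"],
            "text": "East 13th Street",
            "center": [-73.98893045, 40.73295105],
            "geometry": {
                "type": "Point",
                "coordinates": [-73.98893045, 40.73295105],
            },
        },
        {
            "id": "neighborhood.2103290",
            "place_type": ["neighborhood"],
            "text": "Greenwich Village",
            "center": [-74.0029, 40.7284],
            "geometry": {
                "type": "Point",
                "coordinates": [-74.0029, 40.7284],
            },
        },
    ],
}

# record_path="features" produces one row per element of data["features"].
# Nested dicts inside each feature are flattened into dotted column names
# such as "geometry.type"; lists such as "center" stay as list-valued cells.
df = pd.json_normalize(data, record_path="features")

print(df[["id", "text", "geometry.type"]])
```

Top-level keys such as "query" and "attribution" are not part of the resulting rows. Lists nested inside each feature (for example "context" in the full response) remain list-valued cells, so they may need a further json_normalize call or custom code if you want them as columns.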

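Comments:

Thank you so much! That solved the problem! Thanks for providing the documentation and examples. I now understand why you have the 'features' argument. You are a lifesaver.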
