Getting imported json data into a data frame
Posted: 2013-06-01 14:36:27
Question:
I have a file containing more than 1500 JSON objects that I want to work with in R. I've been able to import the data as a list, but I'm having trouble coercing it into a useful structure. I want to create a data frame with a row for each JSON object and a column for each key:value pair.
I've reproduced my situation with this small fake dataset:
[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]
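Some characteristics of the data:
- All objects contain the same number of key:value pairs, although some of the values are null.
- Each object has two non-numeric columns (name and group).
- name is the unique identifier, and there are 10 or so groups.
- Many of the names and groups contain spaces, commas, and other punctuation.
Based on this question, R list(structure(list())) to data frame, I tried the following: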
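library(RJSONIO)  # assumed: the question does not say which package's fromJSON() was used
library(plyr)     # rbind.fill() comes from plyr
json_file <- "test.json"
json_data <- fromJSON(json_file)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))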
With both my real data and this fake data, the last line gives me this error:
Error in data.frame(name = "Doe, John", group = "Red", `age (y)` = 24, :
arguments imply differing number of rows: 1, 0
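The "1, 0" in the error points at the null scores: a parser that turns JSON null into NULL (RJSONIO's fromJSON() does this by default) hands as.data.frame() a zero-length field next to length-one fields. A base-R sketch for finding the offending records, where records is a hypothetical hand-built stand-in for the parsed file (three records, shortened key names), not the real data:

```r
# Hypothetical stand-in for the parsed JSON: a list of named lists in which
# JSON null has been turned into NULL
records <- list(
  list(name = "Doe, John",   group = "Red",    age = 24, score = NULL),
  list(name = "Doe, Jane",   group = "Green",  age = 30, score = 500),
  list(name = "Smith, Joan", group = "Yellow", age = 41, score = NULL)
)

# TRUE for each field of a record that is NULL
is_null_field <- function(r) vapply(r, is.null, logical(1))

# Indices of the records that have at least one NULL field
bad <- which(vapply(records, function(r) any(is_null_field(r)), logical(1)))
print(bad)

# Names of the NULL fields in each of those records
print(lapply(records[bad], function(r) names(r)[is_null_field(r)]))
```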
Answer 1:
You just need to replace NULL with NA:
library(RJSONIO)
json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'
# parse the JSON text; null fields come back as NULL
json_file <- fromJSON(json_file)
# replace the NULL fields with NA, then flatten each record into a vector
json_file <- lapply(json_file, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})
Once every element has a non-null value, you can call rbind without an error:
do.call("rbind", json_file)
name group age (y) height (cm) wieght (kg) score
[1,] "Doe, John" "Red" "24" "182" "74.8" NA
[2,] "Doe, Jane" "Green" "30" "170" "70.1" "500"
[3,] "Smith, Joan" "Yellow" "41" "169" "60" NA
[4,] "Brown, Sam" "Green" "22" "183" "75" "865"
[5,] "Jones, Larry" "Green" "31" "178" "83.9" "221"
[6,] "Murray, Seth" "Red" "35" "172" "76.2" "413"
[7,] "Doe, Jane" "Yellow" "22" "164" "68" "902"
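The result is a character matrix because unlist() coerces each record's mix of strings and numbers to a single character vector before rbind() stacks them. The base-R sketch below repeats the NULL-to-NA step on a small hypothetical hand-built list (shortened key names), so the coercion can be checked without any JSON package:

```r
# Hypothetical stand-in for fromJSON() output, with JSON null as NULL
records <- list(
  list(name = "Doe, John", group = "Red",   age = 24, score = NULL),
  list(name = "Doe, Jane", group = "Green", age = 30, score = 500)
)

rows <- lapply(records, function(x) {
  x[vapply(x, is.null, logical(1))] <- NA  # every field now has length 1
  unlist(x)                                # mixed types are coerced to character
})

# Column names come from the names of the first record's vector
m <- do.call(rbind, rows)
print(m)
```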
Comments:
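I'm surprised there isn't a better function for this. (For XML there are functions like XMLtoDataFrame), so a JSONtoDataFrame would be great.
@userJT - there is jsonlite::fromJSON, which handles NULL and simplifies to a data.frame. See my answer.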
This converts json_file into a matrix, not a data frame. How do I get a data.frame?
@TSR: data.frame(do.call("rbind", json_file))
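Wrapping the matrix in data.frame() still leaves every column as character. A base-R sketch for restoring the numeric types with type.convert(), using a hypothetical two-row matrix shaped like the one above (shortened column names); check.names = FALSE is there so names such as "age (y)" are not rewritten into syntactic ones:

```r
# Hypothetical character matrix shaped like do.call("rbind", json_file)
m <- rbind(
  c(name = "Doe, John", group = "Red",   age = "24", score = NA),
  c(name = "Doe, Jane", group = "Green", age = "30", score = "500")
)

# check.names = FALSE keeps the column names exactly as they are
df <- data.frame(m, check.names = FALSE, stringsAsFactors = FALSE)

# Re-guess each column's type: "24" becomes integer, text stays character
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```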
Answer 2:
This is very simple if you use library(jsonlite) or library(jsonify). Both handle null values by converting them to NA, and both preserve the data types.
Data
json_file <- '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'
jsonlite
library(jsonlite)
jsonlite::fromJSON( json_file )
# name group age (y) height (cm) wieght (kg) score
# 1 Doe, John Red 24 182 74.8 NA
# 2 Doe, Jane Green 30 170 70.1 500
# 3 Smith, Joan Yellow 41 169 60.0 NA
# 4 Brown, Sam Green 22 183 75.0 865
# 5 Jones, Larry Green 31 178 83.9 221
# 6 Murray, Seth Red 35 172 76.2 413
# 7 Doe, Jane Yellow 22 164 68.0 902
str( jsonlite::fromJSON( json_file ) )
# 'data.frame': 7 obs. of 6 variables:
# $ name : chr "Doe, John" "Doe, Jane" "Smith, Joan" "Brown, Sam" ...
# $ group : chr "Red" "Green" "Yellow" "Green" ...
# $ age (y) : int 24 30 41 22 31 35 22
# $ height (cm): int 182 170 169 183 178 172 164
# $ wieght (kg): num 74.8 70.1 60 75 83.9 76.2 68
# $ score : int NA 500 NA 865 221 413 902
jsonify
library(jsonify)
jsonify::from_json( json_file )
# name group age (y) height (cm) wieght (kg) score
# 1 Doe, John Red 24 182 74.8 NA
# 2 Doe, Jane Green 30 170 70.1 500
# 3 Smith, Joan Yellow 41 169 60.0 NA
# 4 Brown, Sam Green 22 183 75.0 865
# 5 Jones, Larry Green 31 178 83.9 221
# 6 Murray, Seth Red 35 172 76.2 413
# 7 Doe, Jane Yellow 22 164 68.0 902
str( jsonify::from_json( json_file ) )
# 'data.frame': 7 obs. of 6 variables:
# $ name : chr "Doe, John" "Doe, Jane" "Smith, Joan" "Brown, Sam" ...
# $ group : chr "Red" "Green" "Yellow" "Green" ...
# $ age (y) : int 24 30 41 22 31 35 22
# $ height (cm): int 182 170 169 183 178 172 164
# $ wieght (kg): num 74.8 70.1 60 75 83.9 76.2 68
# $ score : int NA 500 NA 865 221 413 902
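For readers who cannot install either package, the gist of that simplification can be sketched in base R: build one atomic column per key, mapping NULL to NA. This is a hypothetical illustration, not how jsonlite or jsonify are implemented, and it assumes every record has the same keys and only scalar values (as the question states for its data):

```r
# Hypothetical stand-in for a parsed JSON array of objects (null -> NULL)
records <- list(
  list(name = "Doe, John",   group = "Red",    `age (y)` = 24, score = NULL),
  list(name = "Doe, Jane",   group = "Green",  `age (y)` = 30, score = 500),
  list(name = "Smith, Joan", group = "Yellow", `age (y)` = 41, score = NULL)
)

keys <- names(records[[1]])  # assumes all records share the first record's keys

# One column per key; NULL becomes NA so each column has one value per record
cols <- lapply(keys, function(k) {
  unlist(lapply(records, function(r) if (is.null(r[[k]])) NA else r[[k]]))
})
names(cols) <- keys

# check.names = FALSE keeps "age (y)" as written
df <- data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
str(df)
```

Because each column is assembled separately, numbers stay numeric instead of being coerced to character along with the names.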
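Comments:
I ran exactly the same code as you, but when I run fromJSON it returns a list, not a data frame. How did you get it to return a data frame?
@Alexander - I still get a data.frame. Make sure you're using jsonlite::fromJSON.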
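Answer 3:
To replace the null values with NA, use the nullValue argument:
library(RJSONIO)  # nullValue is an argument of RJSONIO's fromJSON()
library(plyr)     # provides rbind.fill()
json_data <- fromJSON(json_file, nullValue = NA)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))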
This way there won't be any unnecessary quotes in your output.
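The quotes disappear because each record becomes a one-row data frame on its own, so numbers stay numeric, whereas unlist() in Answer 1 coerces a whole record to character. A base-R sketch of the same idea, assuming the parsed list already carries NA for null (which is what nullValue = NA is for); the list is hypothetical and hand-built, and base rbind() stands in for plyr::rbind.fill, which is enough here only because every record has the same keys:

```r
# Hypothetical stand-in for fromJSON(json_file, nullValue = NA): null is already NA
records <- list(
  list(name = "Doe, John",   group = "Red",    age = 24, score = NA),
  list(name = "Doe, Jane",   group = "Green",  age = 30, score = 500),
  list(name = "Smith, Joan", group = "Yellow", age = 41, score = NA)
)

# One single-row data frame per record; each column keeps its own type
pieces <- lapply(records, as.data.frame, stringsAsFactors = FALSE)

# Stack the rows; the score column is promoted from logical NA to numeric
df <- do.call(rbind, pieces)
str(df)
```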
Answer 4:
library(rjson)
Lines <- readLines("yelp_academic_dataset_business.json")
business <- as.data.frame(t(sapply(Lines, fromJSON)))
You can try this to load the JSON data into R.
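One caveat with this pipeline: when every parsed line is a list, sapply() builds a matrix of mode list, so the columns of business end up as list-columns; and because Lines is a character vector, sapply() also uses the raw JSON lines as names, which become the row names (sapply(..., USE.NAMES = FALSE) avoids that). A base-R sketch of the shape problem and one way to flatten it, where parsed is a hypothetical stand-in for the per-line fromJSON() results and the flattening assumes no field is null or nested:

```r
# Hypothetical stand-in for lapply(Lines, rjson::fromJSON): one named list per line
parsed <- list(
  list(name = "Doe, John", age = 24),
  list(name = "Doe, Jane", age = 30)
)

# Same shape as as.data.frame(t(sapply(Lines, fromJSON))); identity() stands in
# for fromJSON() because these records are already parsed
business <- as.data.frame(t(sapply(parsed, identity)))
print(vapply(business, is.list, logical(1)))  # checks for list-columns

# Flatten each list-column into an atomic vector (one scalar per cell assumed)
business[] <- lapply(business, unlist)
str(business)
```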
Answer 5:
dplyr::bind_rows(fromJSON(file_name))
Comments:
Which fromJSON function are you using? If it's from jsonlite, then dplyr::bind_rows is redundant. If it's from rjson, then the solution fails on the data you provided.
I don't remember; things must have changed.
Answer 6:
Changing the package from rjson to jsonlite fixed it for me.
So instead of this:
fromAPIPlantsPages <- rjson::fromJSON(content(apiGetPlants,type="text",encoding = "UTF-8"))
dfPlantenAPI <- as.data.frame(fromAPIPlantsPages)
I changed it to this:
fromAPIPlantsPages <- jsonlite::fromJSON(content(apiGetPlants,type="text",encoding = "UTF-8"))
dfPlantenAPI <- as.data.frame(fromAPIPlantsPages)