How to parse a file with multiple stacked JSONs in R?

Posted 2023-03-11

Original title: How to parse a file with stacked multiple JSONs in R?
Asked: 2018-10-30 00:40:09

Question:

I have the following "stacked JSON" object to read in R, example1.json:

{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
  "Code":[{"event1":"A","result":"1"},…]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
  "Code":[{"event1":"B","result":"1"},…]}
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
  "Code":[{"event1":"B","result":"0"},…]}

The records are not comma-separated. The basic goal is to parse certain fields (or all fields) into an R data.frame or data.table:

    Timestamp    Usefulness
 0   20140101      Yes
 1   20140102      No
 2   20140103      No

Normally, I would read JSON in R as follows:

library(jsonlite)

jsonfile = "example1.json"
foobar = fromJSON(jsonfile)

However, this throws a parse error:

Error: lexical error: invalid char in json text.
          [{"event1":"A","result":"1"},…]} {"ID":"1A35B","Timestamp"
                     (right here) ------^

This is similar to the following question, but in R: multiple Json objects in one file extract by python

Edit: this file format is called "newline-delimited JSON", or NDJSON.

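Comments:

Is there really a line break before "Code", or did you add it for readability? I am also assuming the ... is yours and not part of the JSON. If these are files with one compact JSON record per line, they are "ndjson" files, and you can use ndjson::stream_in(), which is faster than its jsonlite counterpart and always produces a "flat" data frame.

@hrbrmstr Yes, please mark this as a duplicate.

Answer 1: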

The three dots ... make your JSON invalid, hence your lexical error.
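
As a quick check of this point, here is a minimal sketch using jsonlite::validate(), which tests whether a string is valid JSON. The two strings are shortened, hypothetical versions of a single record from the question, one with the literal ellipsis and one without it.

```r
library(jsonlite)

# Shortened, hypothetical record with the literal ellipsis left in (not valid JSON)
with_dots <- '{"ID":"12345","Code":[{"event1":"A","result":"1"},…]}'

# The same record with the ellipsis removed (valid JSON)
cleaned <- '{"ID":"12345","Code":[{"event1":"A","result":"1"}]}'

dots_ok <- validate(with_dots)
cleaned_ok <- validate(cleaned)

# On failure, validate() attaches the parser message as the "err" attribute
print(attr(dots_ok, "err"))
print(cleaned_ok)
```

Running validate() on a sample line before parsing a large file is a cheap way to tell whether a problem lies in the record itself or in how the records are stacked.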

You can use jsonlite::stream_in() to "stream in" the lines of JSON.

library(jsonlite)

jsonlite::stream_in(file("~/Desktop/examples1.json"))
# opening file input connection.
# Imported 3 records. Simplifying...
# closing file input connection.
#      ID Timestamp Usefulness Code
# 1 12345  20140101        Yes A, 1
# 2 1A35B  20140102         No B, 1
# 3 AA356  20140103         No B, 0

Data

I have cleaned up your example data to make it valid JSON and saved it to my desktop as ~/Desktop/examples1.json:

{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"}]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No","Code":[{"event1":"B","result":"1"}]}
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No","Code":[{"event1":"B","result":"0"}]}
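
To tie this back to the goal in the question (keeping only the Timestamp and Usefulness fields), here is a self-contained sketch that does not depend on a file on disk. It assumes each record sits on a single line with its enclosing braces intact; the variable names ndjson_lines, records, and wanted are made up for illustration, and a textConnection stands in for the file.

```r
library(jsonlite)

# Hypothetical in-memory copy of the cleaned data: one JSON record per line
ndjson_lines <- c(
  '{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes","Code":[{"event1":"A","result":"1"}]}',
  '{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No","Code":[{"event1":"B","result":"1"}]}',
  '{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No","Code":[{"event1":"B","result":"0"}]}'
)

# stream_in() reads from a connection; a text connection replaces the file here
con <- textConnection(ndjson_lines)
records <- stream_in(con, verbose = FALSE)
close(con)

# Keep only the fields the question asks for
wanted <- records[, c("Timestamp", "Usefulness")]
print(wanted)
```

For a real file, replacing the text connection with file("example1.json") should behave the same way, provided the file holds one compact record per line.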
