Getting this JSON file into R as a data frame

How can I read this JSON file (https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json) into R as a data frame?

I have tried several approaches without success.

Answer
b <- jsonlite::fromJSON('https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json')
tidyr::unnest(b$data, cols = "data")
# # A tibble: 2,233 x 6
#    usps  name  fips  date       cases deaths
#    <chr> <chr> <chr> <chr>      <int>  <int>
#  1 GU    Guam  66    2020-03-16     3      0
#  2 GU    Guam  66    2020-03-17     3      0
#  3 GU    Guam  66    2020-03-18     5      0
#  4 GU    Guam  66    2020-03-19    12      0
#  5 GU    Guam  66    2020-03-20    14      0
#  6 GU    Guam  66    2020-03-21    15      0
#  7 GU    Guam  66    2020-03-22    27      1
#  8 GU    Guam  66    2020-03-23    29      1
#  9 GU    Guam  66    2020-03-24    32      1
# 10 GU    Guam  66    2020-03-25    37      1
# # ... with 2,223 more rows

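Note that because AS has no data (see the str() output further down: its first frame has 0 observations), unnest() silently drops it from the result. To see the problem and work around it:

library(dplyr)  # for %>% and filter()
library(tidyr)  # for unnest()
unnest(b$data, cols = "data") %>%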
  filter(usps == "AS")
# # A tibble: 0 x 6
# # ... with 6 variables: usps <chr>, name <chr>, fips <chr>, date <chr>, cases <int>,
# #   deaths <int>

lengths(b$data$data)
#  [1] 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
# [46] 3 3 3 3 3 3 3 3 3 3 3 3 3
onegood <- Filter(nrow, b$data$data)[[1]]
head(onegood)
#         date cases deaths
# 1 2020-03-16     3      0
# 2 2020-03-17     3      0
# 3 2020-03-18     5      0
# 4 2020-03-19    12      0
# 5 2020-03-20    14      0
# 6 2020-03-21    15      0
onegood <- onegood[NA,][1,]
head(onegood)
#    date cases deaths
# NA <NA>    NA     NA
hasnothing <- lengths(b$data$data) < 1
which(hasnothing)
# [1] 1
b$data$data[ hasnothing ] <- replicate(sum(hasnothing), onegood, simplify = FALSE)

### now prove that we see `AS` data
unnest(b$data, cols = "data") %>%
  filter(usps == "AS")
# # A tibble: 1 x 6
#   usps  name           fips  date  cases deaths
#   <chr> <chr>          <chr> <chr> <int>  <int>
# 1 AS    American Samoa 60    <NA>     NA     NA
unnest(b$data, cols = "data")
# # A tibble: 2,234 x 6
#    usps  name           fips  date       cases deaths
#    <chr> <chr>          <chr> <chr>      <int>  <int>
#  1 AS    American Samoa 60    <NA>          NA     NA
#  2 GU    Guam           66    2020-03-16     3      0
#  3 GU    Guam           66    2020-03-17     3      0
#  4 GU    Guam           66    2020-03-18     5      0
#  5 GU    Guam           66    2020-03-19    12      0
#  6 GU    Guam           66    2020-03-20    14      0
#  7 GU    Guam           66    2020-03-21    15      0
#  8 GU    Guam           66    2020-03-22    27      1
#  9 GU    Guam           66    2020-03-23    29      1
# 10 GU    Guam           66    2020-03-24    32      1
# # ... with 2,224 more rows

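I built onegood programmatically so that the representative all-NA frame is derived from the current data. Creating it by hand would certainly be easier, but I wanted to stay flexible in case more columns are added later.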
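As a possible shortcut to the manual padding above, tidyr's unnest() has a keep_empty argument that keeps rows whose list element is empty and fills their unnested columns with NA. The sketch below is not part of the original answer; it uses a small made-up tibble named toy as a stand-in for b$data so that it runs without network access, and it assumes tidyr 1.0.0 or later.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for b$data: "AS" has an empty frame, "GU" has two rows
toy <- tibble(
  usps = c("AS", "GU"),
  data = list(
    data.frame(),
    data.frame(
      date   = c("2020-03-16", "2020-03-17"),
      cases  = c(3L, 3L),
      deaths = c(0L, 0L),
      stringsAsFactors = FALSE
    )
  )
)

# keep_empty = TRUE keeps the "AS" row instead of dropping it
res <- unnest(toy, cols = "data", keep_empty = TRUE)
res
```

If this behaves the same way on the real data, unnest(b$data, cols = "data", keep_empty = TRUE) would make the onegood step unnecessary, though that has not been checked against the live feed.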


Backfill, for reference (the structure of b):

str(b)
# List of 3
#  $ lastUpdated   : chr "2020-04-15T23:55:39Z"
#  $ lastUpdatedStr: chr "April 15, 2020 at 7:55 p.m. ET"
#  $ data          :'data.frame':   58 obs. of  4 variables:
#   ..$ usps: chr [1:58] "AS" "GU" "MP" "PR" ...
#   ..$ name: chr [1:58] "American Samoa" "Guam" "Northern Mariana Islands" "Puerto Rico" ...
#   ..$ fips: chr [1:58] "60" "66" "69" "72" ...
#   ..$ data:List of 58
#   .. ..$ :'data.frame':   0 obs. of  0 variables
#   .. ..$ :'data.frame':   31 obs. of  3 variables:
#   .. .. ..$ date  : chr [1:31] "2020-03-16" "2020-03-17" "2020-03-18" "2020-03-19" ...
#   .. .. ..$ cases : int [1:31] 3 3 5 12 14 15 27 29 32 37 ...
#   .. .. ..$ deaths: int [1:31] 0 0 0 0 0 0 1 1 1 1 ...
#   .. ..$ :'data.frame':   16 obs. of  3 variables:
#   .. .. ..$ date  : chr [1:16] "2020-03-31" "2020-04-01" "2020-04-02" "2020-04-03" ...
#   .. .. ..$ cases : int [1:16] 2 6 6 8 8 8 8 8 11 11 ...
#   .. .. ..$ deaths: int [1:16] 0 1 1 1 1 1 1 2 2 2 ...
# <truncated>
Another answer

I was able to load this JSON file into a data frame, but you also need to unnest the nested list column. Here is what worked for me:

library(jsonlite)  # jsonlite::fromJSON() accepts a URL and returns data frames; rjson::fromJSON() does not
library(tidyr)

data <- fromJSON("https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json")
data <- as_tibble(data$data)
df <- data %>%  unnest(c(usps,name,fips,data))

head(df)
# # A tibble: 6 x 6
#   usps  name  fips  date       cases deaths
#   <chr> <chr> <chr> <chr>      <int>  <int>
# 1 GU    Guam  66    2020-03-16     3      0
# 2 GU    Guam  66    2020-03-17     3      0
# 3 GU    Guam  66    2020-03-18     5      0
# 4 GU    Guam  66    2020-03-19    12      0
# 5 GU    Guam  66    2020-03-20    14      0
# 6 GU    Guam  66    2020-03-21    15      0
Another answer

Similar to ORStudent's answer above, but with a few extra steps, and it does not drop the rows that cannot be unnested.

library(jsonlite) 
library(dplyr)

url <- "https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json" 

# Read in JSON as a list into R
url_data <- jsonlite::fromJSON(url)

# Get Actual Data From the JSON
data <- url_data$data

# Create a dummy id for the data (id is the row number)
data$id <- seq_len(nrow(data))

# Create an empty data frame to collect the rows held in the list column
list_data <- data.frame()

# Iterate through the list of data frames held in data$data
for (i in seq_along(data$data)) {
  temp <- as.data.frame(data$data[[i]], stringsAsFactors = FALSE) # Convert list element to a data frame
  if (nrow(temp) > 0) {
    temp$id <- i # Tag the rows with the id of their parent row
    list_data <- bind_rows(list_data, temp) # Append to the bottom of list_data
  }
}

# Merge the list data into the upper-level data frame on id; all.x keeps parents with no rows
all_data <- merge(data, list_data, all.x = TRUE, by = "id")
head(all_data)
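
The loop above grows list_data one piece at a time. A more compact sketch of the same id-and-merge idea uses dplyr::bind_rows() with its .id argument, which labels each row with the position of the data frame it came from when the list is unnamed. The names parent and pieces are made-up toy data standing in for the upper-level frame and for data$data; this is an illustration under those assumptions, not the answer's original code.

```r
library(dplyr)

# Hypothetical stand-ins: two parent rows, the first with an empty child frame
parent <- data.frame(id = 1:2, usps = c("AS", "GU"), stringsAsFactors = FALSE)
pieces <- list(
  data.frame(),
  data.frame(
    date   = c("2020-03-16", "2020-03-17"),
    cases  = c(3L, 3L),
    deaths = c(0L, 0L),
    stringsAsFactors = FALSE
  )
)

# Stack all child frames; .id records each frame's position as a character label
children <- bind_rows(pieces, .id = "id")
children$id <- as.integer(children$id)

# Left join: all.x = TRUE keeps "AS" even though it has no child rows
all_data <- merge(parent, children, by = "id", all.x = TRUE)
all_data
```

The empty frame contributes no rows to children, so the parent row for AS survives only because of all.x = TRUE, which mirrors the merge step in the answer.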
