Loading this JSON file into R as a data frame

Posted 2021-04-28


How can I read this JSON file (https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json) into R as a data frame?

I have tried several approaches without success.

Answer

b <- jsonlite::fromJSON('https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json')
tidyr::unnest(b$data, cols = "data")
# # A tibble: 2,233 x 6
#    usps  name  fips  date       cases deaths
#    <chr> <chr> <chr> <chr>      <int>  <int>
#  1 GU    Guam  66    2020-03-16     3      0
#  2 GU    Guam  66    2020-03-17     3      0
#  3 GU    Guam  66    2020-03-18     5      0
#  4 GU    Guam  66    2020-03-19    12      0
#  5 GU    Guam  66    2020-03-20    14      0
#  6 GU    Guam  66    2020-03-21    15      0
#  7 GU    Guam  66    2020-03-22    27      1
#  8 GU    Guam  66    2020-03-23    29      1
#  9 GU    Guam  66    2020-03-24    32      1
# 10 GU    Guam  66    2020-03-25    37      1
# # ... with 2,223 more rows

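Note that AS (American Samoa) has no data: its nested frame has 0 observations (see the str() output further down), so unnest() drops it from the result. The first call below shows the problem, and the rest of the block works around it: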

library(dplyr)  # filter(), %>%
library(tidyr)  # unnest()
unnest(b$data, cols = "data") %>%
  filter(usps == "AS")
# # A tibble: 0 x 6
# # ... with 6 variables: usps <chr>, name <chr>, fips <chr>, date <chr>, cases <int>,
# #   deaths <int>

lengths(b$data$data)
#  [1] 0 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
# [46] 3 3 3 3 3 3 3 3 3 3 3 3 3
onegood <- Filter(nrow, b$data$data)[[1]]
head(onegood)
#         date cases deaths
# 1 2020-03-16     3      0
# 2 2020-03-17     3      0
# 3 2020-03-18     5      0
# 4 2020-03-19    12      0
# 5 2020-03-20    14      0
# 6 2020-03-21    15      0
onegood <- onegood[NA,][1,]
head(onegood)
#    date cases deaths
# NA <NA>    NA     NA
hasnothing <- lengths(b$data$data) < 1
which(hasnothing)
# [1] 1
b$data$data[ hasnothing ] <- replicate(sum(hasnothing), onegood, simplify = FALSE)

### now prove that we see `AS` data
unnest(b$data, cols = "data") %>%
  filter(usps == "AS")
# # A tibble: 1 x 6
#   usps  name           fips  date  cases deaths
#   <chr> <chr>          <chr> <chr> <int>  <int>
# 1 AS    American Samoa 60    <NA>     NA     NA
unnest(b$data, cols = "data")
# # A tibble: 2,234 x 6
#    usps  name           fips  date       cases deaths
#    <chr> <chr>          <chr> <chr>      <int>  <int>
#  1 AS    American Samoa 60    <NA>          NA     NA
#  2 GU    Guam           66    2020-03-16     3      0
#  3 GU    Guam           66    2020-03-17     3      0
#  4 GU    Guam           66    2020-03-18     5      0
#  5 GU    Guam           66    2020-03-19    12      0
#  6 GU    Guam           66    2020-03-20    14      0
#  7 GU    Guam           66    2020-03-21    15      0
#  8 GU    Guam           66    2020-03-22    27      1
#  9 GU    Guam           66    2020-03-23    29      1
# 10 GU    Guam           66    2020-03-24    32      1
# # ... with 2,224 more rows

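I built onegood programmatically from the existing data so that it is a representative all-NA frame. Creating it by hand would certainly be easy, but deriving it keeps the code flexible if more columns are added later.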
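As an alternative to the manual backfill (not part of the original answer), tidyr's unnest() has a keep_empty argument that replaces size-0 elements with a single row of missing values. The sketch below is a minimal illustration on a small hand-built stand-in for b$data, so it runs without network access; the object names nested, dropped, and kept are hypothetical, and the GU values are copied from the output shown earlier.

```r
library(tidyr)  # also re-exports tibble()

# Hand-built stand-in for b$data (an assumption, not the live feed):
# AS has an empty nested frame, GU has two rows.
nested <- tibble(
  usps = c("AS", "GU"),
  name = c("American Samoa", "Guam"),
  fips = c("60", "66"),
  data = list(
    data.frame(),
    data.frame(
      date   = c("2020-03-16", "2020-03-17"),
      cases  = c(3L, 3L),
      deaths = c(0L, 0L)
    )
  )
)

# Default behaviour: the AS row disappears because its element has size 0.
dropped <- unnest(nested, cols = "data")

# keep_empty = TRUE keeps AS as one row with NA in the nested columns.
kept <- unnest(nested, cols = "data", keep_empty = TRUE)
```

If this behaves the same way on the real data, unnest(b$data, cols = "data", keep_empty = TRUE) would replace the onegood step, at the cost of relying on a reasonably recent tidyr (1.0.0 or later).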

For reference, the structure of b:

str(b)
# List of 3
#  $ lastUpdated   : chr "2020-04-15T23:55:39Z"
#  $ lastUpdatedStr: chr "April 15, 2020 at 7:55 p.m. ET"
#  $ data          :'data.frame':   58 obs. of  4 variables:
#   ..$ usps: chr [1:58] "AS" "GU" "MP" "PR" ...
#   ..$ name: chr [1:58] "American Samoa" "Guam" "Northern Mariana Islands" "Puerto Rico" ...
#   ..$ fips: chr [1:58] "60" "66" "69" "72" ...
#   ..$ data:List of 58
#   .. ..$ :'data.frame':   0 obs. of  0 variables
#   .. ..$ :'data.frame':   31 obs. of  3 variables:
#   .. .. ..$ date  : chr [1:31] "2020-03-16" "2020-03-17" "2020-03-18" "2020-03-19" ...
#   .. .. ..$ cases : int [1:31] 3 3 5 12 14 15 27 29 32 37 ...
#   .. .. ..$ deaths: int [1:31] 0 0 0 0 0 0 1 1 1 1 ...
#   .. ..$ :'data.frame':   16 obs. of  3 variables:
#   .. .. ..$ date  : chr [1:16] "2020-03-31" "2020-04-01" "2020-04-02" "2020-04-03" ...
#   .. .. ..$ cases : int [1:16] 2 6 6 8 8 8 8 8 11 11 ...
#   .. .. ..$ deaths: int [1:16] 0 1 1 1 1 1 1 2 2 2 ...
# <truncated>

Another answer

I was able to get this JSON file into a data frame, but you also need to unnest the nested list column. Here is what worked for me:

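# jsonlite (rather than rjson) is needed here: its fromJSON() accepts a URL
# and simplifies arrays of records into data frames
library(jsonlite)
library(tidyr)  # unnest(); also re-exports as_tibble() and %>%

data <- fromJSON("https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json")
data <- as_tibble(data$data)
df <- data %>% unnest(c(usps, name, fips, data))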

head(df)
# # A tibble: 6 x 6
#   usps  name  fips  date       cases deaths
#   <chr> <chr> <chr> <chr>      <int>  <int>
# 1 GU    Guam  66    2020-03-16     3      0
# 2 GU    Guam  66    2020-03-17     3      0
# 3 GU    Guam  66    2020-03-18     5      0
# 4 GU    Guam  66    2020-03-19    12      0
# 5 GU    Guam  66    2020-03-20    14      0
# 6 GU    Guam  66    2020-03-21    15      0
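To see why an unnest step is needed at all, it can help to look at what jsonlite::fromJSON() returns for this kind of document. The sketch below parses a small inline JSON string that only imitates the shape of the feed (the string and the names json_text and parsed are assumptions for illustration; the GU and MP values are taken from the str() output above).

```r
library(jsonlite)

# Inline JSON imitating the feed's shape: an array of records,
# each holding its own array of daily records.
json_text <- '{
  "data": [
    {"usps": "GU", "name": "Guam", "fips": "66",
     "data": [{"date": "2020-03-16", "cases": 3, "deaths": 0},
              {"date": "2020-03-17", "cases": 3, "deaths": 0}]},
    {"usps": "MP", "name": "Northern Mariana Islands", "fips": "69",
     "data": [{"date": "2020-03-31", "cases": 2, "deaths": 0}]}
  ]
}'

parsed <- fromJSON(json_text)

# The outer array becomes a data frame with one row per territory ...
outer_is_df <- is.data.frame(parsed$data)

# ... and the inner arrays become a list column holding one data frame per row.
inner_is_list <- is.list(parsed$data$data)
inner_rows <- vapply(parsed$data$data, nrow, integer(1))
```

Each row of parsed$data therefore carries a whole data frame in its data column, which is what unnest() expands into one row per date.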

Another answer

Similar to ORStudent's answer above, but with extra steps, and it does not drop rows whose nested data cannot be unnested:

library(jsonlite) 
library(dplyr)

url <- "https://ix.cnn.io/data/novel-coronavirus-2019-ncov/us/historical.min.json" 

# Read in JSON as a list into R
url_data <- jsonlite::fromJSON(url)

# Get Actual Data From the JSON
data <- url_data$data

# Create a dummy id for the data (id is the row number)
data$id <- seq_len(nrow(data))

# Convert each nested element to a data frame tagged with its row id;
# elements with no rows (such as AS) return NULL
pieces <- lapply(data$id, function(i) {
  temp <- as.data.frame(data$data[[i]], stringsAsFactors = FALSE)
  if (nrow(temp) == 0) return(NULL)
  temp$id <- i
  temp
})

# Stack the pieces into one data frame (bind_rows() ignores NULL elements)
list_data <- bind_rows(pieces)

# Left-join the stacked rows onto the top-level data frame by id,
# so rows without nested data are kept (with NA values)
all_data <- merge(data, list_data, all.x = TRUE, by = "id")
head(all_data)
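The reason this approach keeps AS is the left join on id: a row with no matching nested rows still survives with NA values. A minimal base-R sketch on toy data follows, so the pattern can be checked offline; top, nested, tagged, stacked, and merged are hypothetical names, and the values are illustrative rather than read from the feed.

```r
# Toy top-level frame and its nested frames (AS is empty).
top <- data.frame(usps = c("AS", "GU"), stringsAsFactors = FALSE)
nested <- list(
  data.frame(),
  data.frame(date = c("2020-03-16", "2020-03-17"), cases = c(3L, 3L),
             stringsAsFactors = FALSE)
)

# Tag each non-empty nested frame with the row number it belongs to.
tagged <- Map(
  function(df, i) if (nrow(df) > 0) cbind(id = i, df),
  nested, seq_along(nested)
)

# Stack the tagged frames; NULL entries (empty frames) are skipped by rbind().
stacked <- do.call(rbind, tagged)

# all.x = TRUE makes this a left join, so AS is kept with NA values.
merged <- merge(cbind(id = seq_len(nrow(top)), top), stacked,
                by = "id", all.x = TRUE)
```

With all.x = FALSE (the default), the AS row would be dropped, which matches what plain unnest() does.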
