最快的方法来读取/处理data.table中的“类似于JSON的”列?
Posted
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了最快的方法来读取/处理data.table中的“类似于JSON的”列?相关的知识,希望对你有一定的参考价值。
以下是我正在使用的一些示例数据。 DT_IN
包含数据的输入格式,DT_OUT
包含我要使用的格式。从DT_IN
转到DT_OUT
的最佳方法是什么?
我已经尝试过strsplit
,但是没有设法按照相应的顺序将拆分分割为rbind
。愿意接受解决方案,也许Rcpp
可以帮助您吗?
library(data.table)
DT_IN <- data.table(
user_id = c(1L, 20L, 4L, 6L, 9L),
latitude = c(-41.3103218, -40.8307381, -37.3932037, -42.7178726, -45.0156822),
longitude = c(174.824554, 172.793106, 175.840637, 170.965454, 168.731186),
parameters = c(
"{""network""=>""Telecom NZ"", ""accuracy""=>28.659999847412, ""internet""=>""4G"", ""location_age""=>1}",
"{""location_age""=>716}", "{""location_age""=>851}", "{""accuracy""=>14, ""location_age""=>1}",
"{""network""=>""VodafoneNZ"", ""accuracy""=>29, ""internet""=>""3G"", ""location_age""=>31}"
)
)
> DT_IN
user_id latitude longitude parameters
1: 1 -41.31032 174.8246 {""network""=>""Telecom NZ"", ""accuracy""=>28.659999847412, ""internet""=>""4G"", ""location_age""=>1}
2: 20 -40.83074 172.7931 {""location_age""=>716}
3: 4 -37.39320 175.8406 {""location_age""=>851}
4: 6 -42.71787 170.9655 {""accuracy""=>14, ""location_age""=>1}
5: 9 -45.01568 168.7312 {""network""=>""VodafoneNZ"", ""accuracy""=>29, ""internet""=>""3G"", ""location_age""=>31}
DT_OUT <- data.table(
user_id = c(1L, 20L, 4L, 6L, 9L),
latitude = c(-41.3103218, -40.8307381, -37.3932037, -42.7178726, -45.0156822),
longitude = c(174.824554, 172.793106, 175.840637, 170.965454, 168.731186),
network = c('Telecom NZ', NA, NA, NA, 'VodafoneNZ'),
accuracy = c(28.659999847412, NA, NA, 14, 29),
internet = c('4G', NA, NA, NA, '3G'),
location_age = c(1, 716, 851, 1, 31)
)
> DT_OUT
user_id latitude longitude network accuracy internet location_age
1: 1 -41.31032 174.8246 Telecom NZ 28.66 4G 1
2: 20 -40.83074 172.7931 <NA> NA <NA> 716
3: 4 -37.39320 175.8406 <NA> NA <NA> 851
4: 6 -42.71787 170.9655 <NA> 14.00 <NA> 1
5: 9 -45.01568 168.7312 VodafoneNZ 29.00 3G 31
答案
使用jsonlite
包...
# Convert json like strings to json.
DT_IN[, parameters := gsub("""", """, parameters)]
DT_IN[, parameters := gsub("=>", ":", parameters)]
# Stream_in the json and cbind it to existing data.
DT_IN <- cbind(DT_IN, jsonlite::stream_in(textConnection(DT_IN$parameters)))
# Remove `parameters`
DT_IN[, parameters := NULL]
DT_IN
# user_id latitude longitude network accuracy internet location_age
# 1: 1 -41.31032 174.8246 Telecom NZ 28.66 4G 1
# 2: 20 -40.83074 172.7931 <NA> NA <NA> 716
# 3: 4 -37.39320 175.8406 <NA> NA <NA> 851
# 4: 6 -42.71787 170.9655 <NA> 14.00 <NA> 1
# 5: 9 -45.01568 168.7312 VodafoneNZ 29.00 3G 31
以上是关于最快的方法来读取/处理data.table中的“类似于JSON的”列?的主要内容,如果未能解决你的问题,请参考以下文章