R - Conditionally updating coordinate columns in a data frame

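I am trying to populate two new, empty columns in a data frame with data from other columns in the same data frame, depending on whether those columns are filled in.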

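I am trying to fill in the values of HIGH_PRCN_LAT and HIGH_PRCN_LON (previously called F_Lat and F_Lon), which represent the final latitude and longitude of each row and are based on the values of other columns in the table.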

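Case 1: Lat/Lon2 are populated (as in IDs 1 and 2). The midpoint between the two points should be calculated with a great-circle algorithm and put into F_Lat and F_Lon.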

Case 2: Lat/Lon2 are empty, so the values of Lat/Lon1 should be put into F_Lat and F_Lon (as with IDs 3 and 4).
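A minimal base-R sketch of the two cases, on toy data using the short column names (Lat1, Lon1, Lat2, Lon2, F_Lat, F_Lon) from the answer's sample data. It assumes a hypothetical helper `gc_midpoint()` that implements the standard spherical great-circle midpoint formula; it is not geosphere's code, and in practice geosphere::midPoint does this job.

```r
# Hypothetical helper (NOT part of geosphere): great-circle midpoint on a sphere.
# Inputs/outputs in decimal degrees; vectorised over rows.
gc_midpoint <- function(lon1, lat1, lon2, lat2) {
  rad  <- pi / 180
  phi1 <- lat1 * rad
  phi2 <- lat2 * rad
  lam1 <- lon1 * rad
  dlam <- (lon2 - lon1) * rad
  bx <- cos(phi2) * cos(dlam)
  by <- cos(phi2) * sin(dlam)
  phim <- atan2(sin(phi1) + sin(phi2), sqrt((cos(phi1) + bx)^2 + by^2))
  lamm <- lam1 + atan2(by, cos(phi1) + bx)
  lon_m <- ((lamm / rad + 540) %% 360) - 180   # wrap longitude to [-180, 180)
  cbind(lon = lon_m, lat = phim / rad)
}

# Toy data: rows 1-2 have both points (case 1), row 3 only the first (case 2)
pts <- data.frame(Lat1 = c(0, 10, 33.0233529),
                  Lon1 = c(0, 20, -5.0988341),
                  Lat2 = c(0, 10, NA),
                  Lon2 = c(90, 20, NA))

both <- complete.cases(pts[, c("Lat2", "Lon2")])
pts$F_Lat <- pts$Lat1   # case 2 default: copy point 1 into every row
pts$F_Lon <- pts$Lon1
mid <- gc_midpoint(pts$Lon1[both], pts$Lat1[both], pts$Lon2[both], pts$Lat2[both])
pts$F_Lon[both] <- mid[, "lon"]   # case 1: overwrite with the midpoint
pts$F_Lat[both] <- mid[, "lat"]
pts
```

Filling every row with point 1 first and then overwriting only the complete rows avoids needing two separate `!ind` / `ind` assignments.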

My code is below, but it does not work (see the earlier versions, removed in an edit).

The preparation code I am using is as follows:

incidents <- structure(list(id = 1:9, StartDate = structure(c(1L, 3L, 2L, 
2L, 2L, 3L, 1L, 3L, 1L), .Label = c("02/02/2000 00:34", "02/09/2000 22:13", 
"20/01/2000 14:11"), class = "factor"), EndDate = structure(1:9, .Label = c("02/04/2006 20:46", 
"02/04/2006 22:38", "02/04/2006 23:21", "02/04/2006 23:59", "03/04/2006 20:12", 
"03/04/2006 23:56", "04/04/2006 00:31", "07/04/2006 06:19", "07/04/2006 07:45"
), class = "factor"), Yr.Period = structure(c(1L, 1L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L), .Label = c("2000 / 1", "2000 / 2", "2000 /3"
), class = "factor"), Description = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "ENGLISH TEXT", class = "factor"), 
    Location = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L
    ), .Label = c("Location 1", "Location 1 : Location 2"), class = "factor"), 
    Location.1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = "Location 1", class = "factor"), Postcode.1 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Postcode 1", class = "factor"), 
    Location.2 = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
    1L), .Label = c("", "Location 2"), class = "factor"), Postcode.2 = structure(c(2L, 
    2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("", "Postcode 2"
    ), class = "factor"), Section = structure(c(2L, 2L, 3L, 1L, 
    4L, 4L, 2L, 1L, 4L), .Label = c("East", "North", "South", 
    "West"), class = "factor"), Weather.Category = structure(c(1L, 
    2L, 4L, 2L, 2L, 2L, 4L, 1L, 3L), .Label = c("Animals", "Food", 
    "Humans", "Weather"), class = "factor"), Minutes = c(13L, 
    55L, 5L, 5L, 5L, 522L, 1L, 11L, 22L), Cost = c(150L, 150L, 
    150L, 20L, 23L, 32L, 21L, 11L, 23L), Location.1.Lat = c(53.0506727, 
    53.8721035, 51.0233529, 53.8721035, 53.6988355, 53.4768766, 
    52.6874562, 51.6638245, 51.4301359), Location.1.Lon = c(-2.9991256, 
    -2.4004125, -3.0988341, -2.4004125, -1.3031529, -2.2298073, 
    -1.8023421, -0.3964916, 0.0213837), Location.2.Lat = c(52.7116187, 
    53.746791, NA, 53.746791, 53.6787167, 53.4527824, 52.5264907, 
    NA, NA), Location.2.Lon = c(-2.7493169, -2.4777984, NA, -2.4777984, 
    -1.489026, -2.1247029, -1.4645023, NA, NA)), class = "data.frame", row.names = c(NA, -9L))

#gpsColumns is used as the following line of code is used for several data frames.
gpsColumns <- c("HIGH_PRCN_LAT", "HIGH_PRCN_LON")
incidents [ , gpsColumns] <- NA

#create separate variable(?) containing a list of which rows are complete
ind <- complete.cases(incidents [,17])

#populate rows with two Lat/Lons with the great-circle midpoint of both values
incidents [ind, c("HIGH_PRCN_LON_2","HIGH_PRCN_LAT_2")] <- 
  with(incidents [ind,,drop=FALSE],
       do.call(rbind, geosphere::midPoint(cbind.data.frame(Location.1.Lon, Location.1.Lat), cbind.data.frame(Location.2.Lon, Location.2.Lat))))

#populate rows with one Lat/Lon with those values
incidents[!ind, c("HIGH_PRCN_LAT","HIGH_PRCN_LON")] <- incidents[!ind, c("Location.1.Lat","Location.1.Lon")]

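I will use the geosphere::midPoint function, as recommended here: http://r.789695.n4.nabble.com/Midpoint-between-coordinates-td2299999.html

Unfortunately, this way of populating the columns does not seem to work when there are several cases.

The error currently thrown is: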

Error in `$<-.data.frame`(`*tmp*`, F_Lat, value = integer(0)) : 
  replacement has 0 rows, data has 178012
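This error means a zero-length value was assigned to a whole column of a non-empty data frame. The question does not show which line produced it, but one easy way to trigger it is assigning the result of a subset that matched no rows, as in this toy reproduction (the data frame `d` is illustrative only):

```r
d <- data.frame(id = 1:3)
res <- tryCatch({
  d$F_Lat <- d$id[d$id > 10]   # no rows match, so the value is integer(0)
  "no error"
}, error = function(e) conditionMessage(e))
res   # the same "replacement has 0 rows, data has N" message
```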

Edit: also posted to reddit: https://www.reddit.com/r/Rlanguage/comments/bdvavx/conditional_updating_column_in_dataframe/

Edit: added clarification on the parts of the code I don't understand.

#replaces the F_Lat2/F_Lon2 columns in rows with both sets of input coordinates 
dataframe[ind, c("F_Lat2","F_Lon2")] <-
#I am unclear on what this means, specifically what the "with" function does and what "drop=FALSE" does and also why they were used in this case.
  with(dataframe[ind,,drop=FALSE],
#I am unclear on what do.call and rbind are doing here, but the second half (geosphere onwards) is binding the Lats and Lons to make coordinates as inputs for the gcIntermediate function.
       do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
                                                cbind.data.frame(Lat2, Lon2), n = 1)))
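On the two points asked about in the comments: `with(data, expr)` evaluates `expr` with the data frame's columns available as plain variables, and `drop=FALSE` stops `[` from simplifying a one-column result to a bare vector. A small base-R illustration on toy data:

```r
d <- data.frame(Lat1 = c(53.05, 51.02), Lon1 = c(-3.00, -3.10))

# with(data, expr): inside expr, Lat1 refers to d$Lat1
shifted <- with(d, Lat1 + 1)

# Selecting one column simplifies to a vector unless drop = FALSE
kept    <- d[, "Lat1", drop = FALSE]   # still a data.frame
dropped <- d[, "Lat1"]                 # plain numeric vector

# The same simplification happens when row-subsetting a one-column data frame
one <- data.frame(Lat1 = c(53.05, 51.02))
rows_kept    <- one[c(TRUE, FALSE), , drop = FALSE]   # 1-row data.frame
rows_dropped <- one[c(TRUE, FALSE), ]                 # numeric vector
```

With several columns, `dataframe[ind, ]` stays a data frame anyway, so `drop=FALSE` in the answer's code appears to be a safeguard rather than something the multi-column case strictly needs.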
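Answer

Although your code doesn't work for me and I can't compute exactly the values you expect, I suspect the error you are seeing can be resolved with these steps. (The data used here is at the bottom.)

  1. Pre-fill the empty columns.
  2. Pre-compute the complete.cases step once; it will save time.
  3. Use cbind.data.frame inside gcIntermediate.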
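Steps 1 and 2 might look like this on a toy data frame (the values are illustrative). Selecting the mask columns by name rather than by position is my own suggestion, not part of the answer; positions shift as soon as columns are added.

```r
pts <- data.frame(ID   = 1:4,
                  Lat1 = c(53.05, 53.87, 51.02, 53.87),
                  Lon1 = c(-2.99, -2.40, -3.10, -2.40),
                  Lat2 = c(52.71, 53.75, NA, NA),
                  Lon2 = c(-2.75, -2.48, NA, NA))

# Step 1: create the output columns up front as numeric NA
pts$F_Lon2 <- pts$F_Lat2 <- NA_real_

# Step 2: compute the row mask once and reuse it for both cases
ind <- complete.cases(pts[, c("Lat2", "Lon2")])
ind   # TRUE TRUE FALSE FALSE
```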

From this fragment of your code I infer what you intended:

gcIntermediate([dataframe...
               ^
               this is an error in R

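You are binding these columns together, so I'll use cbind.data.frame. (Using plain cbind produced some ignorable warnings from geosphere, so you could use it wrapped in suppressWarnings instead, but that function is a bit heavy-handed because it would also mask any other warnings.)

Also, since it looks like you want one intermediate point per pair of coordinates, I added the n=1 argument: gcIntermediate(..., n=1).

do.call(rbind, ...) is needed because gcIntermediate returns a list, so the pieces have to be combined into one matrix.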
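The difference between the two binding functions, shown on two toy vectors: `cbind()` on plain vectors returns a numeric matrix, while `cbind.data.frame()` always returns a data frame.

```r
lat <- c(53.05, 53.87)
lon <- c(-2.99, -2.40)
m   <- cbind(lon, lat)              # numeric matrix with columns lon, lat
df2 <- cbind.data.frame(lon, lat)   # data.frame with the same two columns
class(m)
class(df2)
```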
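What `do.call(rbind, ...)` does, sketched with a hand-made list standing in for the output of gcIntermediate(..., n = 1) over several pairs (assumed here to be one 1 x 2 lon/lat matrix per pair; the numbers are made up): it calls `rbind` once with every list element as a separate argument, stacking them into one matrix.

```r
pieces <- list(
  matrix(c(-2.87, 52.88), nrow = 1, dimnames = list(NULL, c("lon", "lat"))),
  matrix(c(-2.44, 53.81), nrow = 1, dimnames = list(NULL, c("lon", "lat")))
)
# Equivalent to rbind(pieces[[1]], pieces[[2]])
stacked <- do.call(rbind, pieces)
stacked   # 2 x 2 matrix, one row per coordinate pair
```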

dataframe$F_Lon2 <- dataframe$F_Lat2 <- NA_real_
ind <- complete.cases(dataframe[,4])

dataframe[ind, c("F_Lat2","F_Lon2")] <- 
  with(dataframe[ind,,drop=FALSE],
       do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
                                                cbind.data.frame(Lat2, Lon2), n = 1)))
dataframe[!ind, c("F_Lat2","F_Lon2")] <- dataframe[!ind, c("Lat1","Lon1")]
dataframe
#   ID     Lat1      Lon1     Lat2      Lon2    F_Lat     F_Lon   F_Lat2    F_Lon2
# 1  1 19.05067 -3.999126 92.71332 -6.759169 55.88200 -5.379147 55.78466 -6.709509
# 2  2 58.87210 -1.400413 54.74679 -4.479840 56.80945 -2.940126 56.81230 -2.942029
# 3  3 33.02335 -5.098834       NA        NA 33.02335 -5.098834 33.02335 -5.098834
# 4  4 54.87210 -4.400412       NA        NA 54.87210 -4.400412 54.87210 -4.400412

Update, using your new incidents data and switching to geosphere::midPoint:

Try this:

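incidents$F_Lon2 <- incidents$F_Lat2 <- NA_real_
# rows that have a second coordinate pair (select by name; column 4 is Yr.Period)
ind <- complete.cases(incidents[, c("Location.2.Lat", "Location.2.Lon")])
# geosphere takes points as (lon, lat) and midPoint returns columns lon, lat
incidents[ind, c("F_Lon2","F_Lat2")] <- 
  with(incidents[ind,,drop=FALSE],
       geosphere::midPoint(cbind.data.frame(Location.1.Lon,Location.1.Lat),
                           cbind.data.frame(Location.2.Lon,Location.2.Lat)))
# rows with only one coordinate pair: copy it across
incidents[!ind, c("F_Lat2","F_Lon2")] <- incidents[!ind, c("Location.1.Lat","Location.1.Lon")]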

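One (big) difference is that geosphere::gcIntermediate(..., n=1) returns a list of results, whereas geosphere::midPoint(...) (which has no n= argument) returns a single matrix, so no rbind-ing is needed. Note also that geosphere takes points in (longitude, latitude) order and midPoint returns its matrix in that same order, so pass the longitude columns first and assign the result to the longitude column first.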
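Because the result is a plain two-column matrix, it can be assigned straight into two data frame columns. The assignment is positional, so the target names must be listed in the same lon, lat order as the matrix columns. A toy sketch, with a hand-made matrix standing in for the midPoint result:

```r
pts <- data.frame(Lat1 = c(10, 20), Lon1 = c(1, 2))
pts$F_Lon2 <- pts$F_Lat2 <- NA_real_

# Stand-in for a midPoint()-style result: columns are lon, lat
mid <- cbind(lon = c(1.5, 2.5), lat = c(10.5, 20.5))
ind <- c(TRUE, TRUE)

# Matrix columns go into the named targets by position: lon first, then lat
pts[ind, c("F_Lon2", "F_Lat2")] <- mid
pts
```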


Data:

dataframe <- read.table(header=T, stringsAsFactors=F, text="
ID Lat1       Lon1       Lat2      Lon2      F_Lat       F_Lon
1  19.0506727 -3.9991256 92.713318 -6.759169 55.88199535 -5.3791473
2  58.8721035 -1.4004125 54.746791 -4.47984  56.80944725 -2.94012625
3  33.0233529 -5.0988341 NA        NA        33.0233529  -5.0988341
4  54.8721035 -4.4004125 NA        NA        54.8721035  -4.4004125")