Merge data frame with SpatialPolygonsDataFrame

Posted: 2015-11-19 19:24:26

Question:

I want to merge a SpatialPolygonsDataFrame

library(rgdal)  # provides readOGR(); also loads sp

# From https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html
states <- readOGR(dsn = "./cb_2014_us_state_20m.shp",
                  layer = "cb_2014_us_state_20m", verbose = FALSE)

with an ordinary data frame:

my_counts <- data.frame(
  State = c(
    "CA", "TX", "IL", "FL", "NY", "OH",
    "NJ", "GA", "MI", "PA", "MA", "CO", "AZ", "NC", "VA", "WA", "IN",
    "MD", "MN", "WI", "MO", "TN", "IA", "KY", "LA", "SC", "CT", "AL",
    "KS", "OR", "OK", "AR", "NV", "UT", "NE", "ID", "MS", "DC", "NM",
    "NH", "ME", "AK", "RI", "MT", "HI", "WV", "SD", "ND", "DE", "VT",
    "WY", "PR", "GU", "VI", "MP", "AS", "na", "MH", "FM", "PW"
  ),
  count = c(
    1590533L, 1016328L, 754535L, 742603L, 714205L,
    538719L, 477278L, 452064L, 437162L, 428616L, 420332L, 391084L,
    380853L, 354601L, 342533L, 335505L, 294670L, 286026L, 273427L,
    246172L, 238968L, 236037L, 235030L, 209514L, 199013L, 191707L,
    185521L, 179931L, 163477L, 159862L, 142610L, 136006L, 120111L,
    117338L, 112671L, 106176L, 102564L, 100168L, 97496L, 69881L,
    69508L, 68684L, 65631L, 62109L, 61123L, 57300L, 57254L, 56091L,
    51696L, 33944L, 32136L, 4822L, 598L, 468L, 49L, 19L, 17L,
    11L, 2L, 1L
  )
)

The goal is to use the result with leaflet to make a map.

I tried sp::merge:

 df1 <- sp::merge(x= states, y=my_counts)

but I get an error:

Error in table(y[, by.y]) : attempt to set an attribute on NULL

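Comments:

One additional tip (since @bondeddust found the answer) is to use stringsAsFactors=FALSE both in the readOGR call and in the data.frame creation, to avoid potential factor/character issues when manipulating the data.

Answer 1: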

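Caveat: I have never done this before, so I am "feeling my way". First, look at the object, states.

Note: this is with rgdal_0.9-3 and sp_1.1-1 loaded under R 3.2.1 (and with GDAL installed on my OSX system, from KyngChaos, IIRC):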

> str(states)
Formal class 'SpatialPolygonsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 52 obs. of  9 variables:
  .. ..$ STATEFP : Factor w/ 52 levels "01","02","04",..: 5 9 10 11 13 14 16 18 19 21 ...
  .. ..$ STATENS : Factor w/ 52 levels "00068085","00294478",..: 22 17 2 18 27 28 29 30 16 19 ...
  .. ..$ AFFGEOID: Factor w/ 52 levels "0400000US01",..: 5 9 10 11 13 14 16 18 19 21 ...
  .. ..$ GEOID   : Factor w/ 52 levels "01","02","04",..: 5 9 10 11 13 14 16 18 19 21 ...
  .. ..$ STUSPS  : Factor w/ 52 levels "AK","AL","AR",..: 5 8 10 11 14 15 13 18 19 21 ...
  .. ..$ NAME    : Factor w/ 52 levels "Alabama","Alaska",..: 5 9 10 11 13 14 16 18 19 21 ...
  .. ..$ LSAD    : Factor w/ 1 level "00": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..$ ALAND   : num [1:52] 4.03e+11 1.58e+08 1.39e+11 1.49e+11 2.14e+11 ...
  .. ..$ AWATER  : num [1:52] 2.05e+10 1.86e+07 3.14e+10 4.95e+09 2.40e+09 ...
  ..@ polygons   :List of 52
  .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
  .. .. .. ..@ Polygons :List of 6
  .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
  .. .. .. .. .. .. ..@ labpt  : num [1:2] -118.4 33.4
  .. .. .. .. .. .. ..@ area   : num 0.0259
  .. .. .. .. .. .. ..@ hole   : logi FALSE
#####   Snipped rest of output ............................

So, after looking up the help for merge and reading:

 ?merge   # and choosing the option for:

Merge a Spatial* object having attributes with a data.frame
(in package sp in library /Library/Frameworks/R.framework/Versions/3.2/Resources/library)

I decided to try this (and it seems to have succeeded):

> newobj <- merge(states, my_counts, by.x="STUSPS", by.y="State")
Warning message:
In .local(x, y, ...) : 8 records in y cannot be matched to x

> names(newobj@data)
 [1] "STUSPS"   "STATEFP"  "STATENS"  "AFFGEOID" "GEOID"    "NAME"    
 [7] "LSAD"     "ALAND"    "AWATER"   "count"   

The warning makes sense. You appear to have some extra "states" that the authors of the "states" shp file did not anticipate:

> length( table(my_counts$State))
[1] 60
> length( unique(states@data$STUSPS) )
[1] 52
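The unmatched keys can be listed with setdiff(). The sketch below is base R only and rests on an assumption: that the 52 STUSPS codes in the shapefile are the 50 states plus "DC" and "PR" (consistent with the 52 levels shown above, but not checked against the file itself). The names count_keys and shape_keys are made up for illustration.

```r
# State codes from the question's my_counts data frame.
count_keys <- c(
  "CA", "TX", "IL", "FL", "NY", "OH",
  "NJ", "GA", "MI", "PA", "MA", "CO", "AZ", "NC", "VA", "WA", "IN",
  "MD", "MN", "WI", "MO", "TN", "IA", "KY", "LA", "SC", "CT", "AL",
  "KS", "OR", "OK", "AR", "NV", "UT", "NE", "ID", "MS", "DC", "NM",
  "NH", "ME", "AK", "RI", "MT", "HI", "WV", "SD", "ND", "DE", "VT",
  "WY", "PR", "GU", "VI", "MP", "AS", "na", "MH", "FM", "PW"
)

# ASSUMPTION: stand-in for as.character(states$STUSPS); state.abb is the
# built-in vector of the 50 state abbreviations.
shape_keys <- c(state.abb, "DC", "PR")

# Codes present in the counts but absent from the shapefile keys.
unmatched <- setdiff(count_keys, shape_keys)
print(unmatched)
length(unmatched)  # 8, in line with the warning from merge()
```

With the real objects, the same check would be setdiff(my_counts$State, as.character(states$STUSPS)).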

Moral:

When merging, look at the names (and values) in both objects:

> names(states)
[1] "STATEFP"  "STATENS"  "AFFGEOID" "GEOID"    "STUSPS"   "NAME"     "LSAD"    
[8] "ALAND"    "AWATER"  

> names(my_counts)
[1] "State" "count"
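One reason to let the sp method do the join, rather than merging the attribute table by hand, is row order: the rows of the attribute table must stay aligned with the polygons. The sketch below uses toy data frames only (attrs and counts are hypothetical stand-ins, not the real objects) to show that base merge() on plain data frames can return rows in a different order, while a match()-based lookup leaves the order untouched.

```r
# Toy stand-in for states@data (hypothetical values, three rows only).
attrs <- data.frame(STUSPS = c("CA", "TX", "NY"), stringsAsFactors = FALSE)

# Toy stand-in for my_counts, including one key with no polygon ("GU").
counts <- data.frame(State = c("NY", "CA", "GU"),
                     count = c(3L, 1L, 9L),
                     stringsAsFactors = FALSE)

# Base merge() on data frames does not promise to keep the row order of x.
via_merge <- merge(attrs, counts, by.x = "STUSPS", by.y = "State",
                   all.x = TRUE)
identical(via_merge$STUSPS, attrs$STUSPS)  # FALSE: the rows were reordered

# match() looks up each row's count in place, so the order is preserved;
# keys without a match get NA.
attrs$count <- counts$count[match(attrs$STUSPS, counts$State)]
print(attrs)
```

Assigning a reordered table back to the @data slot would silently attach counts to the wrong polygons, which is why the merge method for Spatial* objects is the safer route.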

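Comments:

You can also work directly with the @data slot (not recommended unless you know what you are doing); the real key to this process is not to disturb the order of the rows or the row names.

Thanks for the answer! I expected this to be much more complicated.

Answer 2: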

Perhaps you should add the "incomparables" argument, as in the usage example:

merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all.x = TRUE, suffixes = c(".x", ".y"),
      incomparables = NULL, ...)

Comments:

You missed the point: the default setting for by will not succeed.
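This point can be checked without loading any spatial data. According to the usage quoted in this answer, by defaults to intersect(names(x), names(y)). The sketch below applies that to the column names printed in the first answer; shape_names and counts_names are illustrative variable names, not part of either package.

```r
# Column names of the two objects, copied from the output in the first answer.
shape_names  <- c("STATEFP", "STATENS", "AFFGEOID", "GEOID", "STUSPS",
                  "NAME", "LSAD", "ALAND", "AWATER")
counts_names <- c("State", "count")

# The default value of `by` is the set of shared column names.
default_by <- intersect(shape_names, counts_names)
length(default_by)  # 0: the two objects share no column name
```

With no shared column name, the default by is empty, so the key columns have to be named explicitly with by.x = "STUSPS" and by.y = "State".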
