How to read in a specific column from a large fixed-width file in R

Posted 2023-04-18

Asked: 2014-07-04 10:27:30

Question:

Is there a convenient way in R to read a specific column (or several columns) from a fixed-width data file? For example, the file looks like this:

10010100100002000000
00010010000001000000
10010000001002000000

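Say I am interested in the 14th column. At the moment I read the whole data set with read.fwf, passing as widths a vector of 1s whose length is the total number of columns: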

data <- read.fwf("demo.asc", widths=rep(1,20))
data[,14]
[1] 2 1 2

This works fine, but it does not scale to data sets with 100,000 columns and rows. Is there an efficient way to do this?
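For comparison, read.fwf accepts negative widths to mark columns that should be skipped, so the widths vector does not have to list every column. The sketch below shows one way to build such a vector for several one-character columns. The helper name widths_for_columns and the temporary demo file are assumptions made for illustration, not part of the original question. Note that read.fwf still has to scan every line in full, so this mainly avoids building a data frame with 100,000 columns.

```r
# Hypothetical helper: build a read.fwf `widths` vector that keeps only the
# one-character columns listed in `keep` and skips everything in between.
widths_for_columns <- function(keep) {
  keep <- sort(unique(keep))
  stopifnot(length(keep) > 0, all(keep >= 1))
  # Number of columns to skip before each kept column
  gaps <- diff(c(0, keep)) - 1
  # Interleave (skip, keep) pairs: a negative width skips, 1 keeps one column
  w <- as.vector(rbind(-gaps, 1))
  # Drop zero-length skips, which occur when kept columns are adjacent
  w[w != 0]
}

# Demo on a temporary copy of the three sample lines
tmp <- tempfile()
writeLines(c("10010100100002000000",
             "00010010000001000000",
             "10010000001002000000"), tmp)

w <- widths_for_columns(c(14, 4))   # skip 3, keep 1, skip 9, keep 1
data <- read.fwf(tmp, widths = w)
data$V2                             # values of the 14th column
```

Because the helper sorts the requested positions, the resulting columns come back in file order (here column 4 first, then column 14), not in the order they were requested.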

Comments:

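Does this question or this discussion help? Did you check the use of the widths argument in the second ?read.fwf example?

@Henrik Negative width values? Yes, I have seen that, but with multiple columns it becomes very complicated (sort them first, then calculate the skips, and so on).

Answer 1: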

You can use a connection and process the file in chunks:

Reproducing your data:

dat <-"10010100100002000000
00010010000001000000
10010000001002000000"

Processing in chunks using a connection:

# Define a connection
con = textConnection(dat)


# Do the block update
linesPerUpdate <- 2
result <- character()
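repeat {
  # Read the next chunk of lines from the connection
  line <- readLines(con, linesPerUpdate)
  # Keep only the 14th character of each line
  result <- c(result, substr(line, start=14, stop=14))
  # A short chunk means the end of the input was reached
  if (length(line) < linesPerUpdate) break
}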

# Close the connection
close(con)

Result:

result
[1] "2" "1" "2"
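The same idea can be applied to a file on disk and to several columns at once. The following is a minimal sketch, not a tuned implementation: the function name read_fwf_columns, its arguments, and the default chunk size are assumptions made for illustration, and it assumes every field is exactly one character wide, as in the example data.

```r
# Hypothetical helper: extract the one-character columns `cols` from a
# fixed-width file, reading `chunk_size` lines at a time.
read_fwf_columns <- function(path, cols, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))          # close the connection even if an error occurs
  chunks <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    # One substr() call per requested column, each vectorised over the chunk
    picked <- vapply(cols, function(j) substr(lines, j, j),
                     character(length(lines)))
    # Force a lines-by-columns matrix, also when the chunk has a single line
    chunks[[length(chunks) + 1L]] <- matrix(picked, nrow = length(lines))
    if (length(lines) < chunk_size) break
  }
  if (length(chunks) == 0L) {
    out <- matrix(character(), nrow = 0L, ncol = length(cols))
  } else {
    out <- do.call(rbind, chunks)
  }
  colnames(out) <- paste0("col", cols)
  out
}

# Demo on a temporary copy of the three sample lines
tmp <- tempfile()
writeLines(c("10010100100002000000",
             "00010010000001000000",
             "10010000001002000000"), tmp)
res <- read_fwf_columns(tmp, cols = c(4, 14), chunk_size = 2L)
res[, "col14"]
```

The values come back as character strings, so as.integer() (or a similar conversion) is needed afterwards if numbers are required. Collecting the chunks in a list and binding them once at the end avoids growing the result with c() on every iteration.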
