Writing an R loop to create new standardized columns

Posted 2023-02-14

Posted: 2022-01-12 23:15:06

Question:

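I'm working with the Ionosphere dataset in R and trying to write a loop that creates new columns holding standardized versions of the existing columns, named to match.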

I use "cname" for the new column name and c for the original column name. The code is:

install.packages("mlbench") 
library(mlbench) 
data('Ionosphere')
library(robustHD)
col <- colnames(Ionosphere)
for (c in col[1:length(col)-1])
  cname <- paste(c,"Std")
  Ionosphere$cname <- standardize(Ionosphere$c)

But I get the following error:

"Error in `$<-.data.frame`(`*tmp*`, "cname", value = numeric(0)) : 
  replacement has 0 rows, data has 351
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA"

I feel like I'm missing something really simple, but I just can't see it.

Thanks for your help.
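The error comes from `$`: it does not evaluate the name after it, so `Ionosphere$c` looks for a column literally named "c" (there is none, so it returns `NULL`), and `Ionosphere$cname <- ...` targets a column literally named "cname". Standardizing `NULL` gives the `numeric(0)` shown in the error, which cannot fill 351 rows. Separately, without braces the `for` loop body is only the `cname <- ...` line, so the assignment runs once, after the loop. A minimal sketch of the difference between `$` and `[[`, using a hypothetical toy data frame `toy` rather than the real Ionosphere data:

```r
# Hypothetical toy data frame standing in for Ionosphere
toy <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 20, 30))

col_name <- "V1"  # a variable that holds a column name

# `$` does not evaluate col_name: it looks for a column literally named
# "col_name", finds none, and returns NULL (as Ionosphere$c does)
is.null(toy$col_name)    # TRUE

# `[[` evaluates its argument, so this returns the V1 column
toy[[col_name]]          # 1 2 3

# `[[<-` likewise creates a column whose name is stored in a variable,
# whereas toy$new_name <- ... would create a column called "new_name"
new_name <- paste0(col_name, "_Std")
x <- toy[[col_name]]
toy[[new_name]] <- (x - mean(x)) / sd(x)

names(toy)               # "V1" "V2" "V1_Std"
```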

Question comments:

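It's generally good practice to avoid giving objects the names of common functions, such as c (as in c()). Are you sure there is a variable called "c"? Please share this "Ionosphere" data; you can use dput(head(Ionosphere, 10)).

c is basically the loop variable. It iterates over each column name and then (in theory) creates a new column named after the original column plus "Std".

What is data_set?

Sorry, data_set shouldn't have been there; I was trying different things. I've edited it now.

Answer 1: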

We can use lapply, a custom standardization function, setNames, and cbind. I don't have access to your dataset, so I'll use the iris dataset as an example:

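df <- iris
# standardize the four numeric columns and bind them on with "_Std" names
# (the \(x) lambda shorthand needs R >= 4.1.0; use function(x) on older versions)
cbind(df, setNames(lapply(df[1:4],
                          \(x) (x - mean(x))/sd(x)),
                   paste0(names(df[1:4]), '_Std')))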

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Sepal.Length_Std Sepal.Width_Std Petal.Length_Std Petal.Width_Std
1            5.1         3.5          1.4         0.2     setosa      -0.89767388      1.01560199      -1.33575163   -1.3110521482
2            4.9         3.0          1.4         0.2     setosa      -1.13920048     -0.13153881      -1.33575163   -1.3110521482
3            4.7         3.2          1.3         0.2     setosa      -1.38072709      0.32731751      -1.39239929   -1.3110521482
4            4.6         3.1          1.5         0.2     setosa      -1.50149039      0.09788935      -1.27910398   -1.3110521482
5            5.0         3.6          1.4         0.2     setosa      -1.01843718      1.24503015      -1.33575163   -1.3110521482
...
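If you would rather keep an explicit for loop as in the question, here is a minimal sketch, again on iris. It swaps `$` for `[[`, wraps the loop body in braces, renames the loop variable from `c` to `col_name` as suggested in the comments, and selects numeric columns by type. That last change replaces `col[1:length(col)-1]`, which parses as `col[(1:length(col)) - 1]`, i.e. `col[0:(length(col) - 1)]`, and only picks the right columns because index 0 is silently dropped. `paste0()` is used instead of `paste()` so the new names contain no space.

```r
# Sketch of the question's loop with the fixes applied, shown on iris
df <- iris

# Pick numeric columns by type instead of "all but the last column";
# this also skips any factor columns
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

for (col_name in num_cols) {            # braces: every line below is in the loop
  x <- df[[col_name]]                   # [[ ]] evaluates col_name
  new_name <- paste0(col_name, "_Std")  # e.g. "Sepal.Length_Std", no space
  df[[new_name]] <- (x - mean(x)) / sd(x)
}

names(df)
```

To stay with the package used in the question, `standardize(x)` from robustHD could replace the explicit formula; check its documentation for the centering and scaling functions it uses by default.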

I find these transformations easier with dplyr:

library(dplyr)

# "{.col}" is replaced by each original column name, giving Sepal.Length_Std etc.
iris %>% mutate(across(where(is.numeric),
                       ~ (.x - mean(.x))/sd(.x),
                       .names = "{.col}_Std"))
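Without dplyr, base R's `scale()` does the same column-wise centering (subtracting the mean) and scaling (dividing by the standard deviation), so no loop is needed at all. A minimal sketch on iris; `scale()` returns a matrix, so it is converted with `as.data.frame()` and renamed before being bound on:

```r
df <- iris
num_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# scale() computes (x - mean(x)) / sd(x) for each column and returns a matrix
std <- as.data.frame(scale(df[num_cols]))
names(std) <- paste0(num_cols, "_Std")

df <- cbind(df, std)
head(df, 3)
```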
