转置和计算皮尔逊相关性
Posted
技术标签:
【中文标题】转置和计算皮尔逊相关性【英文标题】:Tranpose and Calculate pearson correlation 【发布时间】:2018-07-23 14:54:16 【问题描述】:我真的是编码新手,我需要在数据集中运行一些统计数据,例如 pearson 相关性,但我在处理数据时遇到了一些麻烦。
据我了解,我需要转置我的数据以计算 pearson 相关性,但这是我遇到一些问题的地方。对于初学者,列名变成一个新行而不是成为新的列名。然后我收到一条消息,指出我的值不是数字。
我也有一些 NA,我正在尝试计算与此代码的相关性
cor(cr, use = "complete.obs", method = "pearson")
Error in cor(cr1, use = "complete.obs", method = "pearson") :
'x' must be numeric
我需要知道 Victoria 和 Nuria 之间的相关性,应该是 0.3651484
这是我的数据集的输入:
> dput(cr)
structure(list(User = structure(c(8L, 10L, 2L, 17L, 11L, 1L,
18L, 9L, 7L, 5L, 3L, 14L, 13L, 4L, 20L, 6L, 16L, 12L, 15L, 19L
), .Label = c("Ana", "Anton", "Bernard", "Carles", "Chris", "Ivan",
"Jim", "John", "Marc", "Maria", "Martina", "Nadia", "Nerea",
"Nuria", "Oriol", "Rachel", "Roger", "Sergi", "Valery", "Victoria"
), class = "factor"), Star.Wars.IV...A.New.Hope = c(1L, 5L, NA,
NA, 4L, 2L, NA, 4L, 5L, 4L, 2L, 3L, 2L, 3L, 4L, NA, NA, 4L, 5L,
1L), Star.Wars.VI...Return.of.the.Jedi = c(5L, 3L, NA, 3L, 3L,
4L, NA, NA, 1L, 2L, 1L, 5L, 3L, NA, 4L, NA, NA, 5L, 1L, 2L),
Forrest.Gump = c(2L, NA, NA, NA, 4L, 4L, 3L, NA, NA, NA,
5L, 2L, NA, 3L, NA, 1L, NA, 1L, NA, 2L), The.Shawshank.Redemption = c(NA,
2L, 5L, NA, 1L, 4L, 1L, NA, 4L, 5L, NA, NA, 5L, NA, NA, NA,
NA, 5L, NA, 4L), The.Silence.of.the.Lambs = c(4L, 4L, 2L,
NA, 4L, NA, 1L, 3L, 2L, 3L, NA, 2L, 4L, 2L, 5L, 3L, 4L, 1L,
NA, 5L), Gladiator = c(4L, 2L, NA, 1L, 1L, NA, 4L, 2L, 4L,
NA, 5L, NA, NA, NA, 5L, 2L, NA, 1L, 4L, NA), Toy.Story = c(2L,
1L, 4L, 2L, NA, 3L, NA, 2L, 4L, 4L, 5L, 2L, 4L, 3L, 2L, NA,
2L, 4L, 2L, 2L), Saving.Private.Ryan = c(2L, NA, NA, 3L,
4L, 1L, 5L, NA, 4L, 3L, NA, NA, 5L, NA, NA, 2L, NA, NA, 1L,
3L), Pulp.Fiction = c(NA, NA, NA, 4L, NA, 4L, 2L, 3L, NA,
4L, NA, 1L, NA, NA, 3L, NA, 2L, 5L, 3L, 2L), Stand.by.Me = c(3L,
4L, 1L, NA, 1L, 4L, NA, NA, 1L, NA, NA, NA, NA, 4L, 5L, 1L,
NA, NA, 3L, 2L), Shakespeare.in.Love = c(2L, 3L, NA, NA,
5L, 5L, 1L, NA, 2L, NA, NA, 3L, NA, NA, NA, 5L, 2L, NA, 3L,
1L), Total.Recall = c(NA, 2L, 1L, 4L, 1L, 2L, NA, 2L, 3L,
NA, 3L, NA, 2L, 1L, 1L, NA, NA, NA, 1L, NA), Independence.Day = c(5L,
2L, 4L, 1L, NA, 4L, NA, 3L, 1L, 2L, 2L, 3L, 4L, 2L, 3L, NA,
NA, NA, NA, NA), Blade.Runner = c(2L, NA, 4L, 3L, 4L, NA,
3L, 2L, NA, NA, NA, NA, NA, 2L, NA, NA, NA, 4L, NA, 5L),
Groundhog.Day = c(NA, 2L, 1L, 5L, NA, 1L, NA, 4L, 5L, NA,
NA, 2L, 3L, 3L, 2L, 5L, NA, NA, NA, 5L), The.Matrix = c(4L,
NA, 1L, NA, 3L, NA, 1L, NA, NA, 2L, 1L, 5L, NA, 5L, NA, 2L,
4L, NA, 2L, 4L), Schindler.s.List = c(2L, 5L, 2L, 5L, 5L,
NA, NA, 1L, NA, 5L, NA, NA, NA, 1L, 3L, 2L, NA, 2L, NA, 3L
), The.Sixth.Sense = c(5L, 1L, 3L, 1L, 5L, 3L, NA, 3L, NA,
1L, 2L, NA, NA, NA, NA, 4L, NA, 1L, NA, 5L), Raiders.of.the.Lost.Ark = c(NA,
3L, 1L, 1L, NA, NA, 5L, 5L, NA, NA, 1L, NA, 5L, NA, 3L, 3L,
NA, 2L, NA, 3L), Babe = c(NA, NA, 3L, 2L, NA, 2L, 2L, NA,
5L, NA, 4L, 2L, NA, NA, 1L, 4L, NA, 5L, NA, NA)), .Names = c("User",
"Star.Wars.IV...A.New.Hope", "Star.Wars.VI...Return.of.the.Jedi",
"Forrest.Gump", "The.Shawshank.Redemption", "The.Silence.of.the.Lambs",
"Gladiator", "Toy.Story", "Saving.Private.Ryan", "Pulp.Fiction",
"Stand.by.Me", "Shakespeare.in.Love", "Total.Recall", "Independence.Day",
"Blade.Runner", "Groundhog.Day", "The.Matrix", "Schindler.s.List",
"The.Sixth.Sense", "Raiders.of.the.Lost.Ark", "Babe"), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
有人可以帮我吗?
【问题讨论】:
【参考方案1】:此代码应该为您提供所有用户之间的相关矩阵。
cr2<-t(cr[,2:21]) # Transpose (first column contains names)
colnames(cr2)<-cr[,1] # Assign names to columns
cor(cr2,use="complete.obs") # Gives an error because there are no complete obs
# Error in cor(cr2, use = "complete.obs") : no complete element pairs
cor(cr2,use="pairwise.complete.obs") # use pairwise deletion
Victoria 和 Nuria 之间的相关性为 0.36514837(使用成对删除)
编辑:要通过列表删除获得 Victoria 和 Nuria 之间的相关性,请运行上述代码,然后
cr2<-as.data.frame(cr2)
with(cr2, cor(Victoria, Nuria, use = "complete.obs", method = "pearson"))
[1] 0.3651484
【讨论】:
感谢您的帮助,我收到此错误 ---> colnames(cr2) 我重新运行代码并没有得到同样的错误,也许你在转置时忘记排除第一列?cr2
应该是一个 20x20 矩阵
不确定发生了什么,这是我能够做到的唯一方法 ------- cr1
我让它工作了,我使用的是 cr
在您的转置命令中,排除第一列。因此,不要尝试t(cr)
,而是尝试t(cr[,2:21])
或t(cr[,-1])
【参考方案2】:
作为@Niek 回答的总结。首先通过排除第一列(包含名称且不是数字,因此不能用于相关性计算)将数据框转置为t()
;在同一步骤中将这些名称分配给新列。然后计算具体的相关性。解决方案是:
cr2 <- setNames(as.data.frame(t(cr[, -1])), cr[, 1])
with(cr2, cor(Victoria, Nuria, use = "complete.obs"))
[1] 0.3651484
或者对于整个相关矩阵:
cor(cr2, use = "pairwise.complete.obs")
【讨论】:
感谢您的帮助,我需要计算 Victoria 和 Nuria 的相关性,我更改了您的一些代码并且它有效 ---- cr1以上是关于转置和计算皮尔逊相关性的主要内容,如果未能解决你的问题,请参考以下文章