How to reshape data for a stacked barchart using R lattice [duplicate]

Posted 2023-02-14

【Title】How to reshape data for a stacked barchart using R lattice [duplicate] 【Posted】2014-11-14 02:58:58 【Question】:

I have a bunch of data in a table (imported from a csv) in the following format:

date        classes         score
9/1/11       french          34
9/1/11       english         34
9/1/11       french          34
9/1/11       spanish         34
9/2/11       french          34
9/2/11       english         34
9/3/11       spanish         34
9/3/11       spanish         34
9/5/11       spanish         34
9/5/11       english         34
9/5/11       french          34
9/5/11       english         34

Ignore the score column; it is not important.

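I need to count the total number of students taking English, Spanish, or French classes by date, i.e. I need to first group the data by date, then split each day into blocks by language, and plot it as a stacked bar chart in which each bar represents a date and each segment of a bar represents a language.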

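I have already figured out how to do this once the data is in matrix form, where each row represents a date and each column represents an attribute (a language). So suppose the data in the csv were in this form: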

ie           french      english       spanish
9/1/11       2           1             1
9/2/11       1           1             0          
9/3/11       0           0             2
9/5/11       1           2             1

Then I can do:

library(lattice)

directory <- "C:\\test\\language.csv"
ourdata6 <- read.csv(directory)

language <- as.matrix(ourdata6)

barchart(prop.table(language), horizontal = FALSE,
         auto.key = list(space = 'right', cex = .5, border = T, points = F, lines = F, lwd = 5, text = c('french', 'spanish', 'enligsh'), cex = .6),
         main = list(label = "Distribution of classes 10", cex = 2.5),
         ylab = list("", cex = 1.7), xlab.top = list("testing", cex = 1.2))

The challenge is getting the data from the original format into the format I need.

I tried

a<-count(language, c("date", "classes"))

which gives me the counts sorted by both columns, but in long (vertical) form

ie
9/1/11       french           2             
9/1/11       english          1                       
9/1/11       spanish          1            
etc...

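I need to pivot this so that there is one row per date. Also, since some of the counts may be zero, I need placeholders for them, i.e. the first column must correspond to french and the second to english for my current setup to work.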

Any ideas on how to do this, or on whether my matrix + prop.table approach is even the right one? Is there a simpler way?

【Question comments】:

【Answer 1】:

Assuming your data is in a data frame called df, you can do this with the help of the dplyr and tidyr packages:

library(dplyr)
library(tidyr)

wide <- df %>% select(date,classes) %>%
  group_by(date,classes) %>%
  summarise(n=n()) %>%            # as @akrun said, you can also use tally()
  spread(classes, n, fill=0)

With the sample data you provided, this produces the following data frame:

  date english french spanish
9/1/11       1      2       1
9/2/11       1      1       0
9/3/11       0      0       2
9/5/11       2      1       1
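In current tidyr releases spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshaping under that assumption (tidyr 1.0 or later); the data frame df is rebuilt here from the question's sample rows, and the names wide_pw and m are only illustrative. The last two lines turn the result into a numeric matrix with the dates as row names, which is the shape the question's prop.table() step expects.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question (score column omitted)
df <- data.frame(
  date = c("9/1/11", "9/1/11", "9/1/11", "9/1/11", "9/2/11", "9/2/11",
           "9/3/11", "9/3/11", "9/5/11", "9/5/11", "9/5/11", "9/5/11"),
  classes = c("french", "english", "french", "spanish", "french", "english",
              "spanish", "spanish", "spanish", "english", "french", "english"),
  stringsAsFactors = FALSE
)

# Count rows per date/class pair, then pivot to one row per date;
# missing combinations are filled with 0
wide_pw <- df %>%
  count(date, classes) %>%
  pivot_wider(names_from = classes, values_from = n,
              values_fill = list(n = 0L))

# Numeric matrix with dates as row names (illustrative name: m)
m <- as.matrix(wide_pw[, -1])
rownames(m) <- wide_pw$date
```

Because the matrix is built from every column except date, this also works when there are many class columns and listing them by hand is impractical.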

Now you can make the lattice plot with:

barchart(date ~ english + french + spanish, data=wide, stack = TRUE,
         main = list(label="Distribution of language classes",cex=1.6),
         xlab = list("Number of classes", cex=1.1),
         ylab = list("Date", cex=1.1),
         auto.key = list(space='right',cex=1.2,text=c('English','French','Spanish')))

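which gives the following plot:

Edit: instead of a lattice plot you can also use ggplot2, which (at least in my opinion) is easier to understand. An example: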

# convert the wide dataframe to a long one
long <- wide %>% gather(class, n, -date)

# load ggplot2
library(ggplot2)

# create the plot
ggplot(long, aes(date, n, fill=class)) +
  geom_bar(stat="identity", position="stack") +
  coord_flip() +
  theme_bw() +
  theme(axis.title=element_blank(), axis.text=element_text(size=12))

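which gives:

【Comments】:

+1, or df %>% group_by(date, classes) %>% tally() %>% spread(classes, n, fill=0)

@akrun Thanks for pointing to the tally function. Learned something new again today :-)

Hi, thanks, where is the table name in that command?

@curfewed In which command? Without specifics it is hard for me to answer this question...

Hi Jaap, actually I have a lot of columns, so specifying spanish + french + english + ... is not efficient. That is why I tried the prop.table approach. Your wide works well, but prop.table(wide) does not work unless wide is a matrix, I did wide2

【Answer 2】: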

I hope I'm not missing something, but it seems to me you are just looking for table:

table(df[c("date", "classes")])
#         classes
# date     english french spanish
#   9/1/11       1      2       1
#   9/2/11       1      1       0
#   9/3/11       0      0       2
#   9/5/11       2      1       1

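The result is a table (which is also a matrix), so you can use your barchart command on it as you like.

This is what I got (it looks like you need to work on your legend :-)

The code used was: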

language <- table(df[c("date", "classes")])

barchart(prop.table(language), 
         horizontal = FALSE, 
         auto.key = list(space = 'right',
                         cex = .5, border = T, points = F, 
                         lines = F, lwd = 5, 
                         text = c('french','spanish','enligsh'),
                         cex = .6), 
         main = list(label = "Distribution of classes 10", cex = 2.5),
         ylab = list("", cex = 1.7), 
         xlab.top = list("testing", cex = 1.2))
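One detail worth noting about the call above: prop.table() without a margin divides every cell by the grand total, so the bars keep different heights. If the intent is for each date's bar to sum to 1, a row-wise margin can be used instead. The sketch below assumes the question's sample data (rebuilt here as df); the names overall, by_date, and p are only illustrative.

```r
library(lattice)

# Sample data copied from the question (score column omitted)
df <- data.frame(
  date = c("9/1/11", "9/1/11", "9/1/11", "9/1/11", "9/2/11", "9/2/11",
           "9/3/11", "9/3/11", "9/5/11", "9/5/11", "9/5/11", "9/5/11"),
  classes = c("french", "english", "french", "spanish", "french", "english",
              "spanish", "spanish", "spanish", "english", "french", "english"),
  stringsAsFactors = FALSE
)

language <- table(df[c("date", "classes")])

# Proportions of the grand total: all cells together sum to 1
overall <- prop.table(language)

# Proportions within each date: every row sums to 1
by_date <- prop.table(language, margin = 1)

# Stacked vertical bars, one per date; the legend labels are taken
# from the table's column names, so they cannot drift out of order
p <- barchart(by_date, horizontal = FALSE, stack = TRUE,
              auto.key = list(space = "right"))
# print(p) draws the chart on the active graphics device
```

Letting auto.key derive its labels from the table avoids the mismatch between a hand-written text vector and the actual column order.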

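【Comments】:

Thanks, this works, but the only problem is that the dates are now ordered by month and day rather than chronologically, so the first row is 9/1/11, the second is 9/1/2012, the third is 9/1/2013, etc.

@curfewed, well, it would help if you used actual dates instead of strings or, if you keep strings, made them an ordered factor. Those points are unrelated to your question, which was how to reshape your data for use with lattice.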
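The ordering suggestion in the last comment could look roughly like the sketch below. It assumes every date string follows the month/day/two-digit-year pattern of the question's sample (the format string "%m/%d/%y" is that assumption; dates written with four-digit years would need "%m/%d/%Y" instead). The vector dates and the names parsed and date_f are made up for illustration.

```r
# Hypothetical date strings spanning two years
dates <- c("9/1/11", "9/1/12", "9/2/11", "9/10/11")

# Parse the strings into real Date values
parsed <- as.Date(dates, format = "%m/%d/%y")

# Keep the original labels, but order the factor levels chronologically
date_f <- factor(dates,
                 levels = unique(dates[order(parsed)]),
                 ordered = TRUE)

levels(date_f)
# "9/1/11" "9/2/11" "9/10/11" "9/1/12"
```

Applying the same factor conversion to df$date before calling table() makes the rows of the resulting table, and therefore the bars, appear in chronological order.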
