R Language Digest: Subsetting Data

Posted by chickenwrap

Original article: https://www.statmethods.net/management/subset.html

 

R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations. The following code snippets demonstrate ways to keep or delete variables and observations and to take random samples from a dataset.

Selecting (Keeping) Variables

# select variables v1, v2, v3
myvars <- c("v1", "v2", "v3")
newdata <- mydata[myvars]

# another method
myvars <- paste("v", 1:3, sep="")
newdata <- mydata[myvars]

# select 1st and 5th thru 10th variables
newdata <- mydata[c(1,5:10)]
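
The selections above can be tried on a small made-up data frame. This is only a sketch: the name mydata follows the text, but the ten columns v1 to v10 and their values are invented here for illustration.

```r
# Hypothetical toy data: 4 rows, columns v1..v10 (not from the original article)
mydata <- as.data.frame(matrix(1:40, nrow = 4))
names(mydata) <- paste0("v", 1:10)

# Select columns by name
by_name <- mydata[c("v1", "v2", "v3")]

# Select columns by position: the 1st and the 5th through 10th
by_pos <- mydata[c(1, 5:10)]

print(names(by_name))
print(names(by_pos))
```

Indexing a data frame with a single vector and no comma, as here, always selects columns and returns a data frame.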


 

Excluding (Dropping) Variables

# exclude variables v1, v2, v3
myvars <- names(mydata) %in% c("v1", "v2", "v3") 
newdata <- mydata[!myvars]

# exclude 3rd and 5th variable 
newdata <- mydata[c(-3,-5)]

# delete variables v3 and v5
mydata$v3 <- mydata$v5 <- NULL
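
A minimal sketch of the exclusion idioms, run on an invented five-column data frame (the values are assumptions for illustration). It also shows setdiff() as one alternative to the logical mask; negative indexing works with positions only, not with column names.

```r
# Hypothetical toy data (not from the original article)
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12, v5 = 13:15)

# Logical mask: TRUE marks the columns to drop; negate it to keep the rest
drop <- names(mydata) %in% c("v1", "v2", "v3")
kept_mask <- mydata[!drop]

# Same result with setdiff(): keep every name that is not in the drop list
kept_setdiff <- mydata[setdiff(names(mydata), c("v1", "v2", "v3"))]

# Negative positions drop the 3rd and 5th columns
kept_neg <- mydata[c(-3, -5)]

print(names(kept_mask))
print(names(kept_neg))
```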

Selecting Observations

# first 5 observations
newdata <- mydata[1:5,]

# based on variable values
newdata <- mydata[ which(mydata$gender=='F'
& mydata$age > 65), ]

# or
attach(mydata)
newdata <- mydata[ which(gender=='F' & age > 65),]
detach(mydata)
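
The which() call in these snippets matters when the data contain missing values. The sketch below uses an invented four-row data frame to show the difference, and uses with() as one possible alternative to the attach()/detach() pair; the data and the variable names with_na, without_na and via_with are illustrative assumptions.

```r
# Hypothetical toy data with one missing age (not from the original article)
mydata <- data.frame(
  gender = c("F", "M", "F", "F"),
  age    = c(70, 80, NA, 40)
)

# Plain logical index: the NA comparison produces an extra all-NA row
with_na <- mydata[mydata$gender == "F" & mydata$age > 65, ]

# which() keeps only the positions that are TRUE, so the NA case is dropped
without_na <- mydata[which(mydata$gender == "F" & mydata$age > 65), ]

# with() evaluates the condition inside the data frame without attaching it
via_with <- mydata[with(mydata, which(gender == "F" & age > 65)), ]

print(nrow(with_na))
print(nrow(without_na))
```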

Selection using the Subset Function

The subset( ) function is the easiest way to select variables and observations. In the following example, we select all rows that have a value of age greater than or equal to 20 or age less than 10. We keep the ID and Weight columns.

# using subset function 
newdata <- subset(mydata, age >= 20 | age < 10, 
select=c(ID, Weight))

In the next example, we select all men over the age of 25 and we keep variables weight through income (weight, income and all columns between them).

# using subset function (part 2)
newdata <- subset(mydata, sex=="m" & age > 25,
select=weight:income)
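
To make the weight:income range concrete, here is a small sketch on invented data in which a height column sits between weight and income. The data frame and the bracket-based comparison are assumptions added for illustration; the bracket form should pick the same rows and columns.

```r
# Hypothetical toy data (not from the original article)
mydata <- data.frame(
  ID     = 1:5,
  sex    = c("m", "f", "m", "m", "f"),
  age    = c(30, 22, 18, 41, 8),
  weight = c(80, 60, 70, 90, 25),
  height = c(180, 165, 175, 185, 120),
  income = c(50, 40, 0, 70, 0)
)

# Men over 25, keeping every column from weight through income
men_over_25 <- subset(mydata, sex == "m" & age > 25, select = weight:income)

# Bracket form of the same selection, with the columns spelled out
same <- mydata[which(mydata$sex == "m" & mydata$age > 25),
               c("weight", "height", "income")]

print(men_over_25)
```

Inside subset(), column names are used without quotes and without the mydata$ prefix, which is why the select argument can express a range such as weight:income.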


Random Samples

Use the sample( ) function to take a random sample of size n from a dataset.

# take a random sample of size 50 from a dataset mydata 
# sample without replacement
mysample <- mydata[sample(1:nrow(mydata), 50,
   replace=FALSE),]
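
Because sample( ) draws at random, the rows selected differ from run to run. If a repeatable sample is wanted, one common approach is to call set.seed() first. The sketch below assumes an invented 200-row data frame and an arbitrary seed value.

```r
# Hypothetical toy data: 200 rows (not from the original article)
mydata <- data.frame(id = 1:200)

# Fix the random number generator state so the draw can be repeated
set.seed(42)
idx <- sample(seq_len(nrow(mydata)), 50, replace = FALSE)
mysample <- mydata[idx, , drop = FALSE]

# Resetting the same seed reproduces the same row indices
set.seed(42)
idx2 <- sample(seq_len(nrow(mydata)), 50, replace = FALSE)

print(nrow(mysample))
print(identical(idx, idx2))
```

Sampling without replacement requires the sample size to be no larger than the number of rows; seq_len(nrow(mydata)) is used in place of 1:nrow(mydata) so that an empty data frame does not yield the indices 1 and 0.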
