R Language: Aggregating Data
Posted by 大学生资料阁
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It mainly covers aggregating data in R, and we hope you find it useful as a reference.
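R provides many powerful methods for aggregating and reshaping data.
When aggregating data, a group of observations is usually replaced by descriptive statistics computed from those observations.
When reshaping data, you change the structure of the data (its rows and columns) to control how it is organized.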
Manipulating data with SQL statements (*)
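Although R has many excellent functions for computing statistics on data frames, such as aggregate() and daply() from the plyr package, SQL is very powerful: besides cleaning, summarizing, and computing on data, it can also store, define, control access to, and retrieve data.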
library(sqldf)
Example:
Install the sqldf package:
install.packages("sqldf")
Output:
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/
Installing package into ‘C:/Users/Admin/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
also installing the dependencies ‘ellipsis’, ‘glue’, ‘bit’, ‘rlang’, ‘vctrs’, ‘digest’, ‘bit64’, ‘blob’, ‘memoise’, ‘pkgconfig’, ‘Rcpp’, ‘BH’, ‘plogr’, ‘gsubfn’, ‘proto’, ‘RSQLite’, ‘DBI’, ‘chron’
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/ellipsis_0.3.0.zip'
Content type 'application/zip' length 44575 bytes (43 KB)
downloaded 43 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/glue_1.4.0.zip'
Content type 'application/zip' length 158233 bytes (154 KB)
downloaded 154 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/bit_1.1-15.2.zip'
Content type 'application/zip' length 252475 bytes (246 KB)
downloaded 246 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/rlang_0.4.5.zip'
Content type 'application/zip' length 1131356 bytes (1.1 MB)
downloaded 1.1 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/vctrs_0.2.4.zip'
Content type 'application/zip' length 1027328 bytes (1003 KB)
downloaded 1003 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/digest_0.6.25.zip'
Content type 'application/zip' length 249452 bytes (243 KB)
downloaded 243 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/bit64_0.9-7.zip'
Content type 'application/zip' length 551485 bytes (538 KB)
downloaded 538 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/blob_1.2.1.zip'
Content type 'application/zip' length 47627 bytes (46 KB)
downloaded 46 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/memoise_1.1.0.zip'
Content type 'application/zip' length 36855 bytes (35 KB)
downloaded 35 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/pkgconfig_2.0.3.zip'
Content type 'application/zip' length 22207 bytes (21 KB)
downloaded 21 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/Rcpp_1.0.4.6.zip'
Content type 'application/zip' length 3030802 bytes (2.9 MB)
downloaded 2.9 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/BH_1.72.0-3.zip'
Content type 'application/zip' length 18270741 bytes (17.4 MB)
downloaded 17.4 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/plogr_0.2.0.zip'
Content type 'application/zip' length 18864 bytes (18 KB)
downloaded 18 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/gsubfn_0.7.zip'
Content type 'application/zip' length 358104 bytes (349 KB)
downloaded 349 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/proto_1.0.0.zip'
Content type 'application/zip' length 472221 bytes (461 KB)
downloaded 461 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/RSQLite_2.2.0.zip'
Content type 'application/zip' length 2275367 bytes (2.2 MB)
downloaded 2.2 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/DBI_1.1.0.zip'
Content type 'application/zip' length 607261 bytes (593 KB)
downloaded 593 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/chron_2.3-55.zip'
Content type 'application/zip' length 203176 bytes (198 KB)
downloaded 198 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/sqldf_0.4-11.zip'
Content type 'application/zip' length 78408 bytes (76 KB)
downloaded 76 KB
package ‘ellipsis’ successfully unpacked and MD5 sums checked
package ‘glue’ successfully unpacked and MD5 sums checked
package ‘bit’ successfully unpacked and MD5 sums checked
package ‘rlang’ successfully unpacked and MD5 sums checked
package ‘vctrs’ successfully unpacked and MD5 sums checked
package ‘digest’ successfully unpacked and MD5 sums checked
package ‘bit64’ successfully unpacked and MD5 sums checked
package ‘blob’ successfully unpacked and MD5 sums checked
package ‘memoise’ successfully unpacked and MD5 sums checked
package ‘pkgconfig’ successfully unpacked and MD5 sums checked
package ‘Rcpp’ successfully unpacked and MD5 sums checked
package ‘BH’ successfully unpacked and MD5 sums checked
package ‘plogr’ successfully unpacked and MD5 sums checked
package ‘gsubfn’ successfully unpacked and MD5 sums checked
package ‘proto’ successfully unpacked and MD5 sums checked
package ‘RSQLite’ successfully unpacked and MD5 sums checked
package ‘DBI’ successfully unpacked and MD5 sums checked
package ‘chron’ successfully unpacked and MD5 sums checked
package ‘sqldf’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\Admin\AppData\Local\Temp\RtmpUHJCna\downloaded_packages
library(sqldf)
name <- c(rep("张三", 3), rep("李四", 3))               # two students: 张三 (Zhang San) and 李四 (Li Si)
subject <- c("数学","语文","英语","数学","语文","英语")  # math, Chinese, English
score <- c(89, 80, 70, 90, 70, 80)
stuid <- c(1, 1, 1, 2, 2, 2)
stuscore <- data.frame(name, subject, score, stuid)
stuscore
Output:
name subject score stuid
1 张三 数学 89 1
2 张三 语文 80 1
3 张三 英语 70 1
4 李四 数学 90 2
5 李四 语文 70 2
6 李四 英语 80 2
sqldf("select name, sum(score) as allscore from stuscore group by name order by allscore")
Output:
name allscore
1 张三 239
2 李四 240
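For comparison, the same per-student totals can be computed in base R with aggregate() (covered in the summarizing section below) instead of SQL. This is a sketch, not the author's method: it rebuilds stuscore with the names and subjects written in English (Zhang San, Li Si; math, Chinese, English), an assumption made only so the snippet is ASCII-only. The scores are unchanged.

```r
# Same scores as stuscore above; names and subjects written in English (assumption)
stuscore <- data.frame(
  name    = rep(c("Zhang San", "Li Si"), each = 3),
  subject = rep(c("math", "Chinese", "English"), times = 2),
  score   = c(89, 80, 70, 90, 70, 80),
  stuid   = rep(c(1, 2), each = 3)
)

# GROUP BY name with sum(score)
totals <- aggregate(score ~ name, data = stuscore, FUN = sum)
names(totals)[names(totals) == "score"] <- "allscore"  # AS allscore

# ORDER BY allscore
totals <- totals[order(totals$allscore), ]
rownames(totals) <- NULL
print(totals)  # Zhang San 239, then Li Si 240
```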
sqldf("select name, stuid, sum(score) as allscore from stuscore group by name order by allscore")
Output:
name stuid allscore
1 张三 1 239
2 李四 2 240
sqldf("select stuid, name, subject, max(score) as maxscore from stuscore group by stuid order by maxscore")
Output:
stuid name subject maxscore
1 1 张三 数学 89
2 2 李四 数学 90
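In the query above, subject is a "bare" column: it is neither in GROUP BY nor inside an aggregate. sqldf runs queries through SQLite by default, and SQLite documents that when an aggregate query uses a single min() or max(), bare columns are taken from the row holding that minimum or maximum, so 数学 here really is the subject of each top score. Standard SQL does not allow this, and databases such as PostgreSQL reject the query. The base-R sketch below makes the "row with the maximum" logic explicit; as before, the English labels are an assumption for an ASCII-only example.

```r
# Same scores as stuscore above; names and subjects written in English (assumption)
stuscore <- data.frame(
  name    = rep(c("Zhang San", "Li Si"), each = 3),
  subject = rep(c("math", "Chinese", "English"), times = 2),
  score   = c(89, 80, 70, 90, 70, 80),
  stuid   = rep(c(1, 2), each = 3)
)

# Split by student, keep the whole row with the highest score, then stack the rows again
top_rows <- do.call(rbind, lapply(split(stuscore, stuscore$stuid), function(d) {
  d[which.max(d$score), c("stuid", "name", "subject", "score")]
}))
names(top_rows)[names(top_rows) == "score"] <- "maxscore"

# ORDER BY maxscore
top_rows <- top_rows[order(top_rows$maxscore), ]
rownames(top_rows) <- NULL
print(top_rows)
```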
sqldf("select stuid, name, subject, avg(score) as avgscore from stuscore group by stuid order by avgscore")
Output:
stuid name subject avgscore
1 1 张三 数学 79.66667
2 2 李四 数学 80.00000
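Unlike the max() query, the subject column in this avg() query carries no meaning: for aggregates other than min() and max(), SQLite fills bare columns from an arbitrary row of each group, so the 数学 shown above should not be read as a result. A safer query drops the bare column, for example select stuid, name, avg(score) as avgscore from stuscore group by stuid, name order by avgscore. A base-R sketch of that corrected query, again assuming English labels for an ASCII-only example:

```r
# Same scores as stuscore above; names and subjects written in English (assumption)
stuscore <- data.frame(
  name    = rep(c("Zhang San", "Li Si"), each = 3),
  subject = rep(c("math", "Chinese", "English"), times = 2),
  score   = c(89, 80, 70, 90, 70, 80),
  stuid   = rep(c(1, 2), each = 3)
)

# GROUP BY stuid, name with avg(score); no bare subject column
avgs <- aggregate(score ~ stuid + name, data = stuscore, FUN = mean)
names(avgs)[names(avgs) == "score"] <- "avgscore"

# ORDER BY avgscore
avgs <- avgs[order(avgs$avgscore), ]
rownames(avgs) <- NULL
print(avgs)
```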
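Summarizing data
Grouped summary statistics can be computed with aggregate().
It first splits the data into groups of rows, then applies a summary function to each group, and finally combines the results into a table, which it returns.
aggregate(x, by, FUN)
where:
x is the data object to summarize
by is a list of grouping variables; their combinations define the groups that form the new observations
FUN is a scalar function that computes the descriptive statistic for each group, producing the values of the new observations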
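The split, apply, and combine steps described above can be reproduced by hand with split() and sapply(), which helps show what aggregate() does internally. This is an illustration of the idea, not aggregate()'s actual implementation; the toy data frame and its columns g and v are hypothetical.

```r
# Hypothetical toy data: a grouping column g and a value column v
toy <- data.frame(g = c("a", "b", "a", "b"), v = c(1, 2, 3, 4))

groups  <- split(toy$v, toy$g)    # 1. split: one vector of v per value of g
results <- sapply(groups, mean)   # 2. apply: run FUN on each group
by_hand <- data.frame(Group.1 = names(results), v = unname(results))  # 3. combine

# Built-in equivalent; the unnamed `by` element becomes the column Group.1
built_in <- aggregate(toy["v"], by = list(toy$g), FUN = mean)

print(by_hand)
print(built_in)
```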
Example:
score <- data.frame(ID = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
score1 = c(92, 86, 85, 74, 82, 88, 96, 91, 84, 72),
score2 = c(73, 69, 82, 93, 80, 94, 71, 87, 86, 91),
gender = c("male", "male", "female", "female", "female", "female", "female", "male", "male", "male"))
score
aggregate(score[, c(2, 3)], by = list(score[, 4]), FUN = mean)  # mean of score1 and score2 for each gender
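Selecting columns by position works, but naming them (and naming the element of the by list) makes the call easier to read and gives the group column a meaningful name instead of Group.1. The formula interface of aggregate() expresses the same grouping more compactly. With this data both genders average 85 on score1, while score2 averages 84 for female and 81.2 for male.

```r
score <- data.frame(ID = 101:110,
                    score1 = c(92, 86, 85, 74, 82, 88, 96, 91, 84, 72),
                    score2 = c(73, 69, 82, 93, 80, 94, 71, 87, 86, 91),
                    gender = c("male", "male", "female", "female", "female",
                               "female", "female", "male", "male", "male"))

# Naming the `by` element labels the group column "gender"
by_gender <- aggregate(score[, c("score1", "score2")],
                       by = list(gender = score$gender), FUN = mean)
print(by_gender)

# Formula interface: cbind() selects the columns to summarize, ~ gender the grouping
by_formula <- aggregate(cbind(score1, score2) ~ gender, data = score, FUN = mean)
print(by_formula)
```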
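mtcars            # print the built-in mtcars dataset
colnames(mtcars)  # column 1 is mpg, column 3 is disp
mtcars$cyl        # number of cylinders of each car
# Refer to mtcars$cyl and mtcars$gear directly instead of calling attach(mtcars), which can mask other variables
aggregate(mtcars[, c(1, 3)], by = list(cyl = mtcars$cyl, gear = mtcars$gear), FUN = mean)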
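The same two-way grouping can be written with the formula interface, which needs neither attach() nor column positions. One detail worth knowing: aggregate() only returns combinations that actually occur in the data. mtcars has no 8-cylinder cars with 4 gears, so that combination is simply absent and the result has 8 rows rather than 3 × 3 = 9.

```r
# Mean mpg and disp for each cylinder/gear combination that occurs in mtcars
by_cyl_gear <- aggregate(cbind(mpg, disp) ~ cyl + gear, data = mtcars, FUN = mean)
print(by_cyl_gear)

# Empty combinations are dropped rather than filled with NA
nrow(by_cyl_gear)
```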
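Reshaping data
Data can be reshaped with the merge() and melt() functions. merge() joins two data frames (datasets) side by side, while melt(), from the reshape2 package, restructures a data frame from wide to long format.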
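melt() is not part of base R; it comes from the reshape2 package, which also provides dcast() for the opposite direction. As a dependency-free sketch of the same wide-to-long idea, base R's reshape() can stack two score columns into one. This swaps in a different function than the text names, and the small table and the names wide, long, and exam are made up for illustration.

```r
# Hypothetical wide table: one row per student, one column per exam
wide <- data.frame(ID = c(101, 102), score1 = c(92, 86), score2 = c(73, 69))

# Wide -> long: one row per (student, exam) pair
long <- reshape(wide, direction = "long",
                varying = c("score1", "score2"),  # columns to stack
                v.names = "score",                # name of the stacked value column
                timevar = "exam",                 # column recording which exam a value came from
                times   = c("score1", "score2"),
                idvar   = "ID")
rownames(long) <- NULL
print(long)
```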
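The merge() function
Combining datasets: in R, two datasets can be joined with the dedicated function merge().
merge() matches rows on common columns or row names and joins two data frames (or objects that can be coerced to data frames). Its usage is: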
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x", ".y"), no.dups = TRUE,
      incomparables = NULL, ...)
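The all, all.x, and all.y arguments in the signature above decide which unmatched rows survive, much like inner, left, and full outer joins in SQL. A small sketch with two made-up tables (scores and info are hypothetical) sharing the key column ID:

```r
# Two hypothetical tables sharing the key column "ID"
scores <- data.frame(ID = c(101, 102, 103), score1 = c(92, 86, 85))
info   <- data.frame(ID = c(101, 102, 104), gender = c("male", "male", "female"))

# all = FALSE (default): inner join, only IDs present in both tables
inner <- merge(scores, info, by = "ID")

# all.x = TRUE: left join, keep every row of scores; gender is NA for ID 103
left <- merge(scores, info, by = "ID", all.x = TRUE)

# all = TRUE: full outer join, keep unmatched rows from both sides
full <- merge(scores, info, by = "ID", all = TRUE)

print(inner)
print(left)
print(full)
```

Because sort = TRUE by default, each result is ordered by the key column ID.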