R Basics: Quickly Exploring Data ("R Visualization")

Posted lxiaoq

Install and load the packages

> install.packages('gcookbook')

> install.packages('ggplot2')

> library(ggplot2)

Or install both in one call by passing a character vector: install.packages(c('ggplot2','gcookbook'))

> library(gcookbook)
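
A minimal sketch of checking which packages are still missing, so that the install step can be skipped on later runs. The helper name missing_packages is made up for this note; requireNamespace() is base R.

```r
# Hypothetical helper (not from the original post): return the packages
# in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("ggplot2", "gcookbook")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so the sketch has no side effects when run as is.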

Set the working directory:

> setwd("d:/data")
> getwd()

[1] "d:/data"

Read a file

> data <- read.csv('datafile.csv')

> data
  X1 X3.8.1941 X1.1.2007
1  2 1/24/1972  1/1/2007
2  3  6/1/1932  1/1/2007
3  4 5/17/1947  1/1/2007
4  5 3/10/1943  1/1/2007
5  6  1/8/1940  1/1/2007
6  7  8/5/1947  1/1/2007
7  8 4/14/2005  1/1/2007
8  9 6/23/1961  1/2/2007
9 10 1/10/1949  1/2/2007

The file has no header row, so by default read.csv() uses the first data row as column names (X1, X3.8.1941, X1.1.2007) and only nine rows remain. Pass header=FALSE to keep all ten rows:

> data <- read.csv('datafile.csv',header=FALSE)

> data
   V1        V2       V3
1   1  3/8/1941 1/1/2007
2   2 1/24/1972 1/1/2007
3   3  6/1/1932 1/1/2007
4   4 5/17/1947 1/1/2007
5   5 3/10/1943 1/1/2007
6   6  1/8/1940 1/1/2007
7   7  8/5/1947 1/1/2007
8   8 4/14/2005 1/1/2007
9   9 6/23/1961 1/2/2007
10 10 1/10/1949 1/2/2007

> head(data)
  V1        V2       V3
1  1  3/8/1941 1/1/2007
2  2 1/24/1972 1/1/2007
3  3  6/1/1932 1/1/2007
4  4 5/17/1947 1/1/2007
5  5 3/10/1943 1/1/2007
6  6  1/8/1940 1/1/2007
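
The effect of header can be reproduced without the original file. The sketch below writes three of the rows shown above to a temporary file (the file exists only for illustration) and reads it both ways.

```r
# Write a tiny headerless CSV to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,3/8/1941,1/1/2007",
             "2,1/24/1972,1/1/2007",
             "3,6/1/1932,1/1/2007"), tmp)

with_header    <- read.csv(tmp)                  # first row becomes the names
without_header <- read.csv(tmp, header = FALSE)  # every row is kept as data

nrow(with_header)      # 2
nrow(without_header)   # 3
names(with_header)     # "X1" "X3.8.1941" "X1.1.2007"
names(without_header)  # "V1" "V2" "V3"
```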

 

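> names(data) <- c('Column1','Column2','Column3')
> data
   Column1   Column2  Column3
1        1  3/8/1941 1/1/2007
2        2 1/24/1972 1/1/2007
3        3  6/1/1932 1/1/2007
4        4 5/17/1947 1/1/2007
5        5 3/10/1943 1/1/2007
6        6  1/8/1940 1/1/2007
7        7  8/5/1947 1/1/2007
8        8 4/14/2005 1/1/2007
9        9 6/23/1961 1/2/2007
10      10 1/10/1949 1/2/2007

Use the sep argument to set the field separator:

> data <- read.csv('datafile.csv',sep=' ')

# For tab-delimited files, use sep='\t'

Character columns (for example, a value such as china) are read as factors by default in R versions before 4.0.0. To keep them as character strings, pass stringsAsFactors=FALSE:

> data <- read.csv('datafile.csv',stringsAsFactors=FALSE)

A column that really is categorical can then be converted on its own (Sex stands for any such column; the sample file above does not have one):

> data$Sex <- factor(data$Sex)

# convert to a factor

With the default factor conversion, str() reports the two date columns as factors:

> str(data)
'data.frame': 10 obs. of 3 variables:
 $ Column1: int 1 2 3 4 5 6 7 8 9 10
 $ Column2: Factor w/ 10 levels "1/10/1949","1/24/1972",..: 5 2 8 7 4 3 10 6 9 1
 $ Column3: Factor w/ 2 levels "1/1/2007","1/2/2007": 1 1 1 1 1 1 1 1 2 2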
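
As a rough illustration of sep and stringsAsFactors together, the sketch below reads a small tab-delimited file. The file contents, including the Country and Sex columns, are invented for this example.

```r
# Tab-delimited sample data written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID\tCountry\tSex",
             "1\tchina\tfemale",
             "2\tchina\tmale",
             "3\tjapan\tfemale"), tmp)

# sep = "\t" selects tab as the separator; stringsAsFactors = FALSE keeps
# text columns as character vectors regardless of the R version.
d <- read.csv(tmp, sep = "\t", stringsAsFactors = FALSE)
class(d$Country)   # "character"

# Convert only the column that is really categorical.
d$Sex <- factor(d$Sex)
levels(d$Sex)      # "female" "male"
str(d)
```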

 

 

 

 

 


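read.table()

read.table() is the general function for reading delimited text files; read.csv() is a wrapper around it with comma-separated defaults.

read.xlsx()

install.packages('xlsx')

library(xlsx)

If library(xlsx) fails, see: https://wenku.baidu.com/view/1bc6610ece2f0066f433229f.html

The xlsx package needs Java; download it from: https://www.java.com/en/download/manual.jsp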
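
A small sketch of read.table() on a file with a header row and a non-comma separator. The file and its column names (id, birth, visit) are made up for illustration.

```r
# Semicolon-separated sample file with a header row.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id;birth;visit",
             "1;3/8/1941;1/1/2007",
             "2;1/24/1972;1/1/2007"), tmp)

# read.table() assumes no header and splits on whitespace by default,
# so both options are spelled out here.
d <- read.table(tmp, header = TRUE, sep = ";", stringsAsFactors = FALSE)
dim(d)     # 2 rows, 3 columns
names(d)   # "id" "birth" "visit"
```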

 

 

 

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

wt and mpg are columns of mtcars, not standalone objects, so plot(wt,mpg) fails until attach(mtcars) puts the columns on the search path:

> plot(wt,mpg)
Error in plot(wt, mpg) : object 'wt' not found
> attach(mtcars)
> plot(wt,mpg)
> detach(mtcars)
> # par() can put several plots on one page and set colors, line widths, and so on
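
As an alternative to attach()/detach(), with() evaluates an expression inside a data frame without changing the search path. The sketch below also shows one common use of par(), a one-by-two page layout. The colors, titles, and the choice of hp for the second plot are arbitrary picks for this example.

```r
# with() looks up mpg inside mtcars; nothing stays attached afterwards.
avg_mpg <- with(mtcars, mean(mpg))

# par(mfrow = c(1, 2)) splits the page into 1 row and 2 columns.
# par() returns the previous settings so they can be restored.
old <- par(mfrow = c(1, 2))
with(mtcars, plot(wt, mpg, col = "blue", main = "mpg vs wt"))
with(mtcars, plot(hp, mpg, lwd = 2, main = "mpg vs hp"))
par(old)
```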

 

 

In ggplot2:

> library(ggplot2)
> attach(mtcars)

> qplot(wt,mpg)
> detach(mtcars)

Equivalent to:

> qplot(wt,mpg,data=mtcars)

Equivalent to:
> ggplot(mtcars,aes(x=wt,y=mpg))+geom_point()

 

> ?plot

This opens the help page for plot(). The relevant excerpt:

Usage

plot(x, y, ...)
Arguments

x
the coordinates of points in the plot. Alternatively, a single plotting structure, function or any R object with a plot method can be provided.
y
the y coordinates of points in the plot, optional if x is an appropriate structure.
...
Arguments to be passed to methods, such as graphical parameters (see par). Many methods will accept the following arguments:
type
what type of plot should be drawn. Possible types are
"p" for points,
"l" for lines,
"b" for both,
"c" for the lines part alone of "b",
"o" for both ‘overplotted’,
"h" for ‘histogram’ like (or ‘high-density’) vertical lines,
"s" for stair steps,
"S" for other steps, see ‘Details’ below,
"n" for no plotting.
All other types give a warning or an error; using, e.g., type = "punkte" being equivalent to type = "p" for S compatibility. Note that some methods, e.g. plot.factor, do not accept this.
main
an overall title for the plot: see title.
sub
a sub title for the plot: see title.
xlab
a title for the x axis: see title.
ylab
a title for the y axis: see title.
asp
the y/x aspect ratio, see plot.window.

 

 

> plot(pressure$temperature,pressure$pressure)

> plot(pressure$temperature,pressure$pressure,type="l")

Add points on top of the existing line plot:

> points(pressure$temperature,pressure$pressure)

Add a second series (half the pressure) in red:

> lines(pressure$temperature,pressure$pressure/2,col="red")
> points(pressure$temperature,pressure$pressure/2,col="red")

With ggplot2:

> qplot(pressure$temperature,pressure$pressure,geom="line")
> qplot(temperature,pressure,data=pressure,geom=c("line","point"))
Equivalent to:
> ggplot(pressure,aes(x=temperature,y=pressure))+geom_line()+geom_point()
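
The help excerpt lists type="o" for lines and points overplotted, which suggests a shorter base R version of the same figure. A sketch, with axis labels chosen for this example:

```r
# type = "o" draws the line and the points in a single call.
plot(pressure$temperature, pressure$pressure, type = "o",
     xlab = "temperature", ylab = "pressure")

# Second series at half the pressure, also drawn with type = "o".
half <- pressure$pressure / 2
lines(pressure$temperature, half, type = "o", col = "red")
```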

 

> data <- mtcars
> table(data$cyl)

 4  6  8 
11  7 14 

# table() counts how often each value occurs
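
A short sketch of a few things that can be done with the result of table(); the variable names counts and shares are arbitrary.

```r
# Frequency of each cylinder count.
counts <- table(mtcars$cyl)
counts

# Share of cars in each group.
shares <- prop.table(counts)
round(shares, 2)

# Two-way table: cylinders against gears.
table(mtcars$cyl, mtcars$gear)
```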

 

 

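> barplot(BOD$demand,names.arg=BOD$Time)

> barplot(table(mtcars$cyl))

> qplot(BOD$Time,BOD$demand,geom="bar",stat="identity")

> qplot(factor(BOD$Time),BOD$demand,geom="bar",stat="identity")

Neither qplot() call draws anything here, because this version of ggplot2 no longer accepts the stat argument in qplot():

Error: stat_count() must not be used with a y aesthetic.
In addition: Warning message:
`stat` is deprecated

Bar charts of counts still work with qplot():

> qplot(mtcars$cyl)

> qplot(factor(mtcars$cyl))

The first call treats cyl as a continuous variable; wrapping it in factor() makes the x axis discrete, with one bar per cylinder count.

For bars whose heights are the values themselves, use ggplot() with geom_bar(stat="identity"):

> ggplot(BOD,aes(Time,demand))+geom_bar(stat='identity')

> ggplot(BOD,aes(x=factor(Time),y=demand))+geom_bar(stat="identity")

With a numeric Time the x axis is continuous and shows a gap at Time 6, which has no row in BOD; factor(Time) removes the gap.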
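
In ggplot2 2.2.0 and later, geom_col() is a shorthand for bars whose heights come straight from the data. A sketch, assuming ggplot2 is installed; layer_data() is used only to inspect what will be drawn.

```r
library(ggplot2)

# geom_col() uses the y values as bar heights,
# like geom_bar(stat = "identity").
p <- ggplot(BOD, aes(x = factor(Time), y = demand)) +
  geom_col()
print(p)

# The computed bar heights are the demand values, not counts.
heights <- layer_data(p)$y
heights
```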

 

 

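See also: https://www.cnblogs.com/lizhilei-123/p/6722116.html (quick plotting with qplot() in ggplot2: color, transparency, shape)

Bar chart of counts:

> library(ggplot2)
> ggplot(mtcars,aes(x=factor(cyl)))+geom_bar()

Equivalent to:
> qplot(factor(cyl),data=mtcars)

Bar charts and histograms look alike but are not the same: each bar in a bar chart sits at one discrete x value, while each bar in a histogram covers an interval of a continuous variable.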

 

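> hist(mtcars$mpg,breaks=10)

# breaks=10 asks for about 10 bins; it is a suggested number of bins, not a bin width of 10

> qplot(mpg,data=mtcars,binwidth=4)

Or:

> ggplot(mtcars,aes(x=mpg))+geom_histogram(binwidth=4)

binwidth sets the width of each bin along the x axis; adjust it to whatever suits the data.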
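
To see what a given breaks value actually produces, hist() can be called with plot=FALSE, which returns the bins without drawing. The sketch below also builds fixed-width bins by passing explicit breakpoints; the range 10 to 34 is chosen here because it covers all mpg values.

```r
# plot = FALSE returns the bin information without drawing anything.
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)
h$breaks           # the breakpoints R actually chose
length(h$counts)   # number of bins; need not be exactly 10

# For a fixed bin width of 4, pass the breakpoints explicitly.
h4 <- hist(mtcars$mpg, breaks = seq(10, 34, by = 4), plot = FALSE)
diff(h4$breaks)    # every bin is 4 wide
h4$counts
```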

plot(ToothGrowth$supp, ToothGrowth$len)

Because supp is a factor, plot() draws a box plot of len for each level instead of a scatter plot.

boxplot(len~supp,data=ToothGrowth)

qplot(supp,len,data=ToothGrowth,geom='boxplot')

ggplot(ToothGrowth,aes(supp,len))+geom_boxplot()
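
The numbers behind a box plot can be inspected without drawing it. A sketch using boxplot() with plot=FALSE, cross-checked against group medians from tapply(); the variable names b and med are arbitrary.

```r
# plot = FALSE returns the summary statistics instead of a figure.
b <- boxplot(len ~ supp, data = ToothGrowth, plot = FALSE)
b$names   # "OJ" "VC"
b$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker

# The third row of b$stats holds the group medians.
med <- tapply(ToothGrowth$len, ToothGrowth$supp, median)
med
```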

 

 

 

 

 

 

 

 

 

 

 

Troubleshooting: if R reports "R code execution error", try reinstalling. In this case the system C: drive did not have enough free space.