Multiple linear regression in R: the relationship between age, height and weight

Posted by 基督徒Isaac


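  • Code first
library(tidyverse)  # provides %>%, as_tibble() and ggplot()

# Read the tab-separated file (columns used below: age, cm, kg)
data <- read.table('e://kg.txt',
                   header = TRUE,
                   sep = '\t')
data <- data %>% as_tibble()
data %>% attach()  # exposes the columns as bare names for the base plot() calls below
data %>% ggplot(aes(cm, kg)) + geom_line()
data %>% ggplot(aes(age, cm)) + geom_line()
data %>% ggplot(aes(age, kg)) + geom_line()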

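# Relationships among age, height and weight
data[1:3] %>% cor() %>% corrplot::corrplot(method = "color",
                                           addCoef.col = "grey")
# Regress weight on the cube of height
lm_data <- data %>% lm(kg ~ I(cm^3), .)
lm_data %>% summary()
lm_data
# Plot on the cm^3 scale, but label the x-axis ticks with the original cm values
plot(cm^3, kg, xaxt = 'n')
axis(1, at = cm^3, labels = cm)
abline(lm_data)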
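
The model above uses a single predictor (height cubed). As a minimal sketch of a regression with more than one predictor, the block below regresses weight on age and height together. It assumes the same column names (age, cm, kg) but runs on simulated values, since kg.txt is not available here; the coefficients used to generate the data are arbitrary and are not estimates from the original file.

```r
# Simulated data only: the column names mirror the script above,
# but every number here is made up for illustration.
set.seed(1)
n   <- 200
age <- runif(n, 5, 18)                                  # years
cm  <- 80 + 5.5 * age + rnorm(n, sd = 4)                # height in cm
kg  <- -30 + 0.8 * age + 0.4 * cm + rnorm(n, sd = 2)    # weight in kg
sim <- data.frame(age, cm, kg)

# Multiple linear regression: weight explained by age and height together
fit_multi <- lm(kg ~ age + cm, data = sim)
summary(fit_multi)

# Single-predictor model of the same form as lm_data, for comparison
fit_cubic <- lm(kg ~ I(cm^3), data = sim)
AIC(fit_multi, fit_cubic)   # lower AIC indicates the better trade-off of fit and complexity
```

Because age and height are strongly correlated in growing children, the individual coefficients of such a model can have wide standard errors even when the overall fit is good.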

  • Update (continued)
# https://zhuanlan.zhihu.com/p/94372177
# https://www.jianshu.com/p/a081a791ae03
# https://cloud.tencent.com/developer/article/1674211
# https://www3.nd.edu/~steve/computing_with_data/2_Motivation/motivate_ht_wt.html?spm=a2c4e.11153940.blogcont603256.20.333b1d6fYOsiOK
# Load the data; the dataset can be downloaded here: https://github.com/johnmyleswhite/ML_for_Hackers/blob/master/02-Exploration/data/01_heights_weights_genders.csv
library(tidyverse)
ht_weight_df <- read.table("e://01_heights_weights_genders.txt",
                           header = TRUE,
                           sep = "\t") %>% 
  as_tibble()
# Check the missing-data pattern
ht_weight_df %>% mice::md.pattern()

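# Plot to inspect the correlation (select(-1) drops the first column, Gender)
ht_weight_df %>% select(-1) %>% 
  cor() %>% corrplot::corrplot(method = "color",
                               addCoef.col = "grey")
# Scatter plot of a 10% random sample
ht_weight_df %>% select(-1) %>% sample_frac(0.1) %>% 
  plot(cex = 0.1)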

# Fit a linear model to check the linear relationship
lm_ht_weight <- lm(Weight ~ Height, data = ht_weight_df)
lm_ht_weight %>% summary()
# Add the fitted line to the scatter plot drawn above
lm_ht_weight %>% abline()
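
Once a model of this form is fitted, predict() can turn it into predicted weights at chosen heights. The following is a sketch only: it uses R's built-in women dataset (height in inches, weight in pounds, 15 rows) as a stand-in, because ht_weight_df is not bundled with R, and the heights in new_heights are arbitrary illustration values.

```r
# Built-in 'women' data stands in for ht_weight_df (an assumption for this sketch)
fit <- lm(weight ~ height, data = women)

# Intercept and slope of the fitted line
coef(fit)

# Predicted mean weight at a few heights, with 95% confidence intervals
new_heights <- data.frame(height = c(60, 66, 72))
pred <- predict(fit, newdata = new_heights, interval = "confidence")
cbind(new_heights, pred)

# abline() only adds to an existing plot, so draw the scatter plot first
plot(weight ~ height, data = women)
abline(fit)
```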

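# Compare by gender: mean height per group, converted to cm (x 2.54) and rounded
ht_weight_df %>% group_by(Gender) %>% 
  dplyr::summarise(round(mean(Height) * 2.54))
  # subset(Gender == ) can also be used to select a group
  # fivenum() does not work with [2] or select(2)
  # sapply() does not work with $variable or select(2)
  # psych::describe() does not work with [2]
  # pastecs::stat.desc(), Hmisc::describe() and summary() all work
  # plyr::ddply(.(Gender), function(df) summary(df$Height)) computes per-group values from the original data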
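
The same grouped mean can also be computed in base R. The sketch below uses a tiny hand-made data frame (toy, with made-up heights in inches) rather than the real dataset, and the helper mean_cm is a name introduced here for illustration.

```r
# Toy data with made-up values; Height is in inches as in the script above
toy <- data.frame(
  Gender = c("Male", "Male", "Female", "Female"),
  Height = c(70, 68, 64, 62)
)

# Hypothetical helper: mean height converted to centimetres and rounded
mean_cm <- function(h) round(mean(h) * 2.54)

# tapply() returns a named vector, one entry per group
by_gender <- tapply(toy$Height, toy$Gender, mean_cm)
by_gender

# aggregate() returns the same values as a data frame
aggregate(Height ~ Gender, data = toy, FUN = mean_cm)
```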

# Inspect the distributions
par(mfrow = c(1,1))
# Male height density; ann = FALSE suppresses the main title and axis labels
ht_weight_df %>% subset(Gender == "Male") %>% select(Height) %>% 
  unlist() %>% as.numeric() %>% 
  density() %>% plot(type = "h", col = 4, ann = FALSE)
# Overlay the female height density
ht_weight_df %>% subset(Gender == "Female") %>% select(Height) %>% 
  unlist() %>% as.numeric() %>% 
  density() %>% lines(col = 2)
title(main = "Height By Gender")
# Dotted vertical lines at the group means, coloured to match the curves
abline(col = c(4, 2),
       lty = 3,
       v = c(
         mean(ht_weight_df %>% subset(Gender == "Male") %>% 
                select(Height) %>% unlist()),
         mean(ht_weight_df %>% subset(Gender == "Female") %>% 
                select(Height) %>% unlist())
         ))
# The same comparison with ggplot2
ht_weight_df %>% ggplot(aes(x = Height, colour = Gender)) + 
  geom_density()
