- 2020-07-28 20:06
- multivariate statistical analysis

A multiple linear regression model is typically used to study the relationship between one dependent variable and several independent variables. If that relationship can be described in linear form, a multiple linear model can be fitted for the analysis.

1. Introduction to the model

1.1 The structure of the model

A multiple linear regression model describes a random linear relationship between the variable y and the variables x1, …, xk, namely:

$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon $$

where ε is a random error term.

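Taking n observations of y and x gives n sets of observed values yi, x1i, …, xki (i = 1, 2, …, n), which satisfy:

$$ y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \varepsilon_i, \qquad i = 1, 2, \ldots, n $$

or, in matrix form with design matrix X, $y = X\beta + \varepsilon$.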

1.2 Test of model parameters

Under the normality assumption on the errors, if the design matrix X has full column rank, the least-squares estimate of the parameters of the ordinary linear regression model is:

$$ \hat{\beta} = (X^{\top} X)^{-1} X^{\top} y $$

Therefore, the fitted value of y is:

$$ \hat{y} = X \hat{\beta} $$
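The standard least-squares formula, beta_hat = (X'X)^(-1) X'y, can be checked numerically against lm(). The following is a minimal sketch on simulated data, not the reg.csv data used later; the names x1, x2, y, X, beta_hat, and y_hat are illustrative assumptions.

```r
# Minimal sketch on simulated data (an assumption; not the reg.csv data).
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)

# Design matrix with a leading column of ones for the intercept
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Least-squares estimate: solve the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fitted values: y_hat = X beta_hat
y_hat <- X %*% beta_hat

# Cross-check against lm(); the two columns should agree
fit <- lm(y ~ x1 + x2)
print(cbind(manual = drop(beta_hat), lm = coef(fit)))
```

Solving the normal equations with solve(A, b) avoids forming the explicit inverse; lm() itself uses a QR decomposition, which is numerically more stable when the columns of X are nearly collinear.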

(1) Significance test of the regression equation

(2) Significance test of the regression coefficients
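For reference, the usual textbook forms of these two tests can be written as follows. This is a sketch under stated assumptions: k regressors plus an intercept, n observations, and the symbols SSR, SSE, $\hat{\sigma}$, and $c_{jj}$ are introduced here for illustration and are not taken from the original post.

```latex
% (1) Overall F test, H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)} \sim F(k,\; n-k-1),
\qquad
\mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,
\quad
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .

% (2) t test for a single coefficient, H_{0j}: \beta_j = 0
t_j = \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{c_{jj}}} \sim t(n-k-1),
\qquad
\hat{\sigma}^2 = \frac{\mathrm{SSE}}{n-k-1},
\quad
c_{jj} = \text{the diagonal entry of } (X^{\top}X)^{-1} \text{ corresponding to } \beta_j .
```

These are the F statistic and the t values that summary() reports for a fitted lm object.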

2. Modeling steps

(1) Fit the regression model to the data

(2) Test the model for significance

(3) Perform regression diagnostics on the model

3. Modeling

(1) Fitting the regression model

library(car)
# read the comma-separated data file with a header row
a=read.table("C:/Users/MrDavid/data_TS/reg.csv",sep=",",header=TRUE)

a  # print the data
lm.salary=lm(Fuxian~x1+x2+x3+x4,data=a)
summary(lm.salary)  # note: only the name of the response y appears garbled in the output

The coefficients of x2, x3, and x4 are not significant.

(2) Selecting variables

lm.step=step(lm.salary,direction="both")

Removing x2 gives an AIC of 648.49, removing x3 gives 650.85, and removing x1 gives 715.19. Removing x2 yields the lowest AIC, so x2 is dropped here.
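The comparison that step() makes in each round can be reproduced by hand with drop1(), which lists the AIC obtained by deleting each term in turn. The following is a minimal sketch that assumes the built-in mtcars data as a stand-in, since reg.csv is not available; the variables mpg, wt, hp, qsec, and drat are not those of the original analysis.

```r
# Sketch on the built-in mtcars data (a stand-in, not the reg.csv data).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# AIC after deleting each single term; the row "<none>" is the current model
tab <- drop1(full)
print(tab)

# The term whose removal gives the smallest AIC is the candidate to drop
terms_only <- tab[rownames(tab) != "<none>", ]
candidate  <- rownames(terms_only)[which.min(terms_only$AIC)]

# Refit without that term; extractAIC() reports the same AIC that drop1() listed
reduced <- update(full, as.formula(paste(". ~ . -", candidate)))
print(c(drop1 = tab[candidate, "AIC"], refit = extractAIC(reduced)[2]))
```

step() only removes the candidate when its AIC is below that of the "<none>" row, and with direction="both" it also considers adding previously removed terms back.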

Proceed to the next round:

lm.salary=lm(Fuxian~x1+x3+x4,data=a)
lm.step=step(lm.salary,direction="both")

Removing x3 gives an AIC of 647.64, so x3 is dropped.

Only x1 and x4 remain; fit the model with these two variables.

lm.salary=lm(Fuxian~x1+x4,data=a)
summary(lm.salary)

The p-value of the F test is below 0.05, so the regression is significant, and each coefficient is also significant.

(3) Diagnosing the residuals of the regression model above

Calculate the standardized residuals of the model:

library(TSA)
y.rst=rstandard(lm.step)  # standardized residuals of the stepwise model
y.rst

Draw the residual scatter plot:

Observations 4 and 35 are clearly outliers, so these two points are removed.
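Besides reading the scatter plot, outlying observations can be flagged numerically from the standardized residuals. The following is a minimal sketch on simulated data with two planted outliers; rows 4 and 35 are chosen only to mirror the text, and the cutoff of 2 is a common rule of thumb assumed here, not a threshold stated in the original.

```r
# Sketch on simulated data with two planted outliers (not the reg.csv data).
set.seed(42)
n <- 50
x <- runif(n, 0, 10)
y <- 3 + 1.5 * x + rnorm(n)
y[4]  <- y[4] + 20   # planted outlier
y[35] <- y[35] - 20  # planted outlier

d   <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = d)
r   <- rstandard(fit)

# Assumed rule of thumb: |standardized residual| > 2 is suspicious
flagged <- which(abs(r) > 2)
print(flagged)

# Refit without the flagged rows
fit_clean <- lm(y ~ x, data = d[-flagged, ])
print(coef(fit_clean))
```

Dropping observations changes the data the conclusions rest on, so it is worth checking whether the flagged rows are recording errors before removing them.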

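# refit on the log of the response, without observations 4 and 35
lm.salary=lm(log(Fuxian)~x1+x2+x3+x4,data=a[-c(4,35),])
lm.step=step(lm.salary,direction="both")
y.rst=rstandard(lm.step)
y.fit=predict(lm.step)
plot(y.rst~y.fit)  # standardized residuals against fitted values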

The result after removing the two points:

Draw the model diagnostic plots:

par(mfrow=c(2,2))  # 2 x 2 panel layout
plot(lm.step)
influence.measures(lm.step)

The residuals-vs-fitted plot shows an essentially random pattern, and the points in the normal Q-Q plot fall roughly on a straight line, which indicates that the residuals are approximately normally distributed. In the scale-location plot and the residuals-vs-leverage plot, the points form a single cluster not far from the center. The diagnostics suggest that observations 3, 4, and 35 may be outliers and high-influence points.
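The table printed by influence.measures() can also be queried directly to list the observations it flags. The following is a minimal sketch that assumes the built-in mtcars data as a stand-in for reg.csv; the 4/n cutoff for Cook's distance and the 2p/n cutoff for leverage are common rules of thumb assumed here, not thresholds from the original.

```r
# Sketch on the built-in mtcars data (a stand-in, not the reg.csv data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# influence.measures() returns the measures (infmat) and a logical matrix (is.inf)
im <- influence.measures(fit)

# Observations flagged by at least one measure
influential <- rownames(mtcars)[apply(im$is.inf, 1, any)]
print(influential)

# Cook's distance with the assumed 4/n cutoff
cd <- cooks.distance(fit)
print(names(cd)[cd > 4 / nrow(mtcars)])

# Leverage (hat values) above twice the average p/n
h <- hatvalues(fit)
print(names(h)[h > 2 * length(coef(fit)) / nrow(mtcars)])
```

An observation with a large residual but low leverage affects the fit less than one that is extreme in both respects, which is why the residuals-vs-leverage plot shows the two quantities together.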
