A multiple linear regression model is commonly used to study the relationship between one dependent variable and several independent variables. If that relationship can be described in linear form, a multiple linear model can be fitted and used for the analysis.
1. Introduction to the model
1.1 The structure of the model
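A multiple linear regression model describes a linear relationship, subject to random error, between the variable y and the variables x1, …, xk, namely:
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$
where ε is the random error term.
If y and x1, …, xk are observed n times, giving n sets of observations yi, x1i, …, xki (i = 1, 2, …, n), they satisfy the relationship:
$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + \varepsilon_i, \quad i = 1, 2, \ldots, n$$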
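The n observation equations are usually collected into matrix form, which is where the matrix X used in the next section comes from. The block below is a sketch in standard notation; the stacked symbols y, X, β, and ε are assumptions about the notation, since the original formulas did not survive.

```latex
y = X\beta + \varepsilon,
\qquad
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
\quad
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{k1} \\
1 & x_{12} & \cdots & x_{k2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1n} & \cdots & x_{kn}
\end{pmatrix},
\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix},
\quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
```

Here X is an n × (k+1) matrix whose first column of ones carries the intercept β0.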
1.2 Test of model parameters
Under the normality assumption, if X has full column rank, the least squares estimate of the parameters of the ordinary linear regression model is:
$$\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y$$
Therefore the estimated value of y is:
$$\hat{y} = X\hat{\beta}$$
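As a rough illustration of the least squares estimate, the sketch below solves the normal equations directly and compares the result with lm(). The data are simulated purely for illustration (the data set `a` used later is not available here), and the names `X`, `beta_hat`, and `y_hat` are chosen for this example only.

```r
# Simulated data: an assumption for illustration, not the data set `a` from the text
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)

# Design matrix with a leading column of ones for the intercept
X <- cbind(1, x1, x2)

# Least squares estimate: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fitted values: y_hat = X beta_hat
y_hat <- X %*% beta_hat

# Cross-check against lm()
fit <- lm(y ~ x1 + x2)
print(cbind(normal_eq = drop(beta_hat), lm = unname(coef(fit))))
```

Solving the linear system with solve(A, b) avoids forming the explicit inverse of X'X; lm() itself uses a QR decomposition, which is numerically more stable still.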
(1) Significance test of the regression equation
(2) Significance test of the regression coefficients
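Both tests can be computed by hand from a fitted model. The sketch below uses the standard F statistic for the whole regression and the standard t statistic for each coefficient, on simulated data; the data and all variable names are assumptions made for this example.

```r
# Simulated data: an assumption for illustration
set.seed(2)
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)        # x2 has no true effect in this simulation
fit <- lm(y ~ x1 + x2)
k   <- 2                           # number of predictors

# (1) F test of the regression equation, H0: beta_1 = ... = beta_k = 0
rss    <- sum(resid(fit)^2)        # residual sum of squares
tss    <- sum((y - mean(y))^2)     # total sum of squares
F_stat <- ((tss - rss) / k) / (rss / (n - k - 1))
F_p    <- pf(F_stat, k, n - k - 1, lower.tail = FALSE)

# (2) t test of each coefficient, H0: beta_j = 0
se     <- sqrt(diag(vcov(fit)))    # standard errors of the estimates
t_stat <- coef(fit) / se
t_p    <- 2 * pt(abs(t_stat), df = n - k - 1, lower.tail = FALSE)

print(c(F = F_stat, p = F_p))
print(cbind(t = t_stat, p = t_p))
```

These are the same quantities that summary() prints in its coefficient table and in its final "F-statistic" line.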
2. Modeling steps
(1) Fit the regression model to the data
(2) Test the model for significance
(3) Run regression diagnostics on the model
a                                            # print the data frame
lm.salary <- lm(Fuxian ~ x1 + x2 + x3 + x4, data = a)
summary(lm.salary)
# Note: Fuxian is simply the response y; the variable name was garbled.
The coefficients of x2, x3, and x4 are not significant.
(2) Variable selection
Removing x2 gives an AIC of 648.49, removing x3 gives 650.85, and removing x1 gives 715.19. Removing x2 yields the smallest AIC, so x2 is dropped here.
Proceed to the next round:
lm.salary <- lm(Fuxian ~ x1 + x3 + x4, data = a)
lm.step <- step(lm.salary, direction = "both")
Removing x3 gives an AIC of 647.64, so x3 is removed as well.
Fit the model with only x1 and x4:
lm.salary <- lm(Fuxian ~ x1 + x4, data = a)
summary(lm.salary)
The P value of the F test is below 0.05, so the regression is significant, and each coefficient is also significant.
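The selection rounds above can be sketched on simulated data: drop1() lists the AIC that results from deleting each single term, and step() repeats that comparison until no move lowers the AIC. The data frame `d` and its coefficients are assumptions made for this example, not the data set `a`.

```r
# Simulated data: an assumption for illustration; only x1 and x4 affect y
set.seed(3)
n <- 80
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 3 + 2 * d$x1 + 1 * d$x4 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3 + x4, data = d)

# AIC of the model after deleting each single term
print(drop1(full))

# Automated stepwise search; trace = 0 suppresses the per-round printout
sel <- step(full, direction = "both", trace = 0)
print(formula(sel))

# c(equivalent degrees of freedom, AIC) on the scale that step() reports
print(extractAIC(sel))
```

The AIC values printed by step() come from extractAIC(), which for lm models differs from AIC() by an additive constant, so only differences between models on the same data are meaningful.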
(3) Diagnosing the residuals of the regression model above
Calculate the standardized residuals of the model:
library(TSA)
y.rst <- rstandard(lm.step)   # rstandard() for lm objects is provided by the stats package
y.rst
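A self-contained sketch of this residual check follows: it computes standardized residuals, draws an index plot, flags large values, and refits without them. The data are simulated with two artificially planted outliers, and the cutoff |r| > 2 is a common rule of thumb assumed here, not a value taken from the text.

```r
# Simulated data with two planted outliers: an assumption for illustration
set.seed(4)
n <- 40
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n, sd = 0.5)
d$y[c(4, 35)] <- d$y[c(4, 35)] + 5     # artificial outliers at rows 4 and 35

fit <- lm(y ~ x, data = d)
r   <- rstandard(fit)                  # standardized residuals

# Index plot with dashed reference lines at the assumed cutoff of +/- 2
plot(r, xlab = "Observation", ylab = "Standardized residual")
abline(h = c(-2, 2), lty = 2)

# Observations whose standardized residual exceeds the cutoff
flagged <- which(abs(r) > 2)
print(flagged)

# Refit without the flagged rows
fit2 <- lm(y ~ x, data = d[-flagged, ])
```

Dropping rows by negative index, as in `d[-flagged, ]`, assumes at least one row is flagged; with an empty index the subset would select no rows at all.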
Plot the standardized residuals:
Observations 4 and 35 are clearly outliers, so these two points are removed.
lm.salary <- lm(log(Fuxian) ~ x1 + x2 + x3 + x4, data = a[-c(4, 35), ])
The result after removing the two points:
Draw the model diagnostic plots:
par(mfrow = c(2, 2))          # arrange the four diagnostic plots in a 2 x 2 grid
plot(lm.step)
influence.measures(lm.step)
The residuals-versus-fitted plot shows an essentially random pattern, and the points in the normal Q-Q plot fall roughly on a straight line, which indicates that the residuals are approximately normally distributed. In the scale-location plot and the residuals-versus-leverage plot, the points form a single cluster that is not far from the center. The diagnostics indicate that observations 3, 4, and 35 may be outliers and high-influence points.
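The quantities behind the residuals-versus-leverage plot can also be inspected numerically. The sketch below computes leverage and Cook's distance on simulated data; the data and the cutoffs 2p/n and 4/n are assumptions for illustration (such conventions vary between textbooks).

```r
# Simulated data: an assumption for illustration
set.seed(5)
n <- 40
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x, data = d)

h  <- hatvalues(fit)           # leverage of each observation
cd <- cooks.distance(fit)      # Cook's distance of each observation
p  <- length(coef(fit))        # number of estimated coefficients

# Rule-of-thumb cutoffs (assumed; conventions vary)
high_leverage <- which(h > 2 * p / n)
high_cook     <- which(cd > 4 / n)
print(high_leverage)
print(high_cook)

# influence.measures() flags cases under its own built-in criteria
infl <- influence.measures(fit)
print(which(apply(infl$is.inf, 1, any)))
```

Observations flagged by several criteria at once are the natural candidates to examine before deciding whether to remove them.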