Relative Weights: Computing Each Predictor's Contribution to R-squared

2019-10-01  别花春水

question:

R code example

ref: R in Action, 2nd edition, page:209
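ps: I have found firsthand that this edition of R in Action contains printing errors, and some of its code no longer works after package updates. It is a free download, though, and I could not be bothered to look for another edition, so consult it with caution.
Relative weights: find the contribution each predictor makes to R-square.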

relweights <- function(fit, ...) {
  R <- cor(fit$model)                  # correlation matrix of all model variables (response in column 1)
  nvar <- ncol(R)
  rxx <- R[2:nvar, 2:nvar]             # correlations among the predictors
  rxy <- R[2:nvar, 1]                  # correlations between each predictor and the response
  eig <- eigen(rxx)                    # eigenvalues and eigenvectors of rxx
  evec <- eig$vectors                  # eigenvectors
  ev <- eig$values                     # eigenvalues
  delta <- diag(sqrt(ev))              # diagonal matrix holding the square roots of the eigenvalues
  lambda <- evec %*% delta %*% t(evec) # symmetric square root of rxx: correlations between original predictors and new orthogonal variables
  lambdasq <- lambda ^ 2               # elementwise squares
  beta <- solve(lambda) %*% rxy        # regression coefficients of Y on the orthogonal variables: inverse of lambda times rxy
  rsquare <- colSums(beta ^ 2)         # R^2, the share of variance explained by the model
  rawwgt <- lambdasq %*% beta ^ 2      # raw relative weight of each predictor
  import <- (rawwgt / rsquare) * 100   # express each weight as a percentage of R^2
  import <- as.data.frame(import)
  row.names(import) <- names(fit$model[2:nvar])
  names(import) <- "Weights"           # set the column name
  import <- import[order(import$Weights), 1, drop=FALSE]  # sort in ascending order
  dotchart(import$Weights, labels=row.names(import),
    xlab="% of R-Square", pch=19,
    main="Relative Importance of Predictor Variables",
    sub=paste("Total R-Square=", round(rsquare, digits=3)),
    ...)
  return(import)
}

states <- as.data.frame(state.x77[,c("Murder", "Population",
"Illiteracy", "Income", "Frost")])
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=states)
relweights(fit, col="blue")

Background: solving OLS in matrix form

Given X\beta=Y, solve for \beta. X is generally not square, so first multiply both sides on the left by X^T:
\because X^TX\beta=X^TY
\therefore \beta=(X^TX)^{-1}X^TY
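As a quick check of this closed form, the sketch below solves the normal equations directly and compares the result with lm(). It assumes the same state.x77 columns used in the example above and adds an explicit intercept column to X. The names X, Y, and beta_hat are chosen here purely for illustration.

```r
# Illustrative data: the state.x77 columns used earlier in this post
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])

# Design matrix with an explicit intercept column (an assumption of this sketch)
X <- cbind(Intercept = 1, as.matrix(states[, -1]))
Y <- states$Murder

# Normal equations (X'X) beta = X'Y; solve(A, b) avoids forming the inverse explicitly
beta_hat <- solve(crossprod(X), crossprod(X, Y))

# Reference fit with lm()
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

# Side-by-side comparison of the two sets of coefficients
print(cbind(normal_eq = drop(beta_hat), lm = coef(fit)))
```

lm() itself uses a QR decomposition rather than the normal equations, which is numerically safer when the columns of X are on very different scales.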

calculation of relative weight

ref: Jeff Johnson, 2000. A Heuristic Method for Estimating the Relative Weight of Predictor Variables in Multiple Regression. Multivariate Behavioral Research, 35: 1-19

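Each variable's contribution consists of its own direct contribution plus the contribution it shares with the other variables through their correlation.

The main step is to transform the original predictors into a set of mutually uncorrelated variables, that is, to obtain the best-fitting (in the least squares sense) set of orthogonal variables.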

  1. First find the eigenvectors and eigenvalues of the matrix X'X.
  2. Then take the singular value decomposition of X, which gives a reduced form similar to the one used in principal component analysis (PCA):
    X=P\Delta Q' \\ \Delta=\sqrt{eigenvalues}
    Here P holds the eigenvectors of XX' and Q holds the eigenvectors of X'X; \Delta is the diagonal matrix of the square roots of the eigenvalues of X'X.
    ps: If no two predictor variables in X are perfectly correlated with each other, X is of full rank and no diagonal elements of \Delta will be equal to zero.
  3. Find the orthogonal matrix Z that is closest to X. The columns of Z are the best-fitting approximations to the columns of X in that they minimize the residual sum of squares between the original variables and the orthogonal variables (Johnson, 1966):
    Z=PQ'
  4. Regress X on Z, X=Z\Lambda, so that

\Lambda=(Z'Z)^{-1}Z'X=Q\Delta Q' \tag{1}

X is a linear transformation of Z.
Because the Z variables are uncorrelated, the relative contribution of each z to each x is represented by the squared standardized regression coefficient (which is the same as the squared zero-order correlation) of each z for each x, represented by the squared column elements of \Lambda (\lambda_{jk}^2).

Because Z'X=X'Z (that is, \Lambda is symmetric), any particular \lambda_{jk}^2 represents the proportion of variance in z_k accounted for by x_j, just as it represents the proportion of variance in x_j accounted for by z_k.
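The construction of Z and \Lambda can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the four state.x77 predictors from the example above and standardizes them so that X'X equals their correlation matrix. The variable names (P, Q, Delta, Z, Lambda, and the err_* checks) are chosen here and do not come from the reference.

```r
# Predictors only; the same state.x77 columns as in the example above
preds <- as.data.frame(state.x77[, c("Population", "Illiteracy",
                                     "Income", "Frost")])
n <- nrow(preds)

# Standardize and rescale so that X'X is the correlation matrix of the predictors
X <- scale(as.matrix(preds)) / sqrt(n - 1)

s <- svd(X)                 # X = P Delta Q'
P <- s$u
Q <- s$v
Delta <- diag(s$d)

Z <- P %*% t(Q)                   # orthogonal counterpart of X
Lambda <- Q %*% Delta %*% t(Q)    # equation (1)

# Each quantity below should be zero up to floating-point error
err_orth <- max(abs(crossprod(Z) - diag(ncol(X))))   # Z'Z = I
err_reg  <- max(abs(crossprod(Z, X) - Lambda))       # (Z'Z)^{-1} Z'X = Lambda
err_lin  <- max(abs(Z %*% Lambda - X))               # X = Z Lambda
err_sym  <- max(abs(Lambda - t(Lambda)))             # Lambda is symmetric
err_sqrt <- max(abs(Lambda %*% Lambda - cor(preds))) # Lambda is a square root of R_XX
print(c(err_orth, err_reg, err_lin, err_sym, err_sqrt))
```

Because Z'Z is the identity, the regression of X on Z reduces to Z'X, which is why \Lambda can be read as a matrix of correlations between the x and z variables.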

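  5. Find \beta, the part of y that is explained by Z. The vector of beta weights when regressing y on Z is obtained by

\beta=(Z'Z)^{-1}Z'y=QP'y

  6. Compute the relative weights.
    Convert to squared (variance) units, then scale by R^2:
    \varepsilon=\frac{\Lambda^{[2]}\beta^{[2]}}{\sum{\beta^2_i}} \tag{2}
    where \Lambda^{[2]} and \beta^{[2]} denote the elementwise squares of \Lambda and \beta. Z is the orthogonal counterpart of X and an invertible linear transformation of it, so the model's explanatory power R^2 does not change, and therefore R^2=\sum{\beta^2_i}

  7. Hence, when computing relative weights from standardized predictors, the correlation matrix of X is
    X=P\Delta Q' \\ R_{XX}=X'X=Q\Delta P'P\Delta Q'=Q\Delta^2Q'
    so the columns of Q are exactly the eigenvectors of R_{XX}, and the diagonal of \Delta^2 holds its eigenvalues.
    From (1),
    R^{1/2}_{XX}=Q\Delta Q'=\Lambda \tag{3}
    \therefore R_{XZ}=X'Z=Q\Delta P'PQ'=Q\Delta Q'=R^{1/2}_{XX}
    Since R_{XZ}R_{ZY}=R_{XY}, i.e. R_{XZ}\beta=R_{XY}, we get
    \beta=R^{-1}_{XZ}R_{XY}=\Lambda^{-1}R_{XY} \tag{4}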

Compute \Lambda and \beta from (3) and (4), then substitute them into (2).
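The three formulas can be strung together using nothing but the correlation matrix. The following is a minimal sketch rather than a reference implementation: it assumes the state.x77 data from the example above, with Murder as the response, and the names Rxx, Rxy, r2, and eps are chosen here for illustration.

```r
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])

R   <- cor(states)
Rxx <- R[-1, -1]     # correlations among the predictors
Rxy <- R[-1, 1]      # correlations between the predictors and the response

e      <- eigen(Rxx)                                           # Rxx = Q Delta^2 Q'
Lambda <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)  # equation (3)
beta   <- solve(Lambda, Rxy)                                   # equation (4)

r2  <- sum(beta^2)                        # R^2 as the sum of squared beta weights
eps <- drop(Lambda^2 %*% beta^2) / r2     # equation (2); ^ acts elementwise in R
names(eps) <- colnames(Rxx)

# Cross-check against the R^2 reported by an ordinary lm() fit
fit <- lm(Murder ~ ., data = states)
print(c(sum_beta_sq = r2, lm_r_squared = summary(fit)$r.squared))
print(round(100 * eps, 2))                # relative weights as percentages of R^2
```

The weights sum to 1 because each column of \Lambda has unit squared length (the diagonal of R_{XX} is all ones), so multiplying by 100 gives the percentages that relweights() plots.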

\LaTeX math symbol reference:

http://www.mohu.org/info/symbols/symbols.htm
On Jianshu, every inline \LaTeX symbol floats too high, and the font size of equations that use \tag does not look right either. Why is that?
