linear regression analysis-chapt

2016-11-12  暗黑破坏球嘿哈

Notes on linear regression analysis, taken from the textbook and from the Coursera course.

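  1. multiple regression model: a regression model that involves more than one regressor variable (x).
  2. regression coefficients: β0, β1, ..., βk
  3. Polynomial models can all be rewritten as multiple linear regression models. For example, y = β0 + β1x + β2x^2 becomes y = β0 + β1x1 + β2x2 with x1 = x and x2 = x^2.
  4. interaction: the effect produced by changing one variable depends on the level of the other variable (for example, a cross-product term β12x1x2 in the model).
  5. In real-world problems, the regression coefficients and the error variance are not known and must be estimated from sample data. The fitted regression equation or model is typically used for prediction.
  6. All results remain valid when the regressors are random variables. When the Xs are random variables, it is only necessary that the observations on each regressor be independent and that their distribution not depend on the regression coefficients (βs) or on σ.
  7. When testing hypotheses or constructing CIs (confidence intervals), we must assume that the conditional distribution of y given x is normal with mean β0+β1x1+β2x2+...+βkxk and variance σ^2.
    In other words, hypothesis tests and confidence intervals rely on the assumption that y, given x, follows a normal distribution whose mean is the regression equation and whose variance is σ^2.
    (If a random variable X follows a normal distribution with mean μ and variance σ^2, this is written N(μ, σ^2); the case μ = 0, σ = 1 is the standard normal distribution. Under this assumption, and since E(ε) = 0, the error term ε follows N(0, σ^2), which is the standard normal distribution only when σ = 1; y given x follows N(β0+β1x1+...+βkxk, σ^2).)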
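The notes above say that polynomial and interaction models are still multiple linear regression models, because each extra term is just one more regressor column. The sketch below illustrates this in pure Python under stated assumptions: made-up, noise-free data, and coefficients estimated by ordinary least squares through the normal equations (X'X)β = X'y. The helper names `solve` and `fit_ols` are hypothetical, not from the book, and a real analysis would normally use a numerically more robust library routine.

```python
def solve(a, b):
    """Solve the square linear system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to reduce round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(X, y):
    """Least-squares estimates b of beta from the normal equations (X'X) b = X'y.

    X is a list of rows; each row starts with 1 so that b[0] is the intercept.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Polynomial model y = 1 + 2x + 3x^2 fitted as a multiple linear model
# with regressors x1 = x and x2 = x^2 (noise-free toy data).
xs = [0, 1, 2, 3, 4, 5]
y_quad = [1 + 2 * x + 3 * x**2 for x in xs]
beta_quad = fit_ols([[1, x, x**2] for x in xs], y_quad)

# Interaction model y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2: the product x1*x2
# enters as one more regressor column.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (2, 3)]
y_int = [1 + 2 * a + 3 * b + 0.5 * a * b for a, b in pts]
beta_int = fit_ols([[1, a, b, a * b] for a, b in pts], y_int)

print([round(v, 6) for v in beta_quad])  # [1.0, 2.0, 3.0]
print([round(v, 6) for v in beta_int])   # [1.0, 2.0, 3.0, 0.5]
```

Because the data contain no noise, least squares recovers the generating coefficients exactly (up to floating-point round-off). With real sample data the estimates would only approximate the unknown βs.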

coursera jh regression models

useful

  1. explanatory variable: the independent variable (predictor, x)
    response variable: the dependent variable (predicted, y)
    y = β0 + β1x, where β0 is the intercept and β1 is the slope.

  2. Define correlation as the linear association between two numerical variables. It measures how closely x and y follow a linear relationship and is denoted R.
    Note that a relationship that is nonlinear is simply called an association.

  3. Correlation properties:
     1. The magnitude (absolute value) of the correlation coefficient measures the strength of the linear association between two numerical variables.
     2. The sign of the correlation coefficient indicates the direction of the association.
     3. The correlation coefficient is always between -1 and 1: -1 indicates a perfect negative linear association, +1 a perfect positive linear association, and 0 no linear relationship.
     4. The correlation coefficient is unitless, and it is not affected by changes in the center or scale of either variable (such as unit conversions).
     5. The correlation of X with Y is the same as the correlation of Y with X.
     6. The correlation coefficient is sensitive to outliers.

  4. Define the residual (e) as the difference between the observed (y) and predicted (ŷ) values of the response variable: e = y − ŷ.

  5. Define the least squares line as the line that minimizes the sum of the squared residuals. Conditions necessary for fitting such a line:
     1. linearity
     2. nearly normal residuals
     3. constant variability

  6. Note that the least squares line always passes through the point (x̄, ȳ), where x̄ is the average of the explanatory variable and ȳ is the average of the response variable.

  7. Use the above property to calculate the estimate for the intercept (b0) as
    b0 = ȳ − b1x̄,
    where b1 is the slope of the line.

  8. Predict the value of the response variable for a given value of the explanatory variable, x⋆, by plugging x⋆ into the linear model:
    ŷ = b0 + b1x⋆
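The correlation properties and least squares formulas listed above can be checked numerically. The following is a minimal pure-Python sketch on made-up height/weight data. It computes R, uses the standard simple-regression result b1 = R·s_y/s_x for the slope (a formula these notes do not state), gets the intercept from b0 = ȳ − b1x̄, and predicts ŷ at a new x⋆. The function names `correlation` and `least_squares_line` are hypothetical.

```python
import math
import statistics


def correlation(x, y):
    """Sample correlation coefficient R between two numerical variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (b0, b1) of the least squares line y-hat = b0 + b1 * x."""
    b1 = correlation(x, y) * statistics.stdev(y) / statistics.stdev(x)  # slope
    b0 = statistics.mean(y) - b1 * statistics.mean(x)  # line passes through (x-bar, y-bar)
    return b0, b1


# Made-up data: x = height in cm, y = weight in kg.
height_cm = [150, 160, 165, 170, 180, 185]
weight_kg = [50, 56, 61, 66, 72, 80]

r = correlation(height_cm, weight_kg)
# Symmetric: the correlation of X with Y equals that of Y with X.
assert math.isclose(r, correlation(weight_kg, height_cm))
# Unitless: converting cm to inches and shifting the center leaves R unchanged.
height_in = [h / 2.54 + 10 for h in height_cm]
assert math.isclose(r, correlation(height_in, weight_kg))

b0, b1 = least_squares_line(height_cm, weight_kg)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(height_cm, weight_kg)]  # e = y - y-hat
x_star = 175
y_hat = b0 + b1 * x_star  # prediction at x*
print(f"R = {r:.3f}, b0 = {b0:.2f}, b1 = {b1:.3f}, y-hat({x_star}) = {y_hat:.1f}")
print(f"sum of residuals = {sum(residuals):.1e}")  # close to 0 for a least squares line with an intercept
```

The slope from R·s_y/s_x equals the usual Σ(x−x̄)(y−ȳ)/Σ(x−x̄)², so this is the same least squares line, written in terms of the correlation.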

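A short addition: the two-sample t-test uses the two sample means and the two sample standard deviations to compute how likely it would be, if the two population means were equal, to draw samples whose means differ by at least as much as observed. This probability is the p-value; p < 0.05 means the difference between the two means is statistically significant at the 5% level.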
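The t-test description above can be made concrete with a short sketch. It rests on a few assumptions: the classic two-sample t-test with a pooled variance (so equal population variances are assumed), made-up samples, and a two-sided p-value obtained by numerically integrating the Student t density with Simpson's rule rather than by calling a statistics library. The function names are hypothetical. In practice a library routine such as SciPy's `scipy.stats.ttest_ind` would typically be used.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the t density from 0 to |t| (Simpson's rule)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    s = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * s * h / 3)


def pooled_t_test(x, y):
    """Two-sample t statistic and degrees of freedom, assuming equal population variances."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, n1 + n2 - 2


# Made-up samples from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [6.2, 6.8, 6.1, 7.0, 6.6, 6.4]
t_stat, df = pooled_t_test(group_a, group_b)
p = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p:.4f}")
print("significant at the 5% level" if p < 0.05 else "not significant at the 5% level")
```

A large |t| means the observed difference in means is large relative to its standard error. That makes such a difference unlikely under equal population means, which is what a small p-value expresses.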
