2019-02-02 Learning principal component analysis (PCA) in R
Reference article: https://www.jianshu.com/p/6e413420407a
- Install the packages
install.packages(c("FactoMineR", "factoextra", "corrplot"))
library(FactoMineR); library(factoextra); library(corrplot)
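- Take a look at the data. The dataset is built in (decathlon2 ships with factoextra; load it with data(decathlon2)).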
> head(decathlon2)
          X100m Long.jump Shot.put High.jump X400m X110m.hurdle Discus
SEBRLE    11.04      7.58    14.83      2.07 49.81        14.69  43.75
CLAY      10.76      7.40    14.26      1.86 49.37        14.05  50.72
BERNARD   11.02      7.23    14.25      1.92 48.93        14.99  40.87
YURKOV    11.34      7.09    15.19      2.10 50.42        15.31  46.26
ZSIVOCZKY 11.13      7.30    13.48      2.01 48.62        14.17  45.67
McMULLEN  10.83      7.31    13.76      2.13 49.91        14.38  44.41
          Pole.vault Javeline X1500m Rank Points Competition
SEBRLE          5.02    63.19  291.7    1   8217    Decastar
CLAY            4.92    60.15  301.5    2   8122    Decastar
BERNARD         5.32    62.77  280.1    4   8067    Decastar
YURKOV          4.72    63.44  276.4    5   8036    Decastar
ZSIVOCZKY       4.42    55.37  268.0    7   8004    Decastar
McMULLEN        4.42    56.37  285.1    8   7995    Decastar
- Extract the data needed for the analysis and inspect it
> decathlon2.active <- decathlon2[1:23, 1:10]
> head(decathlon2.active)
          X100m Long.jump Shot.put High.jump X400m X110m.hurdle Discus
SEBRLE    11.04      7.58    14.83      2.07 49.81        14.69  43.75
CLAY      10.76      7.40    14.26      1.86 49.37        14.05  50.72
BERNARD   11.02      7.23    14.25      1.92 48.93        14.99  40.87
YURKOV    11.34      7.09    15.19      2.10 50.42        15.31  46.26
ZSIVOCZKY 11.13      7.30    13.48      2.01 48.62        14.17  45.67
McMULLEN  10.83      7.31    13.76      2.13 49.91        14.38  44.41
          Pole.vault Javeline X1500m
SEBRLE          5.02    63.19  291.7
CLAY            4.92    60.15  301.5
BERNARD         5.32    62.77  280.1
YURKOV          4.72    63.44  276.4
ZSIVOCZKY       4.42    55.37  268.0
McMULLEN        4.42    56.37  285.1

- Run the PCA, using the function's built-in standardization
res.pca <- PCA(X = decathlon2.active, scale.unit = TRUE, ncp = 10, graph = TRUE)
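Arguments: X is the input dataset; scale.unit controls whether the variables are scaled to unit variance; ncp is the number of principal components kept in the results; graph controls whether the plots are displayed.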
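
To make the effect of scale.unit = TRUE more concrete, here is a minimal sketch in base R. It relies on the fact that a PCA on standardized variables has the same eigenvalues as the correlation matrix. Note that it uses the built-in mtcars data and prcomp() instead of decathlon2 and FactoMineR so that it runs without extra packages; that substitution and the variable names are my own choices, not part of the original notes.

```r
# Sketch: PCA on standardized variables vs. eigenvalues of the correlation matrix.
# Assumption: mtcars stands in for decathlon2.active purely for illustration.
X <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Eigenvalues of the correlation matrix
ev_cor <- eigen(cor(X))$values

# PCA with centering and scaling (the base-R analogue of scale.unit = TRUE)
pc <- prcomp(X, center = TRUE, scale. = TRUE)
ev_pca <- pc$sdev^2

# The two columns should agree, and the eigenvalues should sum to ncol(X)
print(round(cbind(ev_cor, ev_pca), 4))
print(sum(ev_pca))
```

Because every standardized variable has variance 1, the eigenvalues add up to the number of variables, which is why the ten eigenvalues in these notes sum to 10.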

- See which results are returned
> print(res.pca)
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 23 individuals, described by 10 variables
*The results are available in the following objects:
   name               description
1  "$eig"             "eigenvalues"
2  "$var"             "results for the variables"
3  "$var$coord"       "coord. for the variables"
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"
6  "$var$contrib"     "contributions of the variables"
7  "$ind"             "results for the individuals"
8  "$ind$coord"       "coord. for the individuals"
9  "$ind$cos2"        "cos2 for the individuals"
10 "$ind$contrib"     "contributions of the individuals"
11 "$call"            "summary statistics"
12 "$call$centre"     "mean of the variables"
13 "$call$ecart.type" "standard error of the variables"
14 "$call$row.w"      "weights for the individuals"
15 "$call$col.w"      "weights for the variables"
> res.pca$eig
        eigenvalue percentage of variance
comp 1   4.1242133              41.242133
comp 2   1.8385309              18.385309
comp 3   1.2391403              12.391403
comp 4   0.8194402               8.194402
comp 5   0.7015528               7.015528
comp 6   0.4228828               4.228828
comp 7   0.3025817               3.025817
comp 8   0.2744700               2.744700
comp 9   0.1552169               1.552169
comp 10  0.1219710               1.219710
        cumulative percentage of variance
comp 1                           41.24213
comp 2                           59.62744
comp 3                           72.01885
comp 4                           80.21325
comp 5                           87.22878
comp 6                           91.45760
comp 7                           94.48342
comp 8                           97.22812
comp 9                           98.78029
comp 10                         100.00000
> eig.val <- get_eigenvalue(res.pca)
> eig.val
       eigenvalue variance.percent cumulative.variance.percent
Dim.1   4.1242133        41.242133                    41.24213
Dim.2   1.8385309        18.385309                    59.62744
Dim.3   1.2391403        12.391403                    72.01885
Dim.4   0.8194402         8.194402                    80.21325
Dim.5   0.7015528         7.015528                    87.22878
Dim.6   0.4228828         4.228828                    91.45760
Dim.7   0.3025817         3.025817                    94.48342
Dim.8   0.2744700         2.744700                    97.22812
Dim.9   0.1552169         1.552169                    98.78029
Dim.10  0.1219710         1.219710                   100.00000
fviz_eig(res.pca, addlabels = TRUE, ylim = c(0, 50))
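
The percentage columns in the eigenvalue table can be recomputed by hand, which helps when reading the scree plot. The sketch below copies the ten eigenvalues printed above and derives both percentage columns. The two selection rules at the end (eigenvalue > 1, and the first dimension reaching 80% cumulative variance) are common heuristics that I am adding for illustration; the original notes do not state which rule to use.

```r
# Eigenvalues copied from the res.pca$eig output above
eigenvalue <- c(4.1242133, 1.8385309, 1.2391403, 0.8194402, 0.7015528,
                0.4228828, 0.3025817, 0.2744700, 0.1552169, 0.1219710)

# Share of total variance carried by each dimension, and the running total
variance.percent <- 100 * eigenvalue / sum(eigenvalue)
cumulative.variance.percent <- cumsum(variance.percent)

eig.tab <- data.frame(eigenvalue, variance.percent, cumulative.variance.percent,
                      row.names = paste0("Dim.", seq_along(eigenvalue)))
print(round(eig.tab, 3))

# Illustrative heuristics (assumptions, not from the notes):
keep_kaiser <- which(eigenvalue > 1)                        # eigenvalue > 1
keep_80 <- which(cumulative.variance.percent >= 80)[1]      # first dim reaching 80%
print(keep_kaiser)
print(keep_80)
```

On these values the first rule keeps three dimensions and the second keeps four, so the choice of rule matters for this dataset.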

Visualize the results for individuals.

Visualize the results for variables.

> var <- get_pca_var(res.pca)
> var$cos2
                    Dim.1        Dim.2       Dim.3        Dim.4
X100m        7.235641e-01 0.0321836641 0.090936280 0.0011271597
Long.jump    6.307229e-01 0.0788806285 0.036307981 0.0133147506
Shot.put     5.386279e-01 0.0072938636 0.267907488 0.0165041211
High.jump    3.722025e-01 0.2164242070 0.108956221 0.0208947375
X400m        4.922473e-01 0.0842034209 0.080390914 0.1856106269
X110m.hurdle 5.838873e-01 0.0006121077 0.201499837 0.0002854712
Discus       5.523596e-01 0.0024662013 0.031161138 0.1560322304
Pole.vault   4.720540e-02 0.6519772763 0.008846856 0.1149106765
Javeline     1.833781e-01 0.1490803723 0.364966189 0.1100478063
X1500m       1.830545e-05 0.6154091638 0.048167378 0.2007126089
                  Dim.5
X100m        0.03780845
Long.jump    0.05436203
Shot.put     0.06190783
High.jump    0.16216747
X400m        0.01079698
X110m.hurdle 0.05027463
Discus       0.16665918
Pole.vault   0.04914437
Javeline     0.03912992
X1500m       0.06930197
corrplot(var$cos2, is.corr=FALSE)

corrplot(var$contrib, is.corr=FALSE)

fviz_contrib(res.pca, choice = "var", axes = 1:3, top = 5)
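
To clarify what the cos2 and contrib matrices plotted above contain, here is a rough sketch in base R. It assumes the usual definitions for a standardized PCA: a variable's coordinate on a dimension is its correlation with that dimension, cos2 is the squared coordinate, and contrib is cos2 expressed as a percentage of the dimension's eigenvalue. As before, mtcars and prcomp() stand in for decathlon2 and FactoMineR so the example runs without extra packages; that substitution is my assumption.

```r
# Assumption: mtcars is used only as a stand-in dataset for illustration.
X <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Variable coordinates: eigenvectors scaled by the component standard deviations
coord <- sweep(pc$rotation, 2, pc$sdev, "*")

# Quality of representation: squared coordinates
cos2 <- coord^2

# Contributions: each variable's share (in %) of its dimension's eigenvalue
contrib <- 100 * sweep(cos2, 2, colSums(cos2), "/")

print(round(cos2, 3))
print(round(contrib, 2))
print(rowSums(cos2))     # each variable is fully represented over all dimensions
print(colSums(contrib))  # contributions on each dimension add up to 100
```

Reading the cos2 table above the same way: X1500m has a cos2 near 0 on Dim.1 but about 0.62 on Dim.2, so it is described mainly by the second dimension.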

One more addition: correspondence analysis (CA)
The data used here is a contingency table that summarizes the answers given by different categories of people to the following question: “according to you, what are the reasons that can make hesitate a woman or a couple to have children?” The data frame is made of 18 rows and 8 columns. Rows represent the different reasons mentioned, columns represent the different categories (education, age) people belong to.
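18 rows, 8 columns.
Rows are the reasons given; columns are the attributes (education, age) of the people who were asked.
First, look at the dataset, which is also built in (children ships with FactoMineR; load it with data(children)).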

res.ca <- CA(children, col.sup = 6:8, row.sup = 15:18)

plot(res.ca, invisible = c("row.sup", "col.sup"))
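
For intuition about what CA() computes on a contingency table, here is a minimal from-scratch sketch in base R. It assumes the standard formulation of correspondence analysis as a singular value decomposition of the standardized residuals. The 3 x 3 table, its labels, and all variable names are hypothetical toy values, not the children data, and the sketch ignores the supplementary rows and columns used in the call above.

```r
# Hypothetical toy contingency table (NOT the children dataset)
N <- matrix(c(20, 10,  5,
               8, 15, 12,
               4,  9, 17),
            nrow = 3, byrow = TRUE,
            dimnames = list(reason = c("r1", "r2", "r3"),
                            group  = c("g1", "g2", "g3")))

P  <- N / sum(N)      # correspondence matrix (relative frequencies)
r  <- rowSums(P)      # row masses
cm <- colSums(P)      # column masses

# Standardized residuals from the independence model, then their SVD
S  <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
sv <- svd(S)

inertia   <- sv$d^2                                   # inertia per dimension
row_coord <- diag(1 / sqrt(r)) %*% sv$u %*% diag(sv$d) # principal row coordinates
rownames(row_coord) <- rownames(N)

# Total inertia should equal the chi-square statistic divided by the grand total
chi2 <- unname(chisq.test(N)$statistic)
print(round(inertia, 5))
print(c(total_inertia = sum(inertia), chi2_over_n = chi2 / sum(N)))
print(round(row_coord[, 1:2], 3))
```

The last singular value is numerically zero because centering removes one dimension, so a 3 x 3 table has at most two non-trivial CA dimensions.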
