R Language Study Notes (7): Factors
2021-01-21
Akuooo
Reference video: https://www.bilibili.com/video/BV19x411X7C6?p=23
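I. Factor concepts
- Types of variables in R
(1) Nominal variables (e.g. city names or provinces; the values are independent of one another)
Typically stored as: character strings
(2) Ordinal variables (the values have an order but are not continuous, e.g. good-better-best)
(3) Continuous variables (e.g. amounts of money or population; can take any value within a range)
Typically stored as: numeric values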
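[Figure: variable types, Mendel's peas example]
- Concept
In R, nominal and ordinal variables are called factors. Each possible value of such a categorical variable is called a level; for example, good, better and best are each one level.
A vector made up of these level values is called a factor (a factor is itself a vector: R stores it as integer codes plus the level labels).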
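To make the relationship between a factor and its levels concrete, here is a minimal sketch. The `ratings` vector and its values are made up for illustration; only base R functions are used.

```r
# Hypothetical ratings, used only for illustration
ratings <- factor(c("good", "best", "better", "good"),
                  levels = c("good", "better", "best"))

is.factor(ratings)    # TRUE
levels(ratings)       # the possible values, in the order declared above
nlevels(ratings)      # 3
as.integer(ratings)   # integer codes that index into levels(ratings)
```

Each element is stored as the position of its label in `levels(ratings)`, which is why a factor can be viewed as an integer vector with labels attached.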
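- What factors are used for
Factors can record the treatment level that each subject in a study falls under, or any other kind of categorical variable.
Applications: frequency counts, tests of independence, correlation tests, analysis of variance, principal component analysis, factor analysis, and so on.
For example:
[Figure: the mtcars dataset]
> table(mtcars$cyl)  # the cyl column can be treated as a factor with levels 4, 6 and 8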
4 6 8
11 7 14
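The frequency count above can also be run on an explicit factor. The sketch below is illustrative only: the variable names are made up, and the extra level 5 is an assumption added to show how levels with no observations are reported.

```r
# Same counts, with cyl converted to a factor first
cyl_factor <- factor(mtcars$cyl)
cyl_counts <- table(cyl_factor)
cyl_counts

# Declaring an extra level (5 cylinders, which never occurs in mtcars)
# shows that table() reports every level, including empty ones
cyl_with_gap <- factor(mtcars$cyl, levels = c(4, 5, 6, 8))
table(cyl_with_gap)
```

Because the set of levels is part of the factor, a category with zero observations still appears in the table instead of silently disappearing.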
II. Defining factors
- The factor() function
> f <- factor(c("red","red","green","blue"))
> f
[1] red red green blue
Levels: blue green red
# Specify the factor levels explicitly
> week <- factor(c("Mon","Fri","Thu","Wed","Mon","Fri","Sun"), ordered = TRUE, levels = c("Mon","Tue","Wed","Thu","Fri","Sat","Sun"))
> week
[1] Mon Fri Thu Wed Mon Fri Sun
Levels: Mon < Tue < Wed < Thu < Fri < Sat < Sun
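As a rough illustration of what `ordered = TRUE` buys you, the sketch below rebuilds the same `week` factor and tries a few order-aware operations from base R.

```r
week <- factor(c("Mon", "Fri", "Thu", "Wed", "Mon", "Fri", "Sun"),
               ordered = TRUE,
               levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

is.ordered(week)     # TRUE
week[1] < week[2]    # TRUE: Mon comes before Fri in the declared order
sort(week)           # sorted by level order, not alphabetically
max(week)            # Sun
table(week)          # Tue and Sat appear with a count of 0
```

Comparisons such as `<` and functions such as `max()` are only meaningful for ordered factors; on an unordered factor R returns `NA` with a warning or raises an error.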
# Convert a vector directly into a factor
> fcyl <- factor(mtcars$cyl)
> fcyl
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
Levels: 4 6 8
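> plot(mtcars$cyl)
> plot(factor(mtcars$cyl))
[Figures: plot of mtcars$cyl; plot of factor(mtcars$cyl)]
As the plots show, the numeric vector is drawn as a scatter plot, whereas the factor is drawn as a bar chart.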
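A related consequence of the conversion is that the values stop being numbers. The sketch below, which uses only base R and the same `fcyl` name as above, shows a common pitfall when converting a factor back to numeric.

```r
fcyl <- factor(mtcars$cyl)

class(mtcars$cyl)    # "numeric"
class(fcyl)          # "factor"

# as.numeric() on a factor returns the internal codes (1, 2, 3), not 4, 6, 8
head(as.numeric(fcyl))

# Going through the level labels recovers the original numbers
recovered <- as.numeric(as.character(fcyl))
identical(recovered, mtcars$cyl)    # TRUE
```

Converting via `as.character()` first is the usual way to get the original values back.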
- cut()
Bin the integers from 1 to 100 into the intervals 1-10, 11-20, and so on.
> num <- 1:100
> cut(num, seq(0, 100, 10))  # assigns each number to its interval, which makes frequency counts easy
[1] (0,10] (0,10] (0,10] (0,10] (0,10] (0,10] (0,10] (0,10] (0,10]
[10] (0,10] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20]
[19] (10,20] (10,20] (20,30] (20,30] (20,30] (20,30] (20,30] (20,30] (20,30]
[28] (20,30] (20,30] (20,30] (30,40] (30,40] (30,40] (30,40] (30,40] (30,40]
[37] (30,40] (30,40] (30,40] (30,40] (40,50] (40,50] (40,50] (40,50] (40,50]
[46] (40,50] (40,50] (40,50] (40,50] (40,50] (50,60] (50,60] (50,60] (50,60]
[55] (50,60] (50,60] (50,60] (50,60] (50,60] (50,60] (60,70] (60,70] (60,70]
[64] (60,70] (60,70] (60,70] (60,70] (60,70] (60,70] (60,70] (70,80] (70,80]
[73] (70,80] (70,80] (70,80] (70,80] (70,80] (70,80] (70,80] (70,80] (80,90]
[82] (80,90] (80,90] (80,90] (80,90] (80,90] (80,90] (80,90] (80,90] (80,90]
[91] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100] (90,100]
[100] (90,100]
10 Levels: (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (70,80] ... (90,100]
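Since `cut()` returns a factor, the frequency count mentioned above follows directly. The sketch below is one possible way to do it; the names `bins`, `labelled` and `left_closed` are made up for illustration.

```r
num <- 1:100
bins <- cut(num, breaks = seq(0, 100, 10))

is.factor(bins)   # TRUE: cut() returns a factor
table(bins)       # 10 values fall into each of the 10 intervals

# Optional custom labels matching the "1-10, 11-20, ..." description
labelled <- cut(num, breaks = seq(0, 100, 10),
                labels = paste(seq(1, 91, 10), seq(10, 100, 10), sep = "-"))
levels(labelled)[1:2]   # "1-10" "11-20"

# right = FALSE closes intervals on the left instead: [0,10), [10,20), ...
# With these breaks, 100 then falls outside every interval and becomes NA
left_closed <- cut(num, breaks = seq(0, 100, 10), right = FALSE)
sum(is.na(left_closed))   # 1
```

By default the intervals are closed on the right, which is why 10 lands in `(0,10]` and the lowest break is 0 rather than 1.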
# state.division is a built-in dataset stored as a factor
> state.division
[1] East South Central Pacific Mountain West South Central
[5] Pacific Mountain New England South Atlantic
[9] South Atlantic South Atlantic Pacific Mountain
[13] East North Central East North Central West North Central West North Central
[17] East South Central West South Central New England South Atlantic
[21] New England East North Central West North Central East South Central
[25] West North Central Mountain West North Central Mountain
[29] New England Middle Atlantic Mountain Middle Atlantic
[33] South Atlantic West North Central East North Central West South Central
[37] Pacific Middle Atlantic New England South Atlantic
[41] West North Central East South Central West South Central Mountain
[45] New England South Atlantic Pacific South Atlantic
[49] East North Central Mountain
9 Levels: New England Middle Atlantic South Atlantic ... Pacific
# So is state.region
> state.region
[1] South West West South West West
[7] Northeast South South South West West
[13] North Central North Central North Central North Central South South
[19] Northeast South Northeast North Central North Central South
[25] North Central West North Central West Northeast Northeast
[31] West Northeast South North Central North Central South
[37] West Northeast Northeast South North Central South
[43] South West Northeast South West South
[49] North Central West
Levels: Northeast South North Central West
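To check that these built-in objects really are factors and to summarise them, something along the following lines should work in a standard R session (both datasets ship with base R's datasets package).

```r
is.factor(state.division)   # TRUE
is.factor(state.region)     # TRUE
nlevels(state.division)     # 9
levels(state.region)        # the four region names

region_counts <- table(state.region)
region_counts               # number of states per region

# Cross-tabulating the two factors shows how divisions nest inside regions
table(state.division, state.region)
```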