58. gss_cat: a useful dataset for learning about factors

2021-09-13  心惊梦醒

[Previous: 57. The four essentials of factors: creating factors]
[Next: 59. Adjusting the order of a factor's levels (Part 1)]

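    The forcats::gss_cat dataset is a sample from the General Social Survey, a long-running US survey conducted by NORC, an independent research organization at the University of Chicago. The survey contains thousands of questions; the categorical variables included in gss_cat are well suited to demonstrating the common challenges that come up when working with factors.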

> gss_cat
# A tibble: 21,483 x 9
    year marital     age race  rincome   partyid     relig    denom    tvhours
   <int> <fct>     <int> <fct> <fct>     <fct>       <fct>    <fct>      <int>
 1  2000 Never ma~    26 White $8000 to~ Ind,near r~ Protest~ Souther~      12
 2  2000 Divorced     48 White $8000 to~ Not str re~ Protest~ Baptist~      NA
 3  2000 Widowed      67 White Not appl~ Independent Protest~ No deno~       2
 4  2000 Never ma~    39 White Not appl~ Ind,near r~ Orthodo~ Not app~       4
 5  2000 Divorced     25 White Not appl~ Not str de~ None     Not app~       1
 6  2000 Married      25 White $20000 -~ Strong dem~ Protest~ Souther~      NA
 7  2000 Never ma~    36 White $25000 o~ Not str re~ Christi~ Not app~       3
 8  2000 Divorced     44 White $7000 to~ Ind,near d~ Protest~ Luthera~      NA
 9  2000 Married      44 White $25000 o~ Not str de~ Protest~ Other          0
10  2000 Married      47 White $25000 o~ Strong rep~ Protest~ Souther~       3
# ... with 21,473 more rows

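year: year of the survey, 2000-2014
age: age, with the maximum truncated to 89
marital: marital status
race: race
rincome: reported income
partyid: party affiliation
relig: religion, for example Taoism, Christianity, Buddhism, Islam
denom: denomination or sect, for example Buddhism divides into Han Chinese Buddhism and Tibetan Buddhism, and Taoism into the Maoshan school, Tianshi Dao, Quanzhen Dao, and so on
tvhours: hours per day spent watching TV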

    Six columns of gss_cat are factors. levels() lists every level defined on a factor, whereas count() shows only the values that actually occur in the data.

> gss_cat %>% count(race)
# A tibble: 3 x 2
  race      n
  <fct> <int>
1 Other  1959
2 Black  3129
3 White 16395

> levels(gss_cat$race)
[1] "Other"          "Black"          "White"          "Not applicable"
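
    The difference between the two outputs above can be checked directly. The following is a minimal sketch, assuming the dplyr and forcats packages are installed; the variable names factor_cols and race_counts are made up for illustration. It lists the factor columns and then asks count() to keep levels that have no observations via its .drop argument.

```r
library(dplyr)
library(forcats)  # provides the gss_cat dataset

# Names of the columns that are stored as factors
factor_cols <- names(gss_cat)[sapply(gss_cat, is.factor)]
factor_cols

# .drop = FALSE keeps levels with zero observations in the result,
# so the count lines up with levels(gss_cat$race)
race_counts <- gss_cat %>% count(race, .drop = FALSE)
race_counts
```

    With .drop = FALSE the "Not applicable" level should appear with n equal to 0, instead of being left out as in the default count(race) call.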

    Use geom_bar() to plot the number of respondents of each race:

> library(ggpubr)
# By default, ggplot2 drops levels that have no values
> p1<-ggplot(gss_cat, aes(race)) + geom_bar()
# scale_x_discrete(drop = FALSE) turns off this default behavior
> p2<-ggplot(gss_cat, aes(race)) + geom_bar() + scale_x_discrete(drop = FALSE)
> ggarrange(p1,p2)
Figure: by default ggplot2 drops levels that have no values
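
    Going the other way, an unused level can be removed from the factor itself rather than only hidden in the plot. The following is a small sketch, assuming forcats is installed; race_used is a hypothetical name introduced only for this example. It uses forcats::fct_drop(), which drops levels that do not occur in the data.

```r
library(forcats)

# Number of levels defined on the original factor
nlevels(gss_cat$race)

# fct_drop() removes levels that have no observations
race_used <- fct_drop(gss_cat$race)
levels(race_used)
```

    After dropping, a plot of race_used shows the same bars regardless of the drop setting in scale_x_discrete(), because the empty level no longer exists.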

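    How are relig (religion) and denom (denomination) related in gss_cat? The combinations are listed below. Only Protestant is broken down into specific denominations; the other religions carry just placeholder values such as Not applicable or No denomination (my own reading).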

> gss_cat %>% count(relig,denom) %>% print(n=Inf)
# A tibble: 47 x 3
   relig                   denom                    n
   <fct>                   <fct>                <int>
 1 No answer               No answer               93
 2 Don't know              Not applicable          15
 3 Inter-nondenominational Not applicable         109
 4 Native american         Not applicable          23
 5 Christian               No answer                2
 6 Christian               Don't know              11
 7 Christian               No denomination        452
 8 Christian               Not applicable         224
 9 Orthodox-christian      Not applicable          95
10 Moslem/islam            Not applicable         104
11 Other eastern           Not applicable          32
12 Hinduism                Not applicable          71
13 Buddhism                Not applicable         147
14 Other                   No denomination          7
15 Other                   Not applicable         217
16 None                    Not applicable        3523
17 Jewish                  Not applicable         388
18 Catholic                Not applicable        5124
19 Protestant              No answer               22
20 Protestant              Don't know              41
21 Protestant              No denomination       1224
22 Protestant              Other                 2534
23 Protestant              Episcopal              397
24 Protestant              Presbyterian-dk wh     244
25 Protestant              Presbyterian, merged    67
26 Protestant              Other presbyterian      47
27 Protestant              United pres ch in us   110
28 Protestant              Presbyterian c in us   104
29 Protestant              Lutheran-dk which      267
30 Protestant              Evangelical luth       122
31 Protestant              Other lutheran          30
32 Protestant              Wi evan luth synod      71
33 Protestant              Lutheran-mo synod      212
34 Protestant              Luth ch in america      71
35 Protestant              Am lutheran            146
36 Protestant              Methodist-dk which     239
37 Protestant              Other methodist         33
38 Protestant              United methodist      1067
39 Protestant              Afr meth ep zion        32
40 Protestant              Afr meth episcopal      77
41 Protestant              Baptist-dk which      1457
42 Protestant              Other baptists         213
43 Protestant              Southern baptist      1536
44 Protestant              Nat bapt conv usa       40
45 Protestant              Nat bapt conv of am     76
46 Protestant              Am bapt ch in usa      130
47 Protestant              Am baptist asso        237

> gss_cat %>% count(relig,denom) %>% ggplot(aes(x=relig,y=n,fill=denom))+geom_bar(stat = "identity",position = "dodge")
Figure: visualization of religion and denomination
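
    The observation that Protestant has the most denominations can also be checked numerically instead of by reading the table. This is a minimal sketch, assuming dplyr and forcats are installed; denom_per_relig and n_denom are names chosen here for illustration. It counts the distinct denom values within each relig and sorts in descending order.

```r
library(dplyr)
library(forcats)  # provides the gss_cat dataset

# Number of distinct denom values observed within each relig
denom_per_relig <- gss_cat %>%
  group_by(relig) %>%
  summarise(n_denom = n_distinct(denom)) %>%
  arrange(desc(n_denom))

denom_per_relig
```

    Note that the count includes placeholder values such as No answer and Don't know, so it matches the number of rows per relig in the count(relig, denom) table above.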
