
R Learning Notes (6): Data Frames

2021-01-19  Akuooo

I. Data Frames

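A data frame is a tabular data structure designed to model a dataset, matching the concept of a dataset in other statistical software such as SAS or SPSS.

It is usually a rectangular array of data in which rows are observations and columns are variables.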

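  1. Characteristics: a data frame is really a list.
    The elements of that list are vectors, and these vectors form the columns of the data frame. Every column must have the same length, which is why a data frame is rectangular, and every column has a name.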

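Compared with a matrix:
① Both have the same rectangular shape.
② A data frame is a list with a regular, rectangular structure.
③ A matrix must hold a single data type throughout;
in a data frame each column must have a single type, but different columns, and therefore the values within one row, can have different types.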
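A minimal console sketch of these properties. The toy data frame df and matrix m are made up for illustration, and the comment about character columns assumes R 4.0 or later (earlier versions turned strings into factors by default):

```r
# Hypothetical toy data: two variables of different types
df <- data.frame(name = c("a", "b", "c"), score = c(90, 85, 77))

is.list(df)        # TRUE: underneath, a data frame is a list
length(df)         # 2: one list element per column
sapply(df, class)  # each column keeps its own type: character, numeric

# Columns of unequal length are rejected
try(data.frame(x = 1:3, y = 1:2))  # error: arguments imply differing number of rows

# A matrix holds one type only, so mixing in text coerces the numbers too
m <- cbind(name = c("a", "b", "c"), score = c(90, 85, 77))
typeof(m)          # "character"
```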

  2. Built-in R datasets stored as data frames
    iris: measurements of iris flowers
    mtcars: data on 32 cars
    rock: shape measurements of 48 rock samples
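To look at these datasets before using them, the standard inspection functions work on any data frame. A short sketch:

```r
# These datasets ship with base R, so nothing needs to be loaded
class(iris)       # "data.frame"
dim(mtcars)       # 32 11: 32 cars (rows), 11 variables (columns)
nrow(rock)        # 48 rock samples
str(iris)         # one line per column: name, type, first values
head(mtcars, 3)   # first three rows
```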

  3. Creating a data frame

> ?data.frame
> state <- data.frame(state.name,state.abb,state.region,state.x77)
> state
(Screenshot: the printed state data frame)

To store your own data in R for analysis, keep each variable in its own vector and then combine the vectors with data.frame().
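A minimal sketch of that workflow. The vectors id, group and weight and the data frame study are hypothetical; the last lines rebuild the state example above and count its columns, because the matrix state.x77 contributes each of its 8 columns separately:

```r
# Hypothetical measurements: one vector per variable, all the same length
id     <- 1:4
group  <- c("control", "control", "treated", "treated")
weight <- c(61.2, 58.9, 64.5, 60.1)

# data.frame() binds the vectors as columns; the variable names become column names
study <- data.frame(id, group, weight)
str(study)

# In the state example, 3 vectors + the 8 columns of state.x77 give 11 columns
state <- data.frame(state.name, state.abb, state.region, state.x77)
ncol(state)   # 11
```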

II. Accessing a Data Frame

A data frame shares features of vectors, matrices, and lists, so the access methods of all three work on it.

  1. Accessing data by index
> state[1]  # a single index selects a column; the result is still a data frame
                   state.name
Alabama               Alabama
Alaska                 Alaska
Arizona               Arizona
Arkansas             Arkansas
California         California
Colorado             Colorado
Connecticut       Connecticut
Delaware             Delaware
Florida               Florida
Georgia               Georgia
Hawaii                 Hawaii
Idaho                   Idaho
Illinois             Illinois
Indiana               Indiana
Iowa                     Iowa
Kansas                 Kansas
Kentucky             Kentucky
Louisiana           Louisiana
Maine                   Maine
Maryland             Maryland
Massachusetts   Massachusetts
Michigan             Michigan
Minnesota           Minnesota
Mississippi       Mississippi
Missouri             Missouri
Montana               Montana
Nebraska             Nebraska
Nevada                 Nevada
New Hampshire   New Hampshire
New Jersey         New Jersey
New Mexico         New Mexico
New York             New York
North Carolina North Carolina
North Dakota     North Dakota
Ohio                     Ohio
Oklahoma             Oklahoma
Oregon                 Oregon
Pennsylvania     Pennsylvania
Rhode Island     Rhode Island
South Carolina South Carolina
South Dakota     South Dakota
Tennessee           Tennessee
Texas                   Texas
Utah                     Utah
Vermont               Vermont
Virginia             Virginia
Washington         Washington
West Virginia   West Virginia
Wisconsin           Wisconsin
Wyoming               Wyoming
> state[c(2,4)]  # columns 2 and 4
> state[, "state.abb"]  # extract a column by its name
 [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA" "KS" "KY" "LA"
[19] "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ" "NM" "NY" "NC" "ND" "OH" "OK"
[37] "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT" "VA" "WA" "WV" "WI" "WY"
> state["Alabama", ]  # extract a row by its row name
        state.name state.abb state.region Population Income Illiteracy Life.Exp Murder HS.Grad Frost  Area
Alabama    Alabama        AL        South       3615   3624        2.1    69.05   15.1    41.3    20 50708
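The shape of the result depends on how the brackets are used. A short sketch of the common cases (the Murder threshold of 13 is an arbitrary illustration, and the character result assumes R 4.0 or later):

```r
state <- data.frame(state.name, state.abb, state.region, state.x77)

class(state[1])                              # "data.frame": one index selects columns, like a list
class(state[, "state.abb"])                  # "character": a single column via [ , ] drops to a vector
class(state[, "state.abb", drop = FALSE])    # "data.frame": drop = FALSE keeps the frame
state[1:3, c("state.abb", "Population")]     # rows and columns at the same time
state[state[, "Murder"] > 13, "state.name"]  # a logical condition selects rows
```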
  2. $ access (commonly used; quickly extracts any single column)
![state.png](https://img.haomeiwen.com/i19791022/cedcd81a5d019d30.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)

For example, the women dataset records the heights and weights of women.
Plotting weight against height:

> plot(women$height,women$weight)
(Figure: scatter plot of women$weight against women$height)
> lm(weight ~height,data = women)

Call:
lm(formula = weight ~ height, data = women)

Coefficients:
(Intercept)       height  
     -87.52         3.45  
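Besides reading a column, $ can also create one. A minimal sketch; the copy w and its ratio column are hypothetical:

```r
h <- women$height      # $ returns the column as a plain vector
class(h)               # "numeric"
length(h)              # 15: one value per woman

w <- women                          # work on a copy
w$ratio <- w$weight / w$height      # assigning to a new name adds a column
head(w, 3)

# Model functions take bare column names plus data =, so no $ is needed there
fit <- lm(weight ~ height, data = women)
coef(fit)
```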
  3. attach()
    (adds the data frame to R's search path, so its columns can be used by name)
> attach(mtcars)  # after attaching, columns can be accessed without $
> mpg
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7
[18] 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
> hp
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97
[22] 150 150 245 175  66  91 113 264 175 335 109
> detach(mtcars)  # detach() removes it from the search path again
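One detail worth knowing, described in the ?attach help page: attach() places a copy of the columns on the search path, so later edits to the data frame are not seen through the attached names. A sketch (cars2 is a hypothetical copy of mtcars):

```r
attach(mtcars)
avg_mpg <- mean(mpg)        # mpg is found on the search path, no mtcars$ needed
detach(mtcars)

cars2 <- mtcars             # hypothetical copy that we modify
attach(cars2)
cars2$mpg <- cars2$mpg * 2  # edit the data frame after attaching it
first_mpg <- mpg[1]         # still 21: the attached copy did not change
detach(cars2)
```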
  4. with()
    (also no $ needed; the columns are visible only inside the call)
> with(mtcars,{hp})
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97
[22] 150 150 245 175  66  91 113 264 175 335 109
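with() also accepts several expressions in braces and returns the value of the last one; anything assigned inside stays local to the call. The related base function within() returns a modified copy of the data frame instead. A sketch, where the weight cut-off of 3.5 (wt is in 1000 lb) and the kpl column are hypothetical, and 0.425 km/L per mpg is an approximate conversion:

```r
ratio <- with(mtcars, {
  heavy <- wt > 3.5                       # temporary, exists only inside with()
  mean(mpg[heavy]) / mean(mpg[!heavy])    # the last value is returned
})
exists("heavy")    # FALSE in a fresh session: the assignment did not reach the workspace

# within() evaluates the assignment and returns the data frame with the new column
mt2 <- within(mtcars, kpl <- mpg * 0.425) # approximate km per litre
names(mt2)
```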
  5. Single vs. double square brackets


(Screenshot: double-bracket indexing)
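The difference in brief: single brackets return a data frame (a sub-list of columns), while double brackets return the column itself, just like $. A short sketch using the state data frame from above:

```r
state <- data.frame(state.name, state.abb, state.region, state.x77)

one <- state[2]              # single brackets: a data frame holding column 2
two <- state[[2]]            # double brackets: column 2 itself, as a vector
abb <- state[["state.abb"]]  # [[ ]] also takes a column name, like state$state.abb

class(one)                   # "data.frame"
class(two)                   # "character"
two[1:4]                     # "AL" "AK" "AZ" "AR"
identical(abb, state$state.abb)  # TRUE
```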