Descriptive Statistics and Plotting in R

2018-12-08  蓝色滑行

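setwd("D:/《用商业案例学R语言数据挖掘》教材代码及数据/data") # point R at the folder that holds the data
dat0 <- read.csv("accepts.csv", header = TRUE) # first row contains the column names
View(dat0) # open the data frame in the data viewer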

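fs <- dat0$fico_score
mean(fs, na.rm = TRUE) # mean of fico_score, ignoring missing values
[1] 693.5287
quantile(fs, probs = c(0.25, 0.5, 0.75), na.rm = TRUE) # quartiles, ignoring missing values
  25%   50%   75%
653.0 693.0 735.5
hist(fs, nclass = 15) # histogram with roughly 15 bins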


Figure: histogram of fico_score (直方图.png)
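The calls above all pass na.rm = TRUE because fico_score contains missing values. As a small illustration of what that argument changes, here is a sketch on a made-up vector (the name scores and its values are hypothetical and do not come from accepts.csv):

```r
# Hypothetical scores, not taken from accepts.csv
scores <- c(650, 649, NA, 603, 764)

m_raw   <- mean(scores)                # NA: a single missing value propagates
m_clean <- mean(scores, na.rm = TRUE)  # mean of the four observed values
n_miss  <- sum(is.na(scores))          # number of missing values

print(m_raw)
print(m_clean)
print(n_miss)
summary(scores)  # min, quartiles, mean, max, plus the NA count
```

Checking sum(is.na(fs)) before summarising is a cheap way to see how many observations na.rm = TRUE silently drops.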

str(dat0) # use str to inspect the type of every variable in the data

'data.frame': 5845 obs. of 24 variables:
 $ application_id: int 2314049 63539 7328510 8725187 4275127 8712513 2063896 598458 1526052 8073975 ...
 $ account_number: int 11613 13449 14323 15359 15812 16979 17842 19715 23924 24866 ...
 $ bad_ind       : int 1 0 1 1 0 0 0 0 1 0 ...
 $ vehicle_year  : int 1998 2000 1998 1997 2000 2000 2000 1994 1994 1999 ...
 $ vehicle_make  : Factor w/ 155 levels "","3HYUNDAI",..: 46 36 114 46 142 39 60 8 99 19 ...
 $ bankruptcy_ind: Factor w/ 3 levels "","N","Y": 2 2 2 2 2 3 2 2 2 3 ...
 $ tot_derog     : int 7 0 7 3 0 2 0 0 2 11 ...
 $ tot_tr        : int 9 21 10 10 10 15 13 2 13 20 ...
 $ age_oldest_tr : int 64 240 60 35 104 136 339 261 213 178 ...
 $ tot_open_tr   : int 2 11 NA 5 2 4 4 NA 3 NA ...
 $ tot_rev_tr    : int 1 7 NA 4 0 3 3 1 2 3 ...
 $ tot_rev_debt  : int 506 34605 NA 4019 0 3651 2094 146 2602 1815 ...
 $ tot_rev_line  : int 500 57241 NA 5946 1800 5747 20115 265 5529 2097 ...
 $ rev_util      : int 101 60 0 68 0 64 10 55 47 87 ...
 $ fico_score    : int 650 649 613 603 764 680 794 722 664 646 ...
 $ purch_price   : num 17200 19589 13595 12999 26328 ...
 $ msrp          : num 17350 19788 11450 12100 22024 ...
 $ down_pyt      : num 0 684 0 3099 0 ...
 $ loan_term     : int 36 60 60 60 60 36 36 54 42 60 ...
 $ loan_amt      : num 17200 19589 10500 10800 26328 ...
 $ ltv           : int 99 99 92 118 122 100 32 98 139 102 ...
 $ tot_income    : num 6550 4667 2000 1500 4144 ...
 $ veh_mileage   : int 24000 22 19600 10000 14 1 500 77267 40000 6000 ...
 $ used_ind      : int 1 0 1 1 0 0 0 1 1 1 ...

table(dat0$bad_ind) # frequency counts of defaulting vs. normal customers

0 1
4648 1197
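Raw counts are often easier to read as shares. A minimal sketch, rebuilding the table from the two counts printed above so that it runs without accepts.csv (the names counts and shares are introduced here for illustration only):

```r
# Counts copied from the table(dat0$bad_ind) output above
counts <- as.table(c("0" = 4648, "1" = 1197))

# prop.table divides each cell by the total number of observations
shares <- prop.table(counts)
print(round(shares, 4))
```

With the real data loaded, prop.table(table(dat0$bad_ind)) should give the same shares directly: roughly 79.5% of customers in class 0 and 20.5% in class 1.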

barplot(table(dat0$bad_ind)) # use barplot to draw a bar chart of the frequencies

# using more of hist's parameters: title, subtitle, axis labels, and bin count
hist(fs, freq = TRUE, main = "fico_score", sub = "source: auto loan data",
     xlab = "fico_score rating", ylab = "Frequency", nclass = 20)


Figure: histogram of fico_score with title, subtitle, and axis labels (Rplot01.png)

Drawing box plots
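The original post does not include its box plot code, so the following is only a sketch of one way such a plot might be drawn with base R's boxplot, comparing fico_score across the two bad_ind groups. The data frame demo is a randomly generated, hypothetical stand-in for dat0, not the real data.

```r
# Hypothetical stand-in for dat0; the values are random, not from accepts.csv
set.seed(1)
demo <- data.frame(
  fico_score = c(rnorm(80, mean = 700, sd = 50),
                 rnorm(20, mean = 650, sd = 50)),
  bad_ind    = rep(c(0, 1), times = c(80, 20))
)

# One box per bad_ind group; the formula reads "fico_score split by bad_ind"
bp <- boxplot(fico_score ~ bad_ind, data = demo,
              main = "fico_score by bad_ind",
              xlab = "bad_ind", ylab = "fico_score")

# boxplot invisibly returns the five-number summary behind each box
print(bp$stats)
```

With the real data, the same call with data = dat0 should work as long as both columns are present; boxplot drops missing fico_score values on its own.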
