
Handling (Imputing) Missing Values in Data, Following "Data Mining with R" (《数据挖掘与 R 语言》)

2019-10-27  热衷组培的二货潜

Everything in this post comes from the missing-data part of Chapter 2 of the book "Data Mining with R".

1. Removing the observations with missing values

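Removing the records that contain missing data is easy to do, and it is a reasonable choice when those records make up only a very small fraction of the available data.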

# install.packages("DMwR")
library(DMwR)
data(algae)

algae[!complete.cases(algae), ]

nrow(algae[!complete.cases(algae), ])
[1] 16

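complete.cases() returns a logical vector with one element per row: an element is TRUE when the corresponding row of the data frame contains no NA values, that is, when the row is a complete observation.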
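
As a small illustration of the paragraph above, here is a minimal sketch on a made-up data frame (the name toy and its values are assumptions for the example, not the algae data). It only uses base R.

```r
# Toy data frame with hypothetical values (not the algae data)
toy <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d"),
  z = c(10, 20, 30, 40)
)

keep <- complete.cases(toy)   # TRUE for rows without any NA
print(keep)

incomplete <- toy[!keep, ]    # rows that contain at least one NA
n_incomplete <- sum(!keep)    # same count as nrow(toy[!keep, ])
print(n_incomplete)
```

Negating the vector with ! is what selects the incomplete rows, which is the pattern used in the algae code.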

algae <- na.omit(algae)
apply(algae, 1, function(x) sum(is.na(x)))

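Here apply() applies a function to every row of the data frame: the second argument, 1, selects the first dimension of the object given as the first argument, that is, the rows.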
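
To make the role of the second argument concrete, the sketch below counts NA values per row (margin 1) and per column (margin 2) on a made-up data frame. The data and variable names are assumptions for illustration only.

```r
# Toy data frame with hypothetical values
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, 6),
  c = c(7, 8, 9)
)

# Margin 1: the function receives one row at a time
na_per_row <- apply(toy, 1, function(x) sum(is.na(x)))

# Margin 2: the function receives one column at a time
na_per_col <- apply(toy, 2, function(x) sum(is.na(x)))

# rowSums() on the logical matrix gives the same per-row counts
same_per_row <- rowSums(is.na(toy))

print(na_per_row)
print(na_per_col)
```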

data(algae)
manyNAs(algae, 0.2) # rows whose number of missing values exceeds 20% of the number of columns
[1]  62 199
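
The idea behind that call can be sketched in base R as follows. This is only an approximation written for illustration: many_nas_sketch is a hypothetical helper, not the DMwR implementation, and the toy data are made up.

```r
# Hypothetical helper: indices of rows whose NA count exceeds
# a given proportion of the number of columns
many_nas_sketch <- function(df, prop = 0.2) {
  threshold <- ncol(df) * prop
  which(unname(rowSums(is.na(df))) > threshold)
}

# Toy data: 5 columns, so the threshold is 1 missing value
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, NA, 6, 1),
  c = c(7, 8, 9, 2),
  d = c(1, 2, 3, 4),
  e = c(NA, 5, 5, 5)
)

bad_rows <- many_nas_sketch(toy)
print(bad_rows)               # rows 1 and 2 have two NAs each
cleaned <- toy[-bad_rows, ]   # drop them, as done with algae
```

Note that negative indexing with an empty index vector would drop every row, so in real code it is safer to check length(bad_rows) > 0 first.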

2. Filling in missing values with the most frequent value or another central statistic

data(algae)
algae <- algae[-manyNAs(algae), ]
algae <- centralImputation(algae)
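
For readers without the package, the following base R sketch shows one way such a central-value fill can work: the median for numeric columns and the most frequent value for nominal ones. The helpers central_value and central_impute_sketch are hypothetical names for this example, and the sketch is not claimed to match centralImputation() in every detail.

```r
# Hypothetical helper: median for numeric data, mode otherwise
central_value <- function(x) {
  if (is.numeric(x)) {
    median(x, na.rm = TRUE)
  } else {
    tab <- table(x)                # table() ignores NA by default
    names(tab)[which.max(tab)]     # first most frequent value on ties
  }
}

# Hypothetical helper: fill every NA with the column's central value
central_impute_sketch <- function(df) {
  for (col in names(df)) {
    missing <- is.na(df[[col]])
    if (any(missing)) {
      df[[col]][missing] <- central_value(df[[col]])
    }
  }
  df
}

# Toy data with made-up values
toy <- data.frame(
  size = factor(c("small", "large", NA, "small")),
  mxPH = c(8.0, NA, 7.5, 8.5)
)

filled <- central_impute_sketch(toy)
print(filled)
```

This approach is fast, but it ignores any relationship between variables, which is what the next two sections try to exploit.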

3. Filling in missing values using correlations between variables

cor(algae[, 4:18], use = "complete.obs")
            mxPH        mnO2          Cl         NO3         NH4         oPO4         PO4        Chla
mxPH  1.00000000 -0.10269374  0.14709539 -0.17213024 -0.15429757  0.090229085  0.10132957  0.43182377
mnO2 -0.10269374  1.00000000 -0.26324536  0.11790769 -0.07826816 -0.393752688 -0.46396073 -0.13121671
Cl    0.14709539 -0.26324536  1.00000000  0.21095831  0.06598336  0.379255958  0.44519118  0.14295776
NO3  -0.17213024  0.11790769  0.21095831  1.00000000  0.72467766  0.133014517  0.15702971  0.14549290
NH4  -0.15429757 -0.07826816  0.06598336  0.72467766  1.00000000  0.219311206  0.19939575  0.09120406
oPO4  0.09022909 -0.39375269  0.37925596  0.13301452  0.21931121  1.000000000  0.91196460  0.10691478
PO4   0.10132957 -0.46396073  0.44519118  0.15702971  0.19939575  0.911964602  1.00000000  0.24849223
Chla  0.43182377 -0.13121671  0.14295776  0.14549290  0.09120406  0.106914784  0.24849223  1.00000000
a1   -0.16262986  0.24998372 -0.35923946 -0.24723921 -0.12360578 -0.394574479 -0.45816781 -0.26601088
a2    0.33501740 -0.06848199  0.07845402  0.01997079 -0.03790296  0.123811068  0.13266789  0.36672465
a3   -0.02716034 -0.23522831  0.07653027 -0.09182236 -0.11290467  0.005704557  0.03219398 -0.06330113
a4   -0.18435348 -0.37982999  0.14147281 -0.01448875  0.27452000  0.382481433  0.40883951 -0.08600540
a5   -0.10731230  0.21001174  0.14534877  0.21213579  0.01544458  0.122027482  0.15548900 -0.07342837
a6   -0.17273795  0.18862656  0.16904394  0.54404455  0.40119275  0.003340366  0.05320294  0.01032550
a7   -0.17027088 -0.10455106 -0.04494524  0.07505030 -0.02539279  0.026150420  0.07978353  0.01760782
              a1           a2           a3          a4          a5           a6          a7
mxPH -0.16262986  0.335017401 -0.027160336 -0.18435348 -0.10731230 -0.172737947 -0.17027088
mnO2  0.24998372 -0.068481989 -0.235228307 -0.37982999  0.21001174  0.188626555 -0.10455106
Cl   -0.35923946  0.078454019  0.076530269  0.14147281  0.14534877  0.169043945 -0.04494524
NO3  -0.24723921  0.019970786 -0.091822358 -0.01448875  0.21213579  0.544044553  0.07505030
NH4  -0.12360578 -0.037902958 -0.112904666  0.27452000  0.01544458  0.401192749 -0.02539279
oPO4 -0.39457448  0.123811068  0.005704557  0.38248143  0.12202748  0.003340366  0.02615042
PO4  -0.45816781  0.132667891  0.032193981  0.40883951  0.15548900  0.053202942  0.07978353
Chla -0.26601088  0.366724647 -0.063301128 -0.08600540 -0.07342837  0.010325497  0.01760782
a1    1.00000000 -0.262665485 -0.108177581 -0.09338072 -0.26972709 -0.261564023 -0.19306384
a2   -0.26266549  1.000000000  0.009759915 -0.17628704 -0.18675894 -0.133518480  0.03620621
a3   -0.10817758  0.009759915  1.000000000  0.03336910 -0.14161095 -0.196900051  0.03906025
a4   -0.09338072 -0.176287038  0.033369102  1.00000000 -0.10131827 -0.084884259  0.07114638
a5   -0.26972709 -0.186758940 -0.141610948 -0.10131827  1.00000000  0.388608955 -0.05149346
a6   -0.26156402 -0.133518480 -0.196900051 -0.08488426  0.38860896  1.000000000 -0.03033428
a7   -0.19306384  0.036206205  0.039060248  0.07114638 -0.05149346 -0.030334277  1.00000000
symnum(cor(algae[, 4:18], use = "complete.obs"))
     mP mO Cl NO NH o P Ch a1 a2 a3 a4 a5 a6 a7
mxPH 1                                         
mnO2    1                                      
Cl         1                                   
NO3           1                                
NH4           ,  1                             
oPO4    .  .        1                          
PO4     .  .        * 1                        
Chla .                  1                      
a1         .        . .    1                   
a2   .                  .     1                
a3                               1             
a4      .           . .             1          
a5                                     1       
a6            .  .                     .  1    
a7                                           1 
attr(,"legend")
[1] 0 ‘ ’ 0.3 ‘.’ 0.6 ‘,’ 0.8 ‘+’ 0.9 ‘*’ 0.95 ‘B’ 1

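This symbolic representation of the correlation values is easier to read, especially for large correlation matrices.
Most pairs of variables are not strongly correlated. There are two exceptions: NH4 with NO3 (about 0.72) and PO4 with oPO4 (about 0.91). Given the strong correlation between PO4 and oPO4, we can use it to fill in the missing values of these two variables.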
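
With a large matrix it can also help to list the strongly correlated pairs programmatically. The sketch below does this on a made-up data frame; the 0.8 cutoff and all names are assumptions chosen for the example.

```r
# Toy data with hypothetical values: y = 2 * x + 1 with one value missing
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(3, 5, NA, 9, 11, 13, 15, 17),
  z = c(1, -1, 1, -1, 1, -1, 1, -1)
)

# "complete.obs" drops every row that has an NA in any column
cm <- cor(toy, use = "complete.obs")

# Keep each pair once (upper triangle) when |r| exceeds the cutoff
strong <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(cm)[strong[, "row"]],
  var2 = colnames(cm)[strong[, "col"]],
  r    = cm[strong]
)
print(strong_pairs)
```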

data(algae)
algae <- algae[-manyNAs(algae), ]
lm(PO4 ~ oPO4, data = algae)

Call:
lm(formula = PO4 ~ oPO4, data = algae)

Coefficients:
(Intercept)         oPO4  
     42.897        1.293  

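lm() fits linear models of the form Y = β0 + β1x1 + ... + βnxn. From the result above, the fitted model is PO4 = 42.897 + 1.293 * oPO4. As long as the two variables are not both missing in the same row, this formula can be used to compute a missing PO4 value from the observed oPO4 value.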

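# Row 28 is missing PO4 but has oPO4, so fill it from the fitted model
algae[28, "PO4"] <- 42.897 + 1.293 * algae[28, "oPO4"]

# The same rule as a function: return NA when oPO4 itself is missing
fillPO4 <- function(oP) {
        if (is.na(oP)) {
                return(NA)
        }
        42.897 + 1.293 * oP
}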

algae[is.na(algae$PO4), "PO4"] <- sapply(algae[is.na(algae$PO4), "oPO4"], fillPO4)

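The first argument of sapply() is a vector and the second is a function. The result is another vector with the same length as the first argument. Here the result of sapply() is the vector of values used to fill the missing entries of PO4.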
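
An alternative that avoids typing the coefficients by hand is to let predict() compute the fills from the fitted model. The sketch below uses made-up numbers (constructed so that PO4 = 2 * oPO4 + 5 on the complete rows); it shows the pattern only and is not the book's code.

```r
# Toy data with hypothetical values
toy <- data.frame(
  oPO4 = c(10, 20, 30, 40, 50, NA),
  PO4  = c(25, 45, NA, 85, 105, NA)
)

# lm() drops rows with NA in the model variables by default
fit <- lm(PO4 ~ oPO4, data = toy)

# Only rows where PO4 is missing and oPO4 is known can be filled
to_fill <- is.na(toy$PO4) & !is.na(toy$oPO4)
toy$PO4[to_fill] <- predict(fit, newdata = toy[to_fill, , drop = FALSE])

print(toy)   # row 3 is filled, row 6 stays NA because oPO4 is missing too
```

Because predict() is vectorised, no sapply() call or NA-checking helper is needed here.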

histogram(~mxPH | season, data = algae)

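The call above draws a histogram of mxPH for each season; each panel corresponds to the observations of one season. The panels, however, do not appear in natural chronological order. Reordering the levels of the season factor in the data frame makes the seasons appear in their natural order in the plot.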

algae$season <- factor(algae$season, levels = c("spring", "summer", "autumn", "winter"))
histogram(~mxPH | season, data = algae)

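By default, when a nominal variable is converted to a factor, the levels are assumed to be in alphabetical order. I am not sure why that did not seem to take effect here.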
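
The difference between the default and an explicit level order can be checked on a small made-up vector (the values below are assumptions for the example):

```r
# Toy season vector with hypothetical values
season <- c("winter", "spring", "autumn", "summer", "spring")

# Default: levels are the sorted unique values
default_order <- levels(factor(season))
print(default_order)

# Explicit levels: the order given is the order kept
natural_order <- levels(
  factor(season, levels = c("spring", "summer", "autumn", "winter"))
)
print(natural_order)
```

Functions that work per level, such as table() or lattice panels, follow this level order.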

4. Filling in missing values by exploring similarities between cases

data(algae)
algae <- algae[-manyNAs(algae), ]
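For each case with a missing value, we find its 10 most similar cases and use their values to fill the gap, in one of two ways:
  • Option 1: compute the median of the values of these 10 nearest cases and use it to fill the missing value. If the missing value belongs to a nominal variable, use the most frequent value (the mode) among these 10 cases.
  • Option 2: use a weighted average of the values of these neighbours, where the weights decrease as the distance to the case being filled increases. The weights are obtained from the distances with a Gaussian kernel function (this is beyond my level, I do not really understand it). If a neighbour is at distance d from the case being filled, its value enters the weighted average with weight:

$$ w(d) = e^{-d} $$

Both options are implemented by the book's function knnImputation(). It uses a variant of the Euclidean distance to find the k nearest neighbours of any case. This variant can be applied to data sets that contain both nominal and numeric variables. It is computed as:

$$ d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{p} \delta_i(x_i, y_i)} $$

where δi() is the distance between two values of variable i, that is:

$$ \delta_i(v_1, v_2) = \begin{cases} 1 & \text{if } i \text{ is nominal and } v_1 \neq v_2 \\ 0 & \text{if } i \text{ is nominal and } v_1 = v_2 \\ (v_1 - v_2)^2 & \text{if } i \text{ is numeric} \end{cases} $$
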
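
The mechanics can be sketched in base R on a tiny made-up data set. This is an illustration under simplifying assumptions, not the knnImputation() source: mixed_dist is a hypothetical helper, k is 2 instead of 10, only one cell is filled, and the numeric columns are not normalised (which a real implementation would normally do before computing distances).

```r
# Hypothetical helper: mixed distance between two one-row data frames.
# Nominal columns contribute 0 or 1, numeric columns a squared difference.
mixed_dist <- function(a, b) {
  total <- 0
  for (col in names(a)) {
    if (is.numeric(a[[col]])) {
      total <- total + (a[[col]] - b[[col]])^2
    } else {
      differs <- as.character(a[[col]]) != as.character(b[[col]])
      total <- total + as.numeric(differs)
    }
  }
  sqrt(total)
}

# Toy data with hypothetical values; Cl is missing in row 4
toy <- data.frame(
  season = c("winter", "winter", "spring", "winter"),
  mnO2   = c(1.0, 2.0, 3.0, 1.0),
  Cl     = c(10, 20, 40, NA),
  stringsAsFactors = FALSE
)

target     <- 4                      # row to fill
predictors <- c("season", "mnO2")    # columns used to measure similarity
donors     <- which(!is.na(toy$Cl))  # rows that can lend a value

d <- sapply(donors, function(i) {
  mixed_dist(toy[target, predictors], toy[i, predictors])
})

k <- 2
nearest <- order(d)[1:k]             # positions of the k smallest distances
w <- exp(-d[nearest])                # closer neighbours get larger weights

weighted_fill <- weighted.mean(toy$Cl[donors[nearest]], w)  # option 2
median_fill   <- median(toy$Cl[donors[nearest]])            # option 1
print(c(weighted = weighted_fill, median = median_fill))
```

Here rows 1 and 2 are the nearest neighbours (distances 0 and 1), so the weighted fill lies closer to the value of row 1 than the median does.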
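# Fill NAs with the distance-weighted average of the 10 nearest neighbours (the second option above)
algae <- knnImputation(algae, k = 10)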
season  size  speed     mxPH     mnO2     Cl    NO3      NH4    oPO4       PO4   Chla   a1   a2   a3   a4
1  winter small medium 8.000000  9.80000 60.800  6.238  578.000 105.000 170.00000 50.000  0.0  0.0  0.0  0.0
2  spring small medium 8.350000  8.00000 57.750  1.288  370.000 428.750 558.75000  1.300  1.4  7.6  4.8  1.9
3  autumn small medium 8.100000 11.40000 40.020  5.330  346.667 125.667 187.05701 15.600  3.3 53.6  1.9  0.0
4  spring small medium 8.070000  4.80000 77.364  2.302   98.182  61.182 138.70000  1.400  3.1 41.0 18.9  0.0
5  autumn small medium 8.060000  9.00000 55.350 10.416  233.700  58.222  97.58000 10.500  9.2  2.9  7.5  0.0
6  winter small   high 8.250000 13.10000 65.750  9.248  430.000  18.250  56.66700 28.400 15.1 14.6  1.4  0.0
7  summer small   high 8.150000 10.30000 73.250  1.535  110.000  61.250 111.75000  3.200  2.4  1.2  3.2  3.9
8  autumn small   high 8.050000 10.60000 59.067  4.990  205.667  44.667  77.43400  6.900 18.2  1.6  0.0  0.0
9  winter small medium 8.700000  3.40000 21.950  0.886  102.750  36.300  71.00000  5.544 25.4  5.4  2.5  0.0
10 winter small   high 7.930000  9.90000  8.000  1.390    5.800  27.250  46.60000  0.800 17.0  0.0  0.0  2.9
11 spring small   high 7.700000 10.20000  8.000  1.527   21.571  12.750  20.75000  0.800 16.6  0.0  0.0  0.0
12 summer small   high 7.450000 11.70000  8.690  1.588   18.429  10.667  19.00000  0.600 32.1  0.0  0.0  0.0
13 winter small   high 7.740000  9.60000  5.000  1.223   27.286  12.000  17.00000 41.000 43.5  0.0  2.1  0.0
14 summer small   high 7.720000 11.80000  6.300  1.470    8.000  16.000  15.00000  0.500 31.1  1.0  3.4  0.0
15 winter small   high 7.900000  9.60000  3.000  1.448   46.200  13.000  61.60000  0.300 52.2  5.0  7.8  0.0
16 autumn small   high 7.550000 11.50000  4.700  1.320   14.750   4.250  98.25000  1.100 69.9  0.0  1.7  0.0
17 winter small   high 7.780000 12.00000  7.000  1.420   34.333  18.667  50.00000  1.100 46.2  0.0  0.0  1.2
18 spring small   high 7.610000  9.80000  7.000  1.443   31.333  20.000  57.83300  0.400 31.8  0.0  3.1  4.8
19 summer small   high 7.350000 10.40000  7.000  1.718   49.000  41.500  61.50000  0.800 50.6  0.0  9.9  4.3
20 spring small medium 7.790000  3.20000 64.000  2.822 8777.600 564.600 771.59998  4.500  0.0  0.0  0.0 44.6
21 winter small medium 7.830000 10.70000 88.000  4.825 1729.000 467.500 586.00000 16.000  0.0  0.0  0.0  6.8
22 spring small   high 7.200000  9.20000  0.800  0.642   81.000  15.600  18.00000  0.500 15.5  0.0  0.0  2.3
23 autumn small   high 7.750000 10.30000 32.920  2.942   42.000  16.000  40.00000  7.600 23.2  0.0  0.0  0.0
24 winter small   high 7.620000  8.50000 11.867  1.715  208.333   3.000  27.50000  1.700 74.2  0.0  0.0  3.7
25 spring small   high 7.840000  9.40000 10.975  1.510   12.500   3.000  11.50000  1.500 13.0  8.6  1.2  3.5
26 summer small   high 7.770000 10.70000 12.536  3.976   58.500   9.000  44.13600  3.000  4.1  0.0  0.0  0.0
27 winter small   high 7.090000  8.40000 10.500  1.572   28.000   4.000  13.60000  0.500 29.7  0.0  0.0  4.9
28 autumn small   high 6.800000 11.10000  9.000  0.630   20.000   4.000  24.32265  2.700 30.3  1.9  0.0  0.0
29 winter small   high 8.000000  9.80000 16.000  0.730   20.000  26.000  45.00000  0.800 17.1  0.0 19.6  0.0
30 spring small   high 7.200000 11.30000  9.000  0.230  120.000  12.000  19.00000  0.500 33.9  1.0 14.6  0.0
31 autumn small   high 7.400000 12.50000 13.000  3.330   60.000  72.000 142.00000  4.900  3.4 16.0  1.2  0.0
32 winter small   high 8.100000 10.30000 26.000  3.780   60.000 246.000 304.00000  2.800  6.9 17.1 20.2  0.0
33 summer small   high 7.800000 11.30000 20.083  3.020   49.500  53.000 130.75000  5.800  0.0  8.0  1.9  0.0
34 autumn small medium 8.400000  9.90000 34.500  2.818 3515.000  20.000  47.00000  2.300 13.6  9.1  0.0  0.0
35 winter small medium 8.270000  7.80000 29.200  0.050 6400.000   7.400  23.00000  0.900  5.3 40.7  3.3  0.0
36 summer small medium 8.660000  8.40000 30.523  3.444 1911.000  58.875  84.46000  3.600 18.3 12.4  1.0  0.0
37 winter small   high 8.300000 10.90000  1.170  0.735   13.500   1.625   3.00000  0.200 66.0  0.0  0.0  0.0
38 spring small   high 8.000000 10.39811  1.450  0.810   10.000   2.500   3.00000  0.300 75.8  0.0  0.0  0.0
39 winter small medium 8.300000  8.90000 20.625  3.414  228.750 196.620 253.25000 12.320  2.0 38.5  4.1  2.2
40 spring small medium 8.100000 10.50000 22.286  4.071  178.570 182.420 255.28000  8.957  2.2  2.7  1.0  3.7
41 winter small medium 8.000000  5.50000 77.000  6.096  122.850 143.710 296.00000  3.700  0.0  5.9 10.6  1.7
42 summer small medium 8.150000  7.10000 54.190  3.829  647.570  59.429 175.04601 13.200  0.0  0.0  0.0  5.7
43 winter small   high 8.300000  7.70000 50.000  8.543   76.000 264.900 344.60001 22.500  0.0 40.9  7.5  0.0
44 spring small   high 8.300000  8.80000 54.143  7.830   51.429 276.850 326.85699 11.840  4.1  3.1  0.0  0.0
45 winter small   high 8.400000 13.40000 69.750  4.555   37.500  10.000  40.66700  3.900 51.8  4.1  0.0  0.0
46 spring small   high 8.300000 12.50000 87.000  4.870   22.500  27.000  43.50000  3.300 29.5  1.0  2.7  3.2
47 autumn small   high 8.000000 12.10000 66.300  4.535   39.000  16.000  39.00000  0.800 54.4  3.4  1.2  0.0
48 winter small    low 7.853154 12.60000  9.000  0.230   10.000   5.000   6.00000  1.100 35.5  0.0  0.0  0.0
49 spring small medium 7.600000  9.60000 15.000  3.020   40.000  27.000 121.00000  2.800 89.8  0.0  0.0  0.0
50 autumn small medium 7.290000 11.21000 17.750  3.070   35.000  13.000  20.81200 12.100 24.8  7.4  0.0  2.5
51 winter small medium 7.600000 10.20000 32.300  4.508  192.500  12.750  49.33300  7.900  0.0  0.0  0.0  4.6
52 summer small medium 8.000000  7.90000 27.233  1.651   28.333   7.300  22.90000  4.500 39.1  0.0  1.2  2.2
     a5   a6   a7 season1
1  34.2  8.3  0.0  winter
2   6.7  0.0  2.1  spring
3   0.0  0.0  9.7  autumn
4   1.4  0.0  1.4  spring
5   7.5  4.1  1.0  autumn
6  22.5 12.6  2.9  winter
7   5.8  6.8  0.0  summer
8   5.5  8.7  0.0  autumn
9   0.0  0.0  0.0  winter
10  0.0  0.0  1.7  winter
11  1.2  0.0  6.0  spring
12  0.0  0.0  1.5  summer
13  1.2  0.0  2.1  winter
14  1.9  0.0  4.1  summer
15  4.0  0.0  0.0  winter
16  0.0  0.0  0.0  autumn
17  0.0  0.0  0.0  winter
18  7.7  1.4  7.2  spring
19  3.6  8.2  2.2  summer
20  0.0  0.0  1.4  spring
21  6.1  0.0  0.0  winter
22  0.0  0.0  0.0  spring
23 27.6 11.1  0.0  autumn
24  0.0  0.0  0.0  winter
25  1.2  1.6  1.9  spring
26  9.2 10.1  0.0  summer
27  0.0  0.0  0.0  winter
28  2.1  1.4  2.1  autumn
29  0.0  0.0  2.5  winter
30  0.0  0.0  0.0  spring
31 15.3 15.8  0.0  autumn
32  4.0  0.0  2.9  winter
33 11.2 42.7  1.2  summer
34  1.4  0.0  0.0  autumn
35  0.0  0.0  1.9  winter
36  0.0  0.0  1.0  summer
37  0.0  0.0  0.0  winter
38  0.0  0.0  0.0  spring
39  0.0  0.0 10.2  winter
40  2.7  0.0  0.0  spring
41  0.0  0.0  7.1  winter
42 11.3 17.0  1.6  summer
43  2.4  1.5  0.0  winter
44 19.7 17.0  0.0  spring
45  3.1  5.5  0.0  winter
46  2.9  9.6  0.0  spring
47 18.7  2.0  0.0  autumn
48  0.0  0.0  0.0  winter
49  0.0  0.0  0.0  spring
50 10.6 17.1  3.2  autumn
51  1.2  0.0  3.9  winter
52  5.4  1.5  3.2  summer
 [ reached 'max' / getOption("max.print") -- omitted 148 rows ]
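# Alternative: use the median of the 10 nearest neighbours instead.
# Run this on freshly loaded data (data(algae); algae <- algae[-manyNAs(algae), ]),
# otherwise there are no NAs left to fill after the previous call.
algae <- knnImputation(algae, k = 10, meth = "median")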
 season  size  speed mxPH  mnO2     Cl    NO3      NH4    oPO4     PO4   Chla   a1   a2   a3   a4   a5   a6
1  winter small medium 8.00  9.80 60.800  6.238  578.000 105.000 170.000 50.000  0.0  0.0  0.0  0.0 34.2  8.3
2  spring small medium 8.35  8.00 57.750  1.288  370.000 428.750 558.750  1.300  1.4  7.6  4.8  1.9  6.7  0.0
3  autumn small medium 8.10 11.40 40.020  5.330  346.667 125.667 187.057 15.600  3.3 53.6  1.9  0.0  0.0  0.0
4  spring small medium 8.07  4.80 77.364  2.302   98.182  61.182 138.700  1.400  3.1 41.0 18.9  0.0  1.4  0.0
5  autumn small medium 8.06  9.00 55.350 10.416  233.700  58.222  97.580 10.500  9.2  2.9  7.5  0.0  7.5  4.1
6  winter small   high 8.25 13.10 65.750  9.248  430.000  18.250  56.667 28.400 15.1 14.6  1.4  0.0 22.5 12.6
7  summer small   high 8.15 10.30 73.250  1.535  110.000  61.250 111.750  3.200  2.4  1.2  3.2  3.9  5.8  6.8
8  autumn small   high 8.05 10.60 59.067  4.990  205.667  44.667  77.434  6.900 18.2  1.6  0.0  0.0  5.5  8.7
9  winter small medium 8.70  3.40 21.950  0.886  102.750  36.300  71.000  5.544 25.4  5.4  2.5  0.0  0.0  0.0
10 winter small   high 7.93  9.90  8.000  1.390    5.800  27.250  46.600  0.800 17.0  0.0  0.0  2.9  0.0  0.0
11 spring small   high 7.70 10.20  8.000  1.527   21.571  12.750  20.750  0.800 16.6  0.0  0.0  0.0  1.2  0.0
12 summer small   high 7.45 11.70  8.690  1.588   18.429  10.667  19.000  0.600 32.1  0.0  0.0  0.0  0.0  0.0
13 winter small   high 7.74  9.60  5.000  1.223   27.286  12.000  17.000 41.000 43.5  0.0  2.1  0.0  1.2  0.0
14 summer small   high 7.72 11.80  6.300  1.470    8.000  16.000  15.000  0.500 31.1  1.0  3.4  0.0  1.9  0.0
15 winter small   high 7.90  9.60  3.000  1.448   46.200  13.000  61.600  0.300 52.2  5.0  7.8  0.0  4.0  0.0
16 autumn small   high 7.55 11.50  4.700  1.320   14.750   4.250  98.250  1.100 69.9  0.0  1.7  0.0  0.0  0.0
17 winter small   high 7.78 12.00  7.000  1.420   34.333  18.667  50.000  1.100 46.2  0.0  0.0  1.2  0.0  0.0
18 spring small   high 7.61  9.80  7.000  1.443   31.333  20.000  57.833  0.400 31.8  0.0  3.1  4.8  7.7  1.4
19 summer small   high 7.35 10.40  7.000  1.718   49.000  41.500  61.500  0.800 50.6  0.0  9.9  4.3  3.6  8.2
20 spring small medium 7.79  3.20 64.000  2.822 8777.600 564.600 771.600  4.500  0.0  0.0  0.0 44.6  0.0  0.0
21 winter small medium 7.83 10.70 88.000  4.825 1729.000 467.500 586.000 16.000  0.0  0.0  0.0  6.8  6.1  0.0
22 spring small   high 7.20  9.20  0.800  0.642   81.000  15.600  18.000  0.500 15.5  0.0  0.0  2.3  0.0  0.0
23 autumn small   high 7.75 10.30 32.920  2.942   42.000  16.000  40.000  7.600 23.2  0.0  0.0  0.0 27.6 11.1
24 winter small   high 7.62  8.50 11.867  1.715  208.333   3.000  27.500  1.700 74.2  0.0  0.0  3.7  0.0  0.0
25 spring small   high 7.84  9.40 10.975  1.510   12.500   3.000  11.500  1.500 13.0  8.6  1.2  3.5  1.2  1.6
26 summer small   high 7.77 10.70 12.536  3.976   58.500   9.000  44.136  3.000  4.1  0.0  0.0  0.0  9.2 10.1
27 winter small   high 7.09  8.40 10.500  1.572   28.000   4.000  13.600  0.500 29.7  0.0  0.0  4.9  0.0  0.0
28 autumn small   high 6.80 11.10  9.000  0.630   20.000   4.000  17.000  2.700 30.3  1.9  0.0  0.0  2.1  1.4
29 winter small   high 8.00  9.80 16.000  0.730   20.000  26.000  45.000  0.800 17.1  0.0 19.6  0.0  0.0  0.0
30 spring small   high 7.20 11.30  9.000  0.230  120.000  12.000  19.000  0.500 33.9  1.0 14.6  0.0  0.0  0.0
31 autumn small   high 7.40 12.50 13.000  3.330   60.000  72.000 142.000  4.900  3.4 16.0  1.2  0.0 15.3 15.8
32 winter small   high 8.10 10.30 26.000  3.780   60.000 246.000 304.000  2.800  6.9 17.1 20.2  0.0  4.0  0.0
33 summer small   high 7.80 11.30 20.083  3.020   49.500  53.000 130.750  5.800  0.0  8.0  1.9  0.0 11.2 42.7
34 autumn small medium 8.40  9.90 34.500  2.818 3515.000  20.000  47.000  2.300 13.6  9.1  0.0  0.0  1.4  0.0
35 winter small medium 8.27  7.80 29.200  0.050 6400.000   7.400  23.000  0.900  5.3 40.7  3.3  0.0  0.0  0.0
36 summer small medium 8.66  8.40 30.523  3.444 1911.000  58.875  84.460  3.600 18.3 12.4  1.0  0.0  0.0  0.0
37 winter small   high 8.30 10.90  1.170  0.735   13.500   1.625   3.000  0.200 66.0  0.0  0.0  0.0  0.0  0.0
38 spring small   high 8.00 10.95  1.450  0.810   10.000   2.500   3.000  0.300 75.8  0.0  0.0  0.0  0.0  0.0
39 winter small medium 8.30  8.90 20.625  3.414  228.750 196.620 253.250 12.320  2.0 38.5  4.1  2.2  0.0  0.0
40 spring small medium 8.10 10.50 22.286  4.071  178.570 182.420 255.280  8.957  2.2  2.7  1.0  3.7  2.7  0.0
41 winter small medium 8.00  5.50 77.000  6.096  122.850 143.710 296.000  3.700  0.0  5.9 10.6  1.7  0.0  0.0
42 summer small medium 8.15  7.10 54.190  3.829  647.570  59.429 175.046 13.200  0.0  0.0  0.0  5.7 11.3 17.0
43 winter small   high 8.30  7.70 50.000  8.543   76.000 264.900 344.600 22.500  0.0 40.9  7.5  0.0  2.4  1.5
44 spring small   high 8.30  8.80 54.143  7.830   51.429 276.850 326.857 11.840  4.1  3.1  0.0  0.0 19.7 17.0
45 winter small   high 8.40 13.40 69.750  4.555   37.500  10.000  40.667  3.900 51.8  4.1  0.0  0.0  3.1  5.5
46 spring small   high 8.30 12.50 87.000  4.870   22.500  27.000  43.500  3.300 29.5  1.0  2.7  3.2  2.9  9.6
47 autumn small   high 8.00 12.10 66.300  4.535   39.000  16.000  39.000  0.800 54.4  3.4  1.2  0.0 18.7  2.0
48 winter small    low 7.90 12.60  9.000  0.230   10.000   5.000   6.000  1.100 35.5  0.0  0.0  0.0  0.0  0.0
49 spring small medium 7.60  9.60 15.000  3.020   40.000  27.000 121.000  2.800 89.8  0.0  0.0  0.0  0.0  0.0
50 autumn small medium 7.29 11.21 17.750  3.070   35.000  13.000  20.812 12.100 24.8  7.4  0.0  2.5 10.6 17.1
51 winter small medium 7.60 10.20 32.300  4.508  192.500  12.750  49.333  7.900  0.0  0.0  0.0  4.6  1.2  0.0
52 summer small medium 8.00  7.90 27.233  1.651   28.333   7.300  22.900  4.500 39.1  0.0  1.2  2.2  5.4  1.5
     a7 season1
1   0.0  winter
2   2.1  spring
3   9.7  autumn
4   1.4  spring
5   1.0  autumn
6   2.9  winter
7   0.0  summer
8   0.0  autumn
9   0.0  winter
10  1.7  winter
11  6.0  spring
12  1.5  summer
13  2.1  winter
14  4.1  summer
15  0.0  winter
16  0.0  autumn
17  0.0  winter
18  7.2  spring
19  2.2  summer
20  1.4  spring
21  0.0  winter
22  0.0  spring
23  0.0  autumn
24  0.0  winter
25  1.9  spring
26  0.0  summer
27  0.0  winter
28  2.1  autumn
29  2.5  winter
30  0.0  spring
31  0.0  autumn
32  2.9  winter
33  1.2  summer
34  0.0  autumn
35  1.9  winter
36  1.0  summer
37  0.0  winter
38  0.0  spring
39 10.2  winter
40  0.0  spring
41  7.1  winter
42  1.6  summer
43  0.0  winter
44  0.0  spring
45  0.0  winter
46  0.0  spring
47  0.0  autumn
48  0.0  winter
49  0.0  spring
50  3.2  autumn
51  3.9  winter
52  3.2  summer
 [ reached 'max' / getOption("max.print") -- omitted 148 rows ]

In short, after these simple steps the data set no longer contains any NA (missing) values.
