Reproducing a Kaggle case: the worldwide distribution of nuclear power plants

2019-05-04  小明的数据分析笔记本

Original source
https://www.kaggle.com/jonathanbouchet/nuclear-power-plant-geo-data
Nuclear Power Plant Locations data

R packages that were new to me (skimr, lubridate, viridis, broom)
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199                  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500                  

> fivenum(iris$Sepal.Length)
[1] 4.3 5.1 5.8 6.4 7.9
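
The fivenum() values above match the quartiles printed by summary(). A small base R sketch of why they agree here and where they can differ; the vector `small` is a made-up illustration, not part of the original post:

```r
x <- iris$Sepal.Length

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)

# Sample quantiles using R's default method (type 7), as summary() does
q <- unname(quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)))

# The two agree for this column
all.equal(five, q)

# On a short vector the hinges and the type-7 quartiles can differ
small <- c(1, 2, 3, 4)   # hypothetical illustration data
fivenum(small)
unname(quantile(small))
```

So fivenum() and summary() are close relatives, but the hinges are not guaranteed to equal the quartiles on every dataset.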

> skim(iris)
Skim summary statistics
 n obs: 150 
 n variables: 5 

-- Variable type:factor --------------------------------------------------------
 variable missing complete   n n_unique                       top_counts ordered
  Species       0      150 150        3 set: 50, ver: 50, vir: 50, NA: 0   FALSE

-- Variable type:numeric -------------------------------------------------------
     variable missing complete   n mean   sd  p0 p25  p50 p75 p100     hist
 Petal.Length       0      150 150 3.76 1.77 1   1.6 4.35 5.1  6.9 ▇▁▁▂▅▅▃▁
  Petal.Width       0      150 150 1.2  0.76 0.1 0.3 1.3  1.8  2.5 ▇▁▁▅▃▃▂▂
 Sepal.Length       0      150 150 5.84 0.83 4.3 5.1 5.8  6.4  7.9 ▂▇▅▇▆▅▂▂
  Sepal.Width       0      150 150 3.06 0.44 2   2.8 3    3.3  4.4 ▁▂▅▇▃▂▁▁
> 
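
The transcript above does not show the package being loaded. A minimal sketch of the same call, assuming the skimr package is installed; the printed layout depends on the skimr version, and the output above appears to come from an older release:

```r
library(skimr)

# skim() returns a data frame of summary statistics, so the result
# can be stored and inspected like any other table
sk <- skim(iris)
is.data.frame(sk)

# Printing the object produces the per-type summary tables
print(sk)
```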
> ymd("20110604")
[1] "2011-06-04"
> mdy("06-04-2011")
[1] "2011-06-04"
> dmy("04/06/2011")
[1] "2011-06-04"
> 
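
The three parsers above come from lubridate. A short sketch, assuming lubridate is installed, showing that each function name spells out the order of year, month and day in the input string:

```r
library(lubridate)

# Each parser is named after the order of year (y), month (m) and day (d) in the input
d1 <- ymd("20110604")
d2 <- mdy("06-04-2011")
d3 <- dmy("04/06/2011")

# All three return Date objects for the same day
class(d1)
d1 == d2 && d2 == d3
```

Separators such as "-" and "/" are handled by the parser, so only the field order has to match the function name.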
ggplot(mtcars,aes(wt,mpg))+
  geom_point(size=4,aes(colour=factor(cyl)))+
  scale_color_viridis_d()+theme_bw()
Figure (Rplot.png): mpg against wt, points colored by factor(cyl)
> lmfit<-lm(mpg~wt,mtcars)
> lmfit

Call:
lm(formula = mpg ~ wt, data = mtcars)

Coefficients:
(Intercept)           wt  
     37.285       -5.344  

> summary(lmfit)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

> broom::tidy(lmfit)
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    37.3      1.88      19.9  8.24e-19
2 wt             -5.34     0.559     -9.56 1.29e-10
> broom::glance(lmfit)
# A tibble: 1 x 11
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC
*     <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl>
1     0.753         0.745  3.05      91.4 1.29e-10     2  -80.0  166.
# ... with 3 more variables: BIC <dbl>, deviance <dbl>,
#   df.residual <int>
> broom::augment(lmfit)
# A tibble: 32 x 10
   .rownames   mpg    wt .fitted .se.fit .resid   .hat .sigma .cooksd
 * <chr>     <dbl> <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1 Mazda RX4  21    2.62    23.3   0.634 -2.28  0.0433   3.07 1.33e-2
 2 Mazda RX~  21    2.88    21.9   0.571 -0.920 0.0352   3.09 1.72e-3
 3 Datsun 7~  22.8  2.32    24.9   0.736 -2.09  0.0584   3.07 1.54e-2
 4 Hornet 4~  21.4  3.22    20.1   0.538  1.30  0.0313   3.09 3.02e-3
 5 Hornet S~  18.7  3.44    18.9   0.553 -0.200 0.0329   3.10 7.60e-5
 6 Valiant    18.1  3.46    18.8   0.555 -0.693 0.0332   3.10 9.21e-4
 7 Duster 3~  14.3  3.57    18.2   0.573 -3.91  0.0354   3.01 3.13e-2
 8 Merc 240D  24.4  3.19    20.2   0.539  4.16  0.0313   3.00 3.11e-2
 9 Merc 230   22.8  3.15    20.5   0.540  2.35  0.0314   3.07 9.96e-3
10 Merc 280   19.2  3.44    18.9   0.553  0.300 0.0329   3.10 1.71e-4
# ... with 22 more rows, and 1 more variable: .std.resid <dbl>
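
Because the broom functions return data frames, individual numbers can be pulled out by name instead of being read off the printed summary. A minimal sketch, assuming broom is installed; the variable names `coefs`, `slope`, `r2` and `aug` are my own:

```r
library(broom)

lmfit <- lm(mpg ~ wt, data = mtcars)

# tidy(): one row per coefficient, so estimates can be filtered like any data frame
coefs <- tidy(lmfit)
slope <- coefs$estimate[coefs$term == "wt"]

# glance(): a single row of model-level statistics
r2 <- glance(lmfit)$r.squared

# augment(): the original observations plus fitted values and residuals
aug <- augment(lmfit)
head(aug[, c("mpg", "wt", ".fitted", ".resid")])
```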

Functions that were new to me

Calling dplyr::rename raised this error: Error: `petal_length` = Petal.Length must be a symbol or a string, not a formula. Searching for the message turned up a possible fix: https://stackoverflow.com/questions/47755534/dplyr-rename-error-new-name-old-name-must-be-a-symbol-or-a-string-not-fo
In my case, upgrading R from R-3.4.2 to R-3.5.1 made the error go away.
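
For reference, a minimal sketch of the call form that rename() expects, assuming dplyr is installed. The exact call that failed is not shown in this post, so the iris example is an assumption based on the names in the error message:

```r
library(dplyr)

# rename() takes new_name = old_name pairs
renamed <- rename(iris, petal_length = Petal.Length)
names(renamed)

# Base R alternative that does not depend on dplyr
iris2 <- iris
names(iris2)[names(iris2) == "Petal.Length"] <- "petal_length"
names(iris2)
```

The base R version can serve as a fallback if the dplyr call keeps failing in a given environment.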

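Reproducing the two maps from the original
library(rworldmap)
library(ggplot2)
library(ggthemes)   # theme_fivethirtyeight() comes from ggthemes
# map_data("world") needs the maps package installed and already returns
# a data frame, so no fortify() call is required
worldMap <- map_data("world")
# Draw every country outline in black on a white fill
ggplot() + 
  geom_map(data = worldMap, 
           map = worldMap,aes(x = long, y = lat,
                              map_id = region, 
                              group = group),
           fill = "white", color = "black", size = 0.1) + 
  theme_fivethirtyeight(10)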
Figure (Rplot01.png): world map outline
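library(ggplot2)
library(rworldmap)
library(ggthemes)   # theme_fivethirtyeight() comes from ggthemes
# `res` is not defined in this post; it comes from the original Kaggle notebook
# and is assumed to hold world map polygons (long, lat, group) together with
# the total capacity per country in the column totMWe
ggplot(res) + 
  geom_polygon(aes(x=long, y=lat,group=group,fill=totMWe),
               color='white', size=.1) + 
  theme_fivethirtyeight() + 
  theme(panel.grid.major = element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank()) + 
  scale_fill_gradientn(name="",
                       colors = rev(viridis::viridis(50))) + 
  guides(fill = guide_colorbar(barwidth = 20, barheight = .5)) + 
  labs(title="Nuclear power plant landscape in 2019", 
       subtitle='energy produced(MWe) by nuclear source from active powerplant')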
Figure (Rplot02.png): Nuclear power plant landscape in 2019, countries filled by totMWe

Conclusions that can be drawn from the map above:
Top 3 producers: United States, France, China
North Korea: no production
