data analysis

2018-02-09  光_武

missing values

  1. visualizing missing values.


  2. selecting the data set to impute (the picture is very important).


  3. imputing just a few missing values with a particular model



  4. predicting values based on other variables
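
As a rough illustration of steps 1 and 4, the sketch below counts the missing entries per column and then fills the gaps in one variable with a straight-line fit on another. It uses only the Python standard library. The toy records, the column names `age` and `fare`, and the helpers `missing_counts`, `fit_line` and `impute_from` are all assumptions made for this example, not something taken from the original analysis.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 22.0, "fare": 7.25},
    {"age": 38.0, "fare": 71.28},
    {"age": None, "fare": 8.05},
    {"age": 35.0, "fare": 53.10},
    {"age": None, "fare": 51.86},
]


def missing_counts(rows):
    """Count missing (None) entries per column."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + (value is None)
    return counts


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def impute_from(rows, target, predictor):
    """Return copies of rows with missing `target` predicted from `predictor`."""
    known = [r for r in rows if r[target] is not None]
    a, b = fit_line([r[predictor] for r in known], [r[target] for r in known])
    return [
        {**r, target: a + b * r[predictor]} if r[target] is None else dict(r)
        for r in rows
    ]


print(missing_counts(rows))    # {'age': 2, 'fare': 0}
filled = impute_from(rows, "age", "fare")
print(missing_counts(filled))  # {'age': 0, 'fare': 0}
```

A single-predictor line is the simplest possible stand-in here; a real analysis would more likely predict from several variables at once.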


model testing

  1. OOB error

The black line shows the overall error rate, which falls below 20%. The red and green lines show the error rates for ‘died’ and ‘survived’, respectively.
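
To make the idea of an out-of-bag (OOB) error concrete, here is a minimal sketch. Each model is trained on a bootstrap sample, and every row is scored only by the models that never saw it. To stay dependency-free it bags 1-nearest-neighbour learners on a single feature rather than the trees of a random forest, so it shows the OOB mechanism only and is not the model behind the plot described above. The data and the function names are assumptions for illustration.

```python
import random


def one_nn_predict(train, x):
    """Predict the label of x from its nearest training point (1-NN)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def oob_error(data, n_models=50, seed=0):
    """Out-of-bag error rate of a bagged ensemble of 1-NN learners."""
    rng = random.Random(seed)
    n = len(data)
    votes = [[] for _ in range(n)]  # OOB votes collected for each row
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        train = [data[i] for i in idx]
        for i in range(n):
            if i not in in_bag:  # this model never saw row i
                votes[i].append(one_nn_predict(train, data[i][0]))
    wrong = scored = 0
    for (_, label), v in zip(data, votes):
        if not v:
            continue  # row was in every bootstrap sample, so it has no OOB vote
        majority = max(set(v), key=v.count)
        scored += 1
        wrong += majority != label
    return wrong / scored if scored else float("nan")


# Hypothetical (feature, label) pairs forming two well-separated groups.
data = [(1.0, "died"), (1.5, "died"), (2.0, "died"), (2.5, "died"), (3.0, "died"),
        (7.0, "survived"), (7.5, "survived"), (8.0, "survived"),
        (8.5, "survived"), (9.0, "survived")]

err = oob_error(data)
print(f"overall OOB error: {err:.2f}")
```

Per-class error rates like the red and green lines could be obtained the same way by counting `wrong` and `scored` separately for each true label.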


  2. RMSE



    LASSO model
    The most popular algorithm

    The models can then be compared by RMSE.
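
A comparison by RMSE can be sketched as follows. The held-out targets and the two prediction lists are made-up numbers, and `model_a` and `model_b` are placeholder names standing in for whichever fitted models are being compared; none of these values come from the original analysis.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out targets and predictions from two models.
y_true = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "model_a": [2.0, 5.0, 8.0, 9.0],
    "model_b": [3.0, 7.0, 7.0, 11.0],
}

scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # lower RMSE is better

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {score:.3f}")
print("lowest RMSE:", best)
```

The comparison is only meaningful when every model is scored on the same held-out rows.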


variable importance

variable selection

  1. Boruta Feature Importance Analysis
  2. Plotting all data
  3. Explore the correlation
  4. Plot scatter plots for variables that have high correlation (same link as item 3).



  5. Some useful plots
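
Steps 3 and 4 of the list above can be illustrated with a small standard-library sketch: compute the Pearson correlation for every pair of variables and keep only the pairs above a threshold, which are the candidates for scatter plots. The column names, the values and the 0.8 cut-off are assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def high_correlation_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda p: -abs(p[2]))


# Hypothetical numeric columns: y is roughly 2 * x, z is unrelated.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 8.0, 9.9],
    "z": [5.0, 1.0, 4.0, 2.0, 3.0],
}

pairs = high_correlation_pairs(columns)
for a, b, r in pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only the `x` and `y` pair passes the threshold in this toy data, so it would be the one worth plotting.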

