data analysis
2018-02-09
光_武
missing values
- Visualize the missing values.
- Select which imputed data set to use (the picture is very important here).
- When only a few values are missing, fill them in with a particular model.
- Predict the missing values based on the other variables.
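The first and last steps of this list can be illustrated with a minimal sketch. It is not the method used in the original analysis: the records, the column names, and the helpers `count_missing` and `impute_median` are hypothetical, and simple median imputation stands in for the model-based imputation that the notes refer to.

```python
from statistics import median

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 22.0, "fare": 7.25},
    {"age": None, "fare": 71.28},
    {"age": 26.0, "fare": None},
    {"age": 35.0, "fare": 53.10},
]


def count_missing(rows):
    """Return the number of missing (None) entries per column."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + (value is None)
    return counts


def impute_median(rows, col):
    """Return a copy of rows with None in `col` replaced by the column median."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = median(observed)
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]


missing = count_missing(rows)        # how many values are missing per column
filled = impute_median(rows, "age")  # original rows are left untouched
```

Counting first shows whether a column is missing only a few values or a large share of them, which is what decides between a simple fill like this one and a predictive model.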
model testing
- OOB error
In the OOB error plot, the black line shows the overall error rate, which falls below 20%. The red and green lines show the error rates for ‘died’ and ‘survived’, respectively.
- RMSE
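The OOB (out-of-bag) error listed above can be sketched as follows. This is only an illustration of the idea, under assumptions: the data are made up, and a bagged 1-nearest-neighbour learner stands in for the random forest that the notes presumably used. Each learner is trained on a bootstrap sample, and every record is scored only by the learners that never saw it. The function `oob_error` and its parameters are hypothetical names.

```python
import random


def nearest_label(train, x):
    """Predict the label of the closest training point (one numeric feature)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def oob_error(data, n_estimators=50, seed=0):
    """Return (overall OOB error, per-class OOB error) for a bagged 1-NN ensemble."""
    rng = random.Random(seed)
    n = len(data)
    votes = [{} for _ in range(n)]  # out-of-bag votes collected per sample
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        train = [data[i] for i in idx]
        for i in range(n):
            if i not in in_bag:  # score only samples this learner never saw
                pred = nearest_label(train, data[i][0])
                votes[i][pred] = votes[i].get(pred, 0) + 1
    tally = {}  # label -> [samples scored, samples misclassified]
    for i, v in enumerate(votes):
        if not v:
            continue  # this sample was in every bag, so it has no OOB vote
        label = data[i][1]
        counts = tally.setdefault(label, [0, 0])
        counts[0] += 1
        counts[1] += max(v, key=v.get) != label  # majority vote vs. truth
    scored = sum(c[0] for c in tally.values())
    if scored == 0:
        raise ValueError("no sample was ever out of bag; use more estimators")
    overall = sum(c[1] for c in tally.values()) / scored
    by_class = {label: c[1] / c[0] for label, c in tally.items()}
    return overall, by_class


# Hypothetical, well-separated toy data: (feature, label).
data = [(float(x), "died") for x in range(10)]
data += [(float(x), "survived") for x in range(20, 30)]
overall, by_class = oob_error(data)
```

The overall value corresponds to the black line in the plot described above, and the two per-class values correspond to the red and green lines.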
LASSO model
The most popular algorithm
Then we can compare the models by RMSE as follows
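A minimal sketch of such a comparison, assuming each model has already produced predictions for the same held-out targets. The numbers and the names `model_a` and `model_b` are hypothetical placeholders, not results from the original analysis.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical held-out targets and per-model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "model_a": [2.0, 6.0, 6.0, 10.0],
    "model_b": [3.0, 5.0, 7.0, 13.0],
}

scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # lower RMSE is better
```

Because RMSE squares the residuals, a single large miss (as in `model_b`) can outweigh several small ones, so it is worth looking at the residuals as well as the score.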
variable importance
variable selection
- Boruta Feature Importance Analysis
- Plotting all data
- Explore the correlation
- Draw scatter plots for the variables that are highly correlated (same link as the 3rd one)
- Some useful plots
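The correlation steps in this list can be sketched as below: compute the Pearson correlation for every pair of variables and keep the pairs worth a scatter plot. The columns, the helper names, and the 0.8 threshold are assumptions made for the illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def high_corr_pairs(columns, threshold=0.8):
    """Return (name_1, name_2, r) for every pair with |r| >= threshold."""
    pairs = []
    for u, v in combinations(columns, 2):
        r = pearson(columns[u], columns[v])
        if abs(r) >= threshold:
            pairs.append((u, v, r))
    return pairs


# Hypothetical columns: "b" is roughly twice "a", "c" is unrelated.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.8],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = high_corr_pairs(columns)
```

Only the pairs returned here need a scatter plot, which keeps the number of plots manageable when there are many variables.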