DS Interview Question--Missing V

2017-06-28  Vivian有好多美好的故事

Q: During analysis, how do you treat missing values?

A: 

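First, we need to know the pattern (mechanism) behind the missing data:
1. Missing completely at random (MCAR): the chance that a value is missing is unrelated to any variable, observed or unobserved. This is the easiest situation to handle.
2. Missing at random (MAR): the chance that a value is missing depends only on other observed variables, not on the missing value itself.
3. Missing not at random (MNAR): the chance that a value is missing depends on the unobserved value itself, for example when people with high incomes are less likely to report them.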
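To see why the mechanism matters, here is a small simulation sketch in plain Python. The `income` data, the 30% and 60% drop rates, and the helper names `simulate_missing` and `observed_mean` are all made up for illustration. Under MCAR the mean of the observed values stays close to the true mean, while under MNAR (high values go missing more often) it is pulled downward.

```python
import random
import statistics

def simulate_missing(values, mechanism, rng):
    """Return a copy of values with some entries replaced by None."""
    out = []
    for v in values:
        if mechanism == "MCAR":
            # Every value has the same chance of going missing.
            drop = rng.random() < 0.3
        elif mechanism == "MNAR":
            # Only high values can go missing, so missingness
            # depends on the unobserved value itself.
            drop = v > 60 and rng.random() < 0.6
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append(None if drop else v)
    return out

def observed_mean(values):
    """Mean of the values that are still observed."""
    return statistics.mean(v for v in values if v is not None)

rng = random.Random(0)  # fixed seed so the run is repeatable
income = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical data

true_mean = statistics.mean(income)
mcar_mean = observed_mean(simulate_missing(income, "MCAR", rng))
mnar_mean = observed_mean(simulate_missing(income, "MNAR", rng))

print(f"true: {true_mean:.2f}  MCAR: {mcar_mean:.2f}  MNAR: {mnar_mean:.2f}")
```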

And then we can choose different methods to deal with missing values:

Deletion: If we have enough observations and the data are missing completely at random, we can delete the observations with missing values (listwise deletion) without introducing bias, at the cost of a smaller sample.
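A minimal sketch of listwise deletion, using plain Python dictionaries with `None` standing for a missing value. The rows and the helper names `drop_incomplete` and `missing_rate` are hypothetical; with pandas the same idea is usually expressed with `DataFrame.dropna`. Checking the missing rate first helps judge whether enough observations would remain.

```python
rows = [
    {"age": 34, "income": 52000, "city": "Seattle"},
    {"age": None, "income": 61000, "city": "Boston"},
    {"age": 29, "income": None, "city": "Austin"},
    {"age": 41, "income": 48000, "city": "Denver"},
]

def missing_rate(rows, column):
    """Share of rows in which the given column is missing."""
    return sum(r[column] is None for r in rows) / len(rows)

def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing field."""
    return [r for r in rows if all(v is not None for v in r.values())]

for column in ("age", "income", "city"):
    print(column, missing_rate(rows, column))

complete = drop_incomplete(rows)
print(len(complete), "of", len(rows), "rows kept")
```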

Imputation: 1. Replace missing values with the mean, median, or mode, or with a default value; 2. Replace missing values with predictions from a model (e.g. regression or KNN).
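Both kinds of imputation can be sketched in a few lines of plain Python. The functions `impute_simple` and `impute_knn` and the sample data are illustrative assumptions, not a reference implementation. The KNN version assumes the feature columns themselves are complete and fills the target with the average of the `k` nearest rows by Euclidean distance.

```python
import math
import statistics

def impute_simple(values, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def impute_knn(rows, target, features, k=2):
    """Fill a missing target with the mean target of the k nearest complete rows."""
    donors = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(dict(r))
            continue
        point = [r[f] for f in features]
        # Rank donors by Euclidean distance in feature space.
        nearest = sorted(
            donors, key=lambda d: math.dist(point, [d[f] for f in features])
        )[:k]
        estimate = statistics.mean(d[target] for d in nearest)
        filled.append({**r, target: estimate})
    return filled

ages = [25, None, 35, 40, None]
median_filled = impute_simple(ages, "median")

people = [
    {"height": 160, "weight": 55},
    {"height": 162, "weight": 58},
    {"height": 180, "weight": 80},
    {"height": 161, "weight": None},
]
knn_filled = impute_knn(people, target="weight", features=["height"], k=2)
```

Simple imputation shrinks the variance of the filled column, so the model-based variant is often preferred when other variables carry information about the missing one.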

Others: More complex methods such as Multiple Imputation (MI) and Hot Deck imputation.
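As a rough sketch of how these two ideas fit together: hot deck fills each gap with a value borrowed from an observed "donor" record, and multiple imputation repeats the filling several times, analyzes each completed data set, and pools the results with Rubin's rules. The code below is a deliberate simplification. Donors are drawn at random from all observed values (which only makes sense under MCAR), and the names `hot_deck` and `pooled_mean` and the `scores` data are hypothetical. A proper MI procedure would also reflect uncertainty in the imputation model itself.

```python
import random
import statistics

def hot_deck(values, rng):
    """Fill each None with a value drawn at random from the observed donors."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]

def pooled_mean(values, m=20, seed=0):
    """Impute m times, estimate the mean each time, and pool with Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = hot_deck(values, rng)
        estimates.append(statistics.mean(completed))
        # Sampling variance of the mean within this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    total = within + (1 + 1 / m) * between    # Rubin's total variance
    return q_bar, total

scores = [72, None, 65, 80, None, 77, 69, None, 74, 71]
estimate, total_variance = pooled_mean(scores)
print(f"pooled mean: {estimate:.2f}, total variance: {total_variance:.2f}")
```

The between-imputation term is what a single imputation leaves out: it carries the extra uncertainty that comes from not knowing the missing values.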

Leave as is: Some model implementations, for example certain random forest and gradient-boosted tree libraries, can handle missing values on their own.

Interview questions are from DataAppLab (WeChat: Datalaus)

Jun.27th, 2017  Seattle
