model selection

2017-06-23 awakeLives

Q: What model selection and data manipulation techniques do you follow to solve a problem?

a. Generally, I try almost everything for most problems.

b. In principle, for:

i. Time series: GARCH, ARCH, regression, ARIMA models.

ii. Image classification: deep learning (convolutional nets).

iii. Sound: commonly neural networks (NNs).

iv. High-cardinality categorical data (like text data): linear models, FTRL, Vowpal Wabbit, LibFFM, libFM, SVD.

v. Everything else: everything, especially gradient boosting machines (like XGBoost and LightGBM) and deep learning (like Keras, Lasagne, Caffe, Cxxnet).
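
To make the gradient boosting idea in item v concrete, here is a minimal sketch in plain Python, assuming squared-error loss, a single numeric feature, and depth-one trees (stumps). It is not how XGBoost or LightGBM are implemented; the names fit_stump, fit_gbm, predict and the toy data are made up for illustration.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Pick the split on x that minimises squared error on the residuals.

    Assumes x contains at least two distinct values.
    """
    best_sse, best_stump = float("inf"), None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best_sse:
            best_sse, best_stump = sse, (threshold, left_mean, right_mean)
    return best_stump


def predict(base: float, stumps: List[Stump], learning_rate: float, xi: float) -> float:
    """Sum the base value and the shrunken contribution of every stump."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if xi <= threshold else right_value)
    return total


def fit_gbm(x: List[float], y: List[float], n_rounds: int, learning_rate: float):
    """Each round fits a stump to the current residuals (y - prediction)."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        preds = [predict(base, stumps, learning_rate, xi) for xi in x]
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stumps.append(fit_stump(x, residuals))
    return base, stumps


def mse(y_true: List[float], y_pred: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented): a noisy step function.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
LEARNING_RATE = 0.3

base, stumps = fit_gbm(x, y, n_rounds=20, learning_rate=LEARNING_RATE)
baseline_mse = mse(y, [base] * len(y))
boosted_mse = mse(y, [predict(base, stumps, LEARNING_RATE, xi) for xi in x])
print(f"baseline MSE={baseline_mse:.3f}, boosted MSE={boosted_mse:.3f}")
```

The learning rate shrinks each stump's contribution, so later rounds keep correcting what earlier rounds left behind; real libraries add regularisation, deeper trees and subsampling on top of this loop.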

c. I decide which models to keep or drop in meta-modelling with feature selection techniques. The latter may be:

i. Forward (with or without CV)

ii. Backward (with or without CV)

iii. Mixed (or stepwise)

iv. Permutations

v. Using feature importance or similar

vi. Apply some stats logic
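
Forward selection with cross-validation (item i above) can be sketched roughly as follows. This is an illustrative version only: it assumes leave-one-out CV with a 1-nearest-neighbour classifier as the scorer, and the names forward_select, loo_accuracy_1nn and the toy data are invented for this example. Applied to meta-modelling, the "features" would be the out-of-fold predictions of the candidate models.

```python
from typing import Callable, List


def loo_accuracy_1nn(X: List[List[float]], y: List[int], features: List[int]) -> float:
    """Leave-one-out accuracy of 1-NN using only the given feature columns."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue  # hold out the point being predicted
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


def forward_select(n_features: int, score: Callable[[List[int]], float]) -> List[int]:
    """Start from no features; greedily add the one that most improves the score.

    Stops as soon as no remaining feature gives a strict improvement.
    """
    selected: List[int] = []
    best_score = float("-inf")
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Ties are broken in favour of the lower feature index.
        top_score, top_feature = max(
            ((score(selected + [f]), f) for f in candidates),
            key=lambda pair: (pair[0], -pair[1]),
        )
        if top_score <= best_score:
            break
        selected.append(top_feature)
        best_score = top_score
    return selected


# Toy data (invented): column 0 separates the classes, columns 1 and 2 are noise.
X = [
    [0.0, 5.0, 1.0],
    [0.1, 1.0, 9.0],
    [0.2, 8.0, 4.0],
    [0.3, 2.0, 7.0],
    [1.0, 6.0, 2.0],
    [1.1, 0.0, 8.0],
    [1.2, 9.0, 3.0],
    [1.3, 3.0, 6.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

chosen = forward_select(3, lambda feats: loo_accuracy_1nn(X, y, feats))
print("selected feature indices:", chosen)
```

Backward selection is the mirror image: start from all features and drop the one whose removal hurts the CV score least, stopping when every removal makes the score worse.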

d. Data manipulation could be different for every problem:

i. Time series: moving averages, derivatives, outlier removal.

ii. Text: TF-IDF, count vectorizers, word2vec, SVD (dimensionality reduction), stemming, spell checking, sparse matrices, likelihood encoding, one-hot encoding (or dummies).

iii. Image classification: scaling, resizing, removing noise (smoothing), annotating.

iv. Sound: Fourier transforms, MFCC (Mel-frequency cepstral coefficients), low-pass filters.

v. Everything else:

Notes: deep learning in Python for text problems: Keras (supports sparse data), Gensim (for word2vec).
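
Two of the manipulations listed under d can be sketched with the standard library alone: the time-series steps (moving average, first difference as a discrete derivative, outlier removal) and a sparse TF-IDF for text. The helper names, the median-absolute-deviation outlier rule, and the unsmoothed tf * log(N/df) weighting are assumptions chosen for illustration; library implementations often use smoothed variants, and neither Keras nor Gensim is used here.

```python
import math
from collections import Counter
from statistics import median
from typing import Dict, List


def moving_average(series: List[float], window: int) -> List[float]:
    """Trailing mean over `window` points; output is shorter by window - 1."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def first_difference(series: List[float]) -> List[float]:
    """Discrete derivative: change between consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]


def remove_outliers(series: List[float], k: float = 3.0) -> List[float]:
    """Keep points within k median absolute deviations of the median (assumed rule)."""
    med = median(series)
    mad = median([abs(v - med) for v in series])
    return [v for v in series if abs(v - med) <= k * mad]


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF: one dict per document, holding only non-zero weights."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = {}
        for term, count in counts.items():
            weight = (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            if weight > 0:  # terms present in every document get weight 0
                vector[term] = weight
        vectors.append(vector)
    return vectors


# Toy inputs (invented).
raw = [10.0, 11.0, 12.0, 11.0, 50.0, 12.0, 13.0]
cleaned = remove_outliers(raw)
smoothed = moving_average(cleaned, window=3)
diffs = first_difference(cleaned)

vectors = tfidf(["the cat sat", "the dog sat", "the cat ran"])
print(cleaned, smoothed, diffs)
print(vectors)
```

Storing only the non-zero weights is the same idea as the sparse matrices mentioned in item ii: most terms are absent from most documents, so a dense representation would be mostly zeros.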
