
model selection

2017-06-23  awakeLives

Q: What model selection and data manipulation techniques do you follow to solve a problem?

a. Generally, I try almost everything for most problems.

b. In principle, for:

    i. Time series: GARCH, ARCH, regression, ARIMA models.

    ii. Image classification: deep learning (convolutional nets).

    iii. Sound: commonly neural networks.

    iv. High-cardinality categorical data (like text data): linear models, FTRL, Vowpal Wabbit, LibFFM, libFM, SVD.

    v. Everything else: everything, especially gradient boosting machines (like XGBoost and LightGBM) and deep learning (like Keras, Lasagne, Caffe, CXXNet).

c. I decide which models to keep or drop in meta-modelling with feature selection techniques. The latter may be:

    i. Forward (cv or not)

    ii. Backward (cv or not)

    iii. Mixed (or stepwise)

    iv. Permutations

    v. Using feature importance or similar

    vi. Apply some stats logic
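As a rough illustration of the forward (with cv) option above, here is a minimal sketch in plain Python. It is not the author's code: the nearest-centroid classifier, the fold layout, the toy data and the helper names `cv_accuracy` and `forward_select` are all assumptions chosen to keep the example self-contained. In practice any model and any scoring metric could take their place.

```python
import random


def cv_accuracy(X, y, cols, k=5):
    """Mean k-fold accuracy of a nearest-centroid classifier using only `cols`.

    Assumes len(y) >= k so that no fold is empty.
    """
    n = len(y)
    fold_scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th row goes to this fold
        train = [i for i in range(n) if i not in test_idx]
        # One centroid per class, computed on the training rows only.
        centroids = {}
        for label in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
        correct = 0
        for i in test_idx:
            # Predict the class whose centroid is closest (squared Euclidean).
            pred = min(
                centroids,
                key=lambda lab: sum(
                    (X[i][c] - m) ** 2 for c, m in zip(cols, centroids[lab])
                ),
            )
            correct += pred == y[i]
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / k


def forward_select(X, y, k=5):
    """Greedy forward selection: add the feature that most improves CV accuracy."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        score, col = max((cv_accuracy(X, y, selected + [c], k), c) for c in remaining)
        if score <= best:  # stop when no candidate strictly improves the score
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Toy data (made up for illustration): column 0 carries the label,
# columns 1 and 2 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(60)]
X = [[10 * label + rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)]
     for label in y]

selected, best = forward_select(X, y)
print(selected, best)
```

Backward selection follows the same pattern in reverse: start from all columns and repeatedly drop the one whose removal hurts the cross-validated score least.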

d. Data manipulation could be different for every problem:

    i. Time series: moving averages, derivatives, outlier removal.

    ii. Text: TF-IDF, count vectorizers, word2vec, SVD (dimensionality reduction), stemming, spell checking, sparse matrices, likelihood encoding, one-hot encoding (or dummies).

    iii. Image classification: scaling, resizing, removing noise (smoothing), annotating.

    iv. Sounds: Fourier transforms, MFCC (Mel-frequency cepstral coefficients), low-pass filters.

    v. Everything else:
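Going back to the time-series manipulations in item d.i, the three steps can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the function names, the trailing window, the use of a first difference as the "derivative" (which presumes evenly spaced samples), and the median/MAD rule with a 3.5 cutoff for outliers. The sample series is invented.

```python
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Drop points whose modified z-score (median/MAD based) exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return list(values)  # no spread to measure against; keep everything
    return [v for v in values if 0.6745 * abs(v - med) / mad <= threshold]


def moving_average(values, window):
    """Trailing moving average; the first window - 1 points produce no output."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def first_difference(values):
    """Discrete derivative for evenly spaced samples: x[t] - x[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]


# Invented series with one obvious spike (95).
series = [10, 11, 12, 11, 13, 95, 12, 14, 13, 15]

cleaned = remove_outliers(series)        # the spike is dropped
smoothed = moving_average(cleaned, 3)    # 3-point trailing average
slope = first_difference(smoothed)       # change between consecutive averages
print(cleaned, smoothed, slope, sep="\n")
```

Removing outliers before smoothing is a design choice: a moving average computed first would smear the spike across neighbouring points and make it harder to detect.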

Notes: deep learning in Python for text problems: Keras (supports sparse data), Gensim (for word2vec).
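For the text items in d.ii, the idea behind TF-IDF and sparse storage can be shown without any library. The sketch below is only one possible variant and rests on assumptions: whitespace tokenisation, term frequency as a share of the document length, and the unsmoothed weight tf * log(N / df). The function name `tfidf` and the sample sentences are made up. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed, normalised variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one sparse row per document as a {term: weight} dict.

    weight = (count of term in doc / doc length) * log(N / document frequency)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return rows


docs = ["the cat sat", "the dog sat", "the bird flew"]
rows = tfidf(docs)
for row in rows:
    print(row)
```

Each row stores only the terms that occur in that document, which is the same reason sparse matrices appear in the list: with a large vocabulary almost every entry of the dense matrix would be zero. A term present in every document (here "the") gets weight 0, while a term unique to one document gets the highest weight.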
