Daily notes, continuously updated (2018-05-02)

2018-05-02  叨逼叨小马甲
  1. Dimensionality reduction methods:
  1. Preprocessing raw data, in three steps
  1. The process of machine learning


    image.png
  2. Some classification algorithms

  1. Several algorithms
    A. Regression

    • Ordinal regression: data in rank-ordered categories
    • Poisson Regression: predicts event counts
    • Fast forest quantile regression: predicts a distribution
    • Linear regression: fast training, linear model
    • Bayesian linear regression: linear model, small data sets
    • neural network regression: accurate, long training times
    • decision forest regression: accurate, fast training times
    • boosted decision tree regression: accurate, fast training times, large memory footprint
    B. Clustering
    • K-means: unsupervised learning
    C. Anomaly detection
    • PCA-Based Anomaly detection: fast training times
    • One-class SVM: more than 100 features, aggressive boundary
    D. Two-class classification
    • two-class SVM: more than 100 features, linear model
    • two-class averaged perceptron: fast training, linear model
    • two-class bayes point machine: fast training, linear model
    • two-class decision forest
    • two-class logistic regression
    • two-class boosted decision tree
    • two-class decision jungle
    • two-class locally deep SVM
    • two-class neural network
    E. Multiclass classification
    • multiclass logistic regression
    • multiclass neural network
    • multiclass decision forest
    • multiclass decision jungle
    • one-vs-all multiclass: depends on the underlying two-class classifier
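
    The last entry in the list can be sketched as follows: one two-class model is trained per class ("this class" versus "everything else"), and the class whose model scores highest wins. This is only an illustration in plain Python. The perceptron used as the two-class learner, the function names, and the toy 2-D points are all assumptions, not something taken from the notes.

```python
def train_perceptron(points, labels, epochs=500):
    """Train a two-class perceptron; labels are +1 / -1. Returns (weights, bias)."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified or on the boundary: move the boundary
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return weights, bias


def train_one_vs_all(points, classes):
    """Fit one two-class model per class: that class (+1) versus the rest (-1)."""
    models = {}
    for target in sorted(set(classes)):
        binary_labels = [1 if c == target else -1 for c in classes]
        models[target] = train_perceptron(points, binary_labels)
    return models


def predict_one_vs_all(models, x):
    """Return the class whose two-class model gives the highest score."""
    def score(model):
        weights, bias = model
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return max(models, key=lambda c: score(models[c]))


# Toy data (hypothetical): three well-separated groups in 2-D.
points = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
classes = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

models = train_one_vs_all(points, classes)
predictions = [predict_one_vs_all(models, p) for p in points]
```

    The multiclass result is only as good as the two-class learner plugged in, which is the point of "depends on the two-class classifier".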
  2. Semi-supervised learning
    Sits between supervised and unsupervised learning: a small part of the data is labeled and most of it is not. It can reach high accuracy, and its training cost is much lower than that of supervised learning.
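
    One common way to use the unlabeled majority is self-training (pseudo-labeling), sketched below in plain Python. The notes do not name a specific technique, so this choice is an assumption, as are the nearest-centroid classifier, the 1-D toy data, and the `min_margin` confidence threshold.

```python
def centroids(labeled):
    """Mean feature value per class, from (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in labeled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def self_train(labeled, unlabeled, min_margin=3.0, max_rounds=10):
    """Repeatedly pseudo-label the unlabeled points the model is confident about."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        centers = centroids(labeled)
        confident, uncertain = [], []
        for value in remaining:
            ranked = sorted(centers, key=lambda label: abs(value - centers[label]))
            best, runner_up = ranked[0], ranked[1]
            # Confidence = how much closer the nearest centroid is than the next one.
            margin = abs(value - centers[runner_up]) - abs(value - centers[best])
            if margin >= min_margin:
                confident.append((value, best))
            else:
                uncertain.append(value)
        if not confident:  # nothing new to learn from, stop
            break
        labeled.extend(confident)
        remaining = uncertain
    return centroids(labeled), remaining


# Hypothetical data: two labeled points, eight unlabeled ones.
labeled = [(1.0, 0), (9.0, 1)]
unlabeled = [1.5, 2.0, 2.5, 7.5, 8.0, 8.5, 4.5, 5.0]
centers, still_unlabeled = self_train(labeled, unlabeled)
```

    Points near the middle stay unlabeled because the model is never confident about them; only two hand-made labels were needed to place the other six.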

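  3. Reinforcement learning
    Learns, from a sequence of actions, how to maximize a reward function; here the reward signals whether an action was a "bad action" or a "good action". Reinforcement learning is often used in autonomous driving, where decisions are made from a stream of feedback from the surrounding environment.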
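
    As a toy illustration of learning from reward feedback, here is a sketch of tabular Q-learning on a five-cell corridor. The environment, the reward of 1 for reaching the goal, and all hyperparameters are assumptions made for this example; the notes do not specify an algorithm.

```python
import random

ACTIONS = (-1, +1)  # move left / move right
GOAL = 4            # states are 0..4; reaching state 4 ends the episode


def step(state, action):
    """Hypothetical environment: walls at both ends, reward 1 only at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward = step(state, action)
            # Move the estimate toward: reward now + discounted best future value.
            target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy: in each non-goal state, the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
```

    After training, the greedy policy moves right in every cell, which is the shortest path to the reward.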


    image.png
  4. Machine learning algorithms, classification chart


    image.png
  5. A tip
    If results look very good during training but are poor at evaluation time, the model is very likely overfitting.

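  6. Three common validation methods

    • hold-out validation: set aside part of the data for validation; suited to large datasets

    • k-fold cross-validation: split the training set into k equal parts; suited to small datasets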


      image.png
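    • leave-one-out cross-validation (LOOCV): a special case of k-fold cross-validation with k equal to the number of samples, repeated until every observation has served as the validation data.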
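
    A minimal sketch of how the three schemes above divide sample indices, in plain Python. The function names, the 20% validation fraction, and the unshuffled contiguous folds are assumptions for illustration; real data is usually shuffled first.

```python
def hold_out_split(n_samples, validation_fraction=0.2):
    """Hold-out: reserve the last part of the indices for validation."""
    n_validation = int(n_samples * validation_fraction)
    indices = list(range(n_samples))
    cut = n_samples - n_validation
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k):
    """k-fold: yield (train, validation) index lists, one pair per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def leave_one_out_splits(n_samples):
    """LOOCV: k-fold with k equal to the number of samples."""
    return k_fold_splits(n_samples, n_samples)


train_idx, val_idx = hold_out_split(10)
folds = list(k_fold_splits(6, 3))
loo = list(leave_one_out_splits(4))
```

    Hold-out trains once, k-fold trains k times, and LOOCV trains once per sample, which is why the last two are usually reserved for smaller datasets.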

  7. Several ways to evaluate a model


    image.png
image.png
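  1. A tip
    Sometimes a model with very high accuracy is still not useful. For example, when 99% of the samples are cancer-free and only 1% have cancer, the classes are imbalanced and accuracy alone says little. In that case, build two models: model A to identify "cancer" and model B to identify "no cancer".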
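
    To make the imbalance problem concrete, the sketch below scores a deliberately useless model that always answers "no cancer" on 100 made-up samples (1 positive, 99 negative). The data and helper names are hypothetical; recall is used here as one example of a metric that exposes the problem.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual positives the model found."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced data: 1 = cancer, 0 = no cancer.
y_true = [1] * 1 + [0] * 99
y_pred = [0] * 100  # a "model" that always predicts "no cancer"

model_accuracy = accuracy(y_true, y_pred)
model_recall = recall(y_true, y_pred)
```

    The accuracy comes out at 99% while recall is 0: the model never finds the one case that matters.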

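  2. Bias and variance
    Underfitting corresponds to high bias.
    Overfitting corresponds to high variance.
    When judging a model: if it does well on the training set but poorly on the validation set, it is a high-variance problem (overfitting); if it does poorly on both the training set and the validation set, it is a high-bias problem (underfitting).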
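
    The rule above can be written as a small check on the two error rates. This is only a sketch: the function name and both thresholds (`acceptable_error`, `max_gap`) are hypothetical and would need tuning per problem.

```python
def diagnose(train_error, validation_error, acceptable_error=0.10, max_gap=0.05):
    """Classify a model as underfit, overfit, or fine from its two error rates."""
    if train_error > acceptable_error:
        # Poor even on the data it was trained on.
        return "high bias (underfit)"
    if validation_error - train_error > max_gap:
        # Good on training data, clearly worse on unseen data.
        return "high variance (overfit)"
    return "ok"


overfit_case = diagnose(train_error=0.02, validation_error=0.30)
underfit_case = diagnose(train_error=0.35, validation_error=0.38)
healthy_case = diagnose(train_error=0.04, validation_error=0.06)
```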
    Remedies:


    image.png