Machine Learning in Practice, Project 1: Feature Normalization and Generating Interaction Features

2019-01-22 strive鱼

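I have nearly finished working through the main algorithms in Zhou Zhihua's machine learning book. But as everyone knows, understanding the theory without writing code amounts to knowing nothing, so I plan to strengthen two areas next.

First, the "100 Days of ML" plan mentioned in my article on clustering: I will keep working through it in Jupyter Notebook and build it into a 100-day summary, which I will upload to Jianshu in one go. That is three months away, of course, so please hold me to it and let's learn together.

Second, I previously joined a training camp whose assignments were of very high quality. I had no time to write them up back then, so I will now complete them one by one and share them on Jianshu so we can improve together. There are roughly nine projects.

Today is the first project.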

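Q1: Feature normalization methods (min-max scaling and zero-mean standardization): how they work, and their strengths and weaknesses

Q2: Use sklearn.preprocessing.PolynomialFeatures to generate interaction features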

I. Answer 1

Links to the corresponding scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler (min-max scaling)

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler (standardization)

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html (interaction features)

A summary of the official documentation, with demos

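# Min-max (linear) normalization
from sklearn.preprocessing import MinMaxScaler  # min-max scaling
"""
Parameters:
1. feature_range: the range the scaled data should fall in; defaults to (0, 1)
2. copy: whether to work on a copy of the input (default True); set it to False to scale in place


Important attributes:
1. data_min_: the per-feature minimum of the data seen in fit, i.e. the values that map to 0 under the default feature_range
2. data_max_: the per-feature maximum of the data seen in fit, i.e. the values that map to 1 under the default feature_range


Important methods:
1. fit(X): compute the per-feature minimum and maximum of the sample set X
2. transform(X): scale X using the minimum and maximum learned by fit
3. fit_transform(X): fit on X, then transform that same X with the fitted parameters
The demo below makes this clear.
"""
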
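data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()  # create the scaler
# fit returns the fitted estimator; older versions print it as
# MinMaxScaler(copy=True, feature_range=(0, 1)).
# copy=True means the input is copied and the original data is left unchanged;
# feature_range is the target range after scaling.
print(scaler.fit(data))
print(scaler.data_min_)  # the per-feature minimum: [-1, 2]
print(scaler.transform(data))
# transform expects a 2D input, so a single sample is written as [[2, 2]].
# This returns [[1.5, 0]]; fit_transform([[2, 2]]) would refit on this one
# sample and return [[0, 0]] instead.
print(scaler.transform([[2, 2]]))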
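
To make the arithmetic behind the demo visible, here is a minimal plain-Python sketch of min-max scaling. The helper names fit_min_max and min_max_transform are my own, made up for illustration; this is not scikit-learn's code, only the formula (x - min) / (max - min), stretched to feature_range, applied column by column.

```python
def fit_min_max(rows):
    """Return the per-column minimum and maximum of a list of rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_transform(rows, col_min, col_max, feature_range=(0.0, 1.0)):
    """Scale each column linearly so col_min maps to low and col_max to high."""
    low, high = feature_range
    scaled = []
    for row in rows:
        new_row = []
        for value, lo, hi in zip(row, col_min, col_max):
            span = (hi - lo) or 1.0  # constant column: avoid dividing by zero
            new_row.append((value - lo) / span * (high - low) + low)
        scaled.append(new_row)
    return scaled


data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
col_min, col_max = fit_min_max(data)
scaled = min_max_transform(data, col_min, col_max)
new_point = min_max_transform([[2, 2]], col_min, col_max)

print(col_min, col_max)  # [-1, 2] [1, 18]
print(scaled)            # [[0.0, 0.0], [0.25, 0.25], [0.5, 0.5], [1.0, 1.0]]
print(new_point)         # [[1.5, 0.0]]
```

The new point [2, 2] lands at 1.5 in the first column because it lies outside the range seen during fitting. That is also why the fitted minimum and maximum should be reused on new data rather than refitted.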


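# Zero-mean standardization
# Zero-mean standardization rescales each feature to mean 0 and unit variance, so that
# features measured on very different scales become comparable. It shifts and rescales
# the data but does not change the shape of its distribution.
from sklearn.preprocessing import StandardScaler  # zero-mean standardization
# 1. Key parameters
# copy: If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace;
# if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
#
# with_mean: boolean. If True, center the data (subtract the per-feature mean) before scaling
# with_std: boolean. If True, scale the data to unit standard deviation, which is the same as unit variance
#
#
# 2. Attributes
# mean_: the per-feature mean
# var_: the per-feature variance
#
#
#
# 3. Methods
# The key methods are again fit, fit_transform, transform, and get_params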
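data = [[0, 0], [0, 0], [1, 1], [1, 1]]

scaler = StandardScaler()
print(scaler.fit(data))  # prints the fitted estimator

# each column has mean 0.5 and standard deviation 0.5, so 0 maps to -1 and 1 maps to 1
print(scaler.transform(data))

# a new sample is scaled with the fitted mean and standard deviation: (2 - 0.5) / 0.5 = 3
print(scaler.transform([[2, 2]]))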
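
The same demo can be checked by hand. Below is a small plain-Python sketch of standardization; fit_standard and standard_transform are names I made up for illustration, and the code only mirrors the formula (x - mean) / std, using the population variance (divide by n), which is what StandardScaler reports in var_.

```python
import math


def fit_standard(rows):
    """Return the per-column mean and population variance of a list of rows."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    variances = [
        sum((v - m) ** 2 for v in c) / len(c)  # divide by n, not n - 1
        for c, m in zip(columns, means)
    ]
    return means, variances


def standard_transform(rows, means, variances):
    """Subtract the mean and divide by the standard deviation, column by column."""
    stds = [math.sqrt(v) or 1.0 for v in variances]  # constant column: divide by 1
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


data = [[0, 0], [0, 0], [1, 1], [1, 1]]
means, variances = fit_standard(data)
standardized = standard_transform(data, means, variances)
new_point = standard_transform([[2, 2]], means, variances)

print(means, variances)  # [0.5, 0.5] [0.25, 0.25]
print(standardized)      # [[-1.0, -1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, 1.0]]
print(new_point)         # [[3.0, 3.0]]
```

Unlike min-max scaling, the result is not confined to a fixed interval: the new sample [2, 2] ends up three standard deviations above the mean.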

II. Answer 2

"""
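Generating interaction features
sklearn.preprocessing.PolynomialFeatures

Goal: generate polynomial and interaction features. When there are only a few features,
this is a quick way to expand the feature vector.

1. Key parameters
degree: integer
The degree of the polynomial features. Default = 2

interaction_only: boolean, default False
If True, only interaction features are produced; with degree=2, for example, the squared terms are omitted

include_bias: boolean
If True (default), then include a bias column,
the feature in which all polynomial powers are zero.
In other words, a column of all ones is added, which acts as the intercept term in linear regression

2. Attributes
1. powers_: the exponent of each input in each output term; for inputs (a, b) with degree=2 the terms are a^0 b^0, a^1 b^0, a^0 b^1, a^2 b^0, a^1 b^1, a^0 b^2
2. n_input_features_: int, the number of input features (the number of columns of X), not the value of degree; newer scikit-learn releases name this n_features_in_
3. n_output_features_: the number of output features, i.e. the number of polynomial terms (for (a, b) with degree=2 it is 6, matching powers_ above)


3. Methods
Again fit, fit_transform, transform, and get_params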
"""

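import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]
#print (x)

# degree=2 with all terms: columns are 1, a, b, a^2, a*b, b^2
poly1 = PolynomialFeatures(2)
cross1 = poly1.fit_transform(x)
#print (cross1)
#print (poly1.n_output_features_)

# interaction terms only: columns are 1, a, b, a*b
poly2 = PolynomialFeatures(interaction_only=True)
cross2 = poly2.fit_transform(x)
#print (cross2)

# interaction terms only, without the bias column: columns are a, b, a*b
poly3 = PolynomialFeatures(interaction_only=True, include_bias=False)
cross3 = poly3.fit_transform(x)
print(cross3)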
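
To see where each output column comes from, here is a plain-Python sketch that builds the same terms with itertools. The function polynomial_features is a hypothetical helper written for this post, not scikit-learn's implementation; it only follows the column order described above (bias first, then the original features, then the degree-2 products).

```python
from itertools import combinations, combinations_with_replacement


def polynomial_features(rows, degree=2, interaction_only=False, include_bias=True):
    """Expand each row into products of its features, up to the given degree."""
    n_features = len(rows[0])
    # interaction_only forbids repeating a feature, so a*a is never generated
    pick = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1
    # each term is a tuple of column indices to multiply; () is the bias term
    terms = [idx for d in range(start, degree + 1) for idx in pick(range(n_features), d)]
    result = []
    for row in rows:
        out = []
        for idx in terms:
            product = 1
            for i in idx:
                product *= row[i]
            out.append(product)
        result.append(out)
    return result


x = [[0, 1], [2, 3], [4, 5]]
full = polynomial_features(x)
interactions = polynomial_features(x, interaction_only=True, include_bias=False)

print(full)          # [[1, 0, 1, 0, 0, 1], [1, 2, 3, 4, 6, 9], [1, 4, 5, 16, 20, 25]]
print(interactions)  # [[0, 1, 0], [2, 3, 6], [4, 5, 20]]
```

Read row [2, 3] of the first result as 1, a, b, a^2, a*b, b^2 = 1, 2, 3, 4, 6, 9. With interaction_only=True and include_bias=False only a, b, a*b remain, which should match the values printed for cross3 in the demo above.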

That covers the first project: the basics of feature preprocessing.
