实例数据操作02

2017-07-19  本文已影响0人  六六的建斌

今天看数据预处理,其实预处理和不处理,对结果的得分有很大的影响,最好是先比较两者的差异,再决定要不要用,预处理一般包括


scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)

三个步骤:1导入相关的预处理模块,并初始化,

2  匹配要处理的数据(一般都是因变量 测试的和训练的)

3  转换匹配处理后的结果

scaler = Min Max Scaler()

scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)

X_test_scaled = scaler.transform(X_test)

这个可以将两部合为一体:      X_scaled_d = scaler.fit_transform(X)

但卧槽



还有一种常见的:

##preprocessing using zero mean and unit variance scaling

from sklearn.preprocessing import StandardScaler



Principal Component Analysis (PCA)

Original shape: (569, 30)

Reduced shape: (569, 2)


擦,,看不懂打

from sklearn.cluster import KMeans


from sklearn.datasets import make_blobs

from sklearn.cluster import KMeans

# generate synthetic two-dimensional data

X, y = make_blobs(random_state=1)

# build the clustering model

kmeans = KMeans(n_clusters=3)

kmeans.fit(X)

data_dummies = pd.get_dummies(data)  生成哑变量

数字进行编码

demo_df = pd.Data Frame({'Integer Feature': [0, 1, 2, 1],

'Categorical Feature': ['socks', 'fox', 'socks', 'box']})




模型检测和提高

k-fold cross-validation, 最常用的交叉验证

最常用的函数是cross_val_score(), 第一个参数是选择的模型,第二个是因变量,第三个是输出值,默认是三重交叉验证,可以改变重数

A common way to summarize the cross-validation accuracy is to compute the mean:,最常用的是输出其均值

print("Average cross-validation score: {:.2f}".format(scores.mean()))

from sklearn.model_selection import Grid Search CV

from sklearn.svm import SVC

grid_search = Grid Search CV(SVC(), param_grid, cv=5)

X_train, X_test, y_train, y_test = train_test_split(

iris.data, iris.target, random_state=0)

grid_search.fit(X_train, y_train)

print("Test set score: {:.2f}".format(grid_search.score(X_test, y_test)))

Test set score: 0.97

print("Best parameters: {}".format(grid_search.best_params_))

print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))


Precision-recall curves and ROC curves:

from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(

y_test, svc.decision_function(X_test))

Receiver operating characteristics (ROC) and AUC



上一篇 下一篇

猜你喜欢

热点阅读