A Simple Guide to Building a Predictive Model in Python, Part 1

2019-12-07 国外课栈

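This guide is the first part of a two-part series: one part covers data preprocessing and exploration, the other covers the actual modelling. The data set used here comes from ....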

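Throughout this article, don't focus too much on the code; aim instead for a general sense of what happens during the "preprocessing" stage.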

Part 1: Preprocessing and exploration:

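Preprocessing is a crucial step at the start of any data science project (unless someone has already done it for you). It includes handling NULL values, detecting outliers, removing irrelevant columns based on analysis, and generally cleaning the data.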
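As a rough illustration of those three steps, here is a minimal sketch on a tiny made-up DataFrame. The column names and values are invented for this example and are not the article's data set; median filling and the 1.5 * IQR rule are just one common choice for each step, not necessarily what the author uses.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only (not the article's data set).
toy = pd.DataFrame({
    "RowId": [1, 2, 3, 4, 5, 6],
    "Age": [25.0, 31.0, np.nan, 40.0, 38.0, 29.0],
    "Salary": [40_000.0, 42_000.0, 41_000.0, 39_000.0, 1_000_000.0, 43_000.0],
})

# 1. Handle NULL values: here, fill the missing age with the column median.
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

# 2. Detect outliers with the 1.5 * IQR rule (one common heuristic among many).
q1, q3 = toy["Salary"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (toy["Salary"] < q1 - 1.5 * iqr) | (toy["Salary"] > q3 + 1.5 * iqr)

# 3. Drop the flagged rows and a column with no predictive value (a row counter).
cleaned = toy.loc[~is_outlier].drop(columns=["RowId"])

print(cleaned)
```

Whether to fill, drop, or keep a suspicious value depends on the data and the model, so treat each step as a decision to justify rather than a fixed recipe.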

Let's walk through this in Python, step by step.

First, let's do the necessary imports. We will add more as they are needed.

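"%matplotlib inline" is an IPython magic command that makes your plots render inline and be stored in the notebook. It only works inside IPython or Jupyter, not in a plain Python script.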

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

Now let's load the data into Python as a pandas DataFrame, then print its summary info and the first few rows to get a feel for the data.

df = pd.read_csv("Churn_Modelling.csv")

df.info()

df.head()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
RowNumber          10000 non-null int64
CustomerId         10000 non-null int64
Surname            10000 non-null object
CreditScore        10000 non-null int64
Geography          10000 non-null object
Gender             10000 non-null object
Age                10000 non-null int64
Tenure             10000 non-null int64
Balance            10000 non-null float64
NumOfProducts      10000 non-null int64
HasCrCard          10000 non-null int64
IsActiveMember     10000 non-null int64
EstimatedSalary    10000 non-null float64
Exited             10000 non-null int64
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
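
The summary reports 10000 non-null entries for every column, so this data set has no missing values to handle. The same facts can also be checked programmatically instead of read off the printout. The sketch below uses a small stand-in DataFrame with invented values, since the CSV is not bundled here; with the real data you would run the same calls on df.

```python
import pandas as pd

# Stand-in frame with invented values; with the real data, use df instead.
sample = pd.DataFrame({
    "CreditScore": [600, 710, 550],
    "Geography": ["France", "Spain", "France"],
    "Balance": [0.0, 1250.5, 980.0],
    "Exited": [1, 0, 1],
})

# Missing values per column; all zeros means nothing needs imputing.
null_counts = sample.isnull().sum()

# Number of columns per dtype, the same tally as the "dtypes:" line of info().
dtype_counts = sample.dtypes.value_counts()

# Non-numeric columns, which typically need encoding before modelling.
text_columns = sample.select_dtypes(exclude="number").columns.tolist()

print(null_counts)
print(dtype_counts)
print(text_columns)
```

On the full data set, the non-numeric columns would be the three listed with dtype object above: Surname, Geography, and Gender.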