Python Part 1: Filtering Specific Rows and Columns with pandas

2020-05-27  夕颜00
1. Import the pandas module
>>> import pandas as pd

2. Load the data
>>> titanic = pd.read_csv(r'C:\Users\Administrator\Desktop\titanic.csv')

3. Select a single column
>>> ages = titanic["Age"]
>>> ages.head()
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
>>> type(ages)
<class 'pandas.core.series.Series'>

When no row count is given, head() shows the first 5 rows by default. A single selected column has the type Series.
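
As a minimal sketch of the head() behaviour described above, the snippet below uses a small made-up DataFrame. The name `df` and its values are illustrative assumptions, not the Titanic file used in this article.

```python
import pandas as pd

# Hypothetical stand-in data; not the Titanic file used in this article.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0]})

ages = df["Age"]            # one column name in single brackets -> Series
first_five = ages.head()    # no argument: the first 5 rows
first_two = ages.head(2)    # explicit row count: the first 2 rows

print(type(ages).__name__)              # Series
print(len(first_five), len(first_two))  # 5 2
```

Passing a number to head() is handy when 5 rows are too many or too few for a quick look.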

4. Select multiple columns
>>> age_sex = titanic[["Name", "Age", "Sex"]]
>>> age_sex.head()
                                                Name   Age     Sex
0                            Braund, Mr. Owen Harris  22.0    male
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0  female
2                             Heikkinen, Miss. Laina  26.0  female
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0  female
4                           Allen, Mr. William Henry  35.0    male
>>> type(age_sex)
<class 'pandas.core.frame.DataFrame'>
>>> age_sex.shape
(891, 3)

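Selecting two or more columns still yields a DataFrame. Note that shape is an attribute rather than a method: on a DataFrame it returns (rows, columns), and on a Series it returns a one-element tuple holding the row count.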
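
The difference in shape between a DataFrame and a Series can be checked with a short sketch. The data below is made up for illustration (an assumption, not the Titanic file).

```python
import pandas as pd

# Hypothetical stand-in data with 4 rows and 3 columns.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Sex": ["male", "female", "female", "female"],
})

subset = df[["Name", "Age"]]   # list of column names -> DataFrame
one_col_frame = df[["Age"]]    # a one-item list still gives a DataFrame
single = df["Age"]             # a bare column name -> Series

print(subset.shape)         # (4, 2)
print(one_col_frame.shape)  # (4, 1)
print(single.shape)         # (4,)
```

The inner brackets are a Python list of column names, which is why double brackets always return a DataFrame, even for a single column.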

5. Filter rows with a single condition
>>> age_35 = titanic[titanic["Age"] > 35]
>>> age_35.head()
    PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
1             2         1       1  ...  71.2833   C85         C
6             7         0       1  ...  51.8625   E46         S
11           12         1       1  ...  26.5500  C103         S
13           14         0       3  ...  31.2750   NaN         S
15           16         1       2  ...  16.0000   NaN         S
>>> age_35.shape      # shape after filtering
(217, 12)
>>> titanic.shape     # shape before filtering
(891, 12)

Besides >, conditional expressions also support the other comparison operators: >=, ==, !=, <, and <=.
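
To make the mechanism visible, here is a small sketch on made-up data (the values are assumptions for illustration). The comparison first produces a boolean Series, and indexing with it keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical stand-in data; not the Titanic file used in this article.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0, 35.0]})

mask = df["Age"] >= 35         # boolean Series, one True/False per row
at_least_35 = df[mask]         # keeps only the rows where mask is True
not_35 = df[df["Age"] != 35]   # the same idea with a different operator

print(mask.tolist())                   # [False, True, False, True, True]
print(len(at_least_35), len(not_35))   # 3 3
```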

6. Filter rows with multiple conditions
>>> class_23 = titanic[titanic["Pclass"].isin([2, 3])]
>>> class_23.head()
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
2            3         1       3  ...   7.9250   NaN         S
4            5         0       3  ...   8.0500   NaN         S
5            6         0       3  ...   8.4583   NaN         Q
7            8         0       3  ...  21.0750   NaN         S

[5 rows x 12 columns]
>>> class_23.shape
(675, 12)
>>> class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]
>>> class_23.shape
(675, 12)

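isin() is equivalent to combining several equality checks on the same column with or. When you combine conditions yourself, use "|" and "&" in place of or and and, and wrap each condition in its own "()".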
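
The equivalence can be checked on a small made-up DataFrame (the data is an assumption for illustration). The sketch also shows "&", which keeps a row only when both conditions hold.

```python
import pandas as pd

# Hypothetical stand-in data; not the Titanic file used in this article.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 1, 2],
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 14.0],
})

by_isin = df[df["Pclass"].isin([2, 3])]
by_or = df[(df["Pclass"] == 2) | (df["Pclass"] == 3)]

# "&" requires both conditions to be True on the same row.
class2_adults = df[(df["Pclass"] == 2) & (df["Age"] > 18)]

print(by_isin.equals(by_or))   # True
print(len(class2_adults))      # 1
```

The parentheses matter because "|" and "&" bind more tightly than "==" and ">" in Python, so leaving them out changes how the expression is grouped.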

7. Keep rows with non-null values
>>> age_no_na = titanic[titanic["Age"].notna()]
>>> age_no_na.shape
(714, 12)
>>> titanic.shape
(891, 12)
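
notna() follows the same boolean-mask pattern as the earlier filters. The sketch below uses made-up data (an assumption for illustration) and also compares the result with dropna(subset=...), an alternative that this article does not use.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with two missing ages.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, np.nan, 26.0, np.nan],
})

mask = df["Age"].notna()                # True where Age has a value
age_no_na = df[mask]                    # rows with a known age
same_rows = df.dropna(subset=["Age"])   # alternative with the same result

print(mask.tolist())                # [True, False, True, False]
print(age_no_na.shape)              # (2, 2)
print(age_no_na.equals(same_rows))  # True
```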

8. Select specific rows and columns
>>> adult_names = titanic.loc[titanic["Age"] > 35, ["Name", "Age"]]
>>> adult_names.head()
                                                 Name   Age
1   Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0
6                             McCarthy, Mr. Timothy J  54.0
11                           Bonnell, Miss. Elizabeth  58.0
13                        Andersson, Mr. Anders Johan  39.0
15                   Hewlett, Mrs. (Mary D Kingcome)   55.0
>>>
>>> titanic.iloc[9:25, 2:5]    # select rows 10 through 25 and columns 3 through 5 (counting from 1)
    Pclass                                               Name     Sex
9        2                Nasser, Mrs. Nicholas (Adele Achem)  female
10       3                    Sandstrom, Miss. Marguerite Rut  female
11       1                           Bonnell, Miss. Elizabeth  female
12       3                     Saundercock, Mr. William Henry    male
13       3                        Andersson, Mr. Anders Johan    male
14       3               Vestrom, Miss. Hulda Amanda Adolfina  female
15       2                   Hewlett, Mrs. (Mary D Kingcome)   female
16       3                               Rice, Master. Eugene    male
17       2                       Williams, Mr. Charles Eugene    male
18       3  Vander Planke, Mrs. Julius (Emelia Maria Vande...  female
19       3                            Masselmani, Mrs. Fatima  female
20       2                               Fynney, Mr. Joseph J    male
21       2                              Beesley, Mr. Lawrence    male
22       3                        McGowan, Miss. Anna "Annie"  female
23       1                       Sloper, Mr. William Thompson    male
24       3                      Palsson, Miss. Torborg Danira  female
>>> titanic.iloc[0:3, 3] = "anonymous"  # set the first three rows of the column at position 3 (the fourth column, Name) to "anonymous"
>>> titanic.head()
   PassengerId  Survived  Pclass                                          Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
0            1         0       3                                     anonymous    male  22.0      1      0         A/5 21171   7.2500   NaN        S
1            2         1       1                                     anonymous  female  38.0      1      0          PC 17599  71.2833   C85        C
2            3         1       3                                     anonymous  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
4            5         0       3                      Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S

About loc/iloc[rows, columns]: loc selects data by row (column) labels.
iloc selects data by row (column) integer positions, counting from 0.
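
The label/position distinction is easiest to see when the index is not the default 0, 1, 2, ... The sketch below uses made-up data with index labels 10 to 40 (an assumption for illustration).

```python
import pandas as pd

# Hypothetical data with a non-default index so labels and positions differ.
df = pd.DataFrame(
    {"Name": ["A", "B", "C", "D"], "Age": [22.0, 38.0, 26.0, 35.0]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[20:30, ["Name"]]   # labels 20 through 30, end included
by_position = df.iloc[1:3, 0:1]      # positions 1 and 2, end excluded

print(by_label.equals(by_position))        # True
print(df.loc[10, "Age"], df.iloc[0, 1])    # 22.0 22.0
```

Note that a loc slice includes its end label, while an iloc slice excludes its end position, just like ordinary Python slicing.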
