Basic Data Processing with Pandas

2017-08-13  豊小乂

The Series Data Structure

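np.nan is similar to None, but the two are not equal. np.nan is not even equal to itself, so it can only be detected with a dedicated function: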

>>> import numpy as np
>>> np.nan == None
False
>>> np.nan == np.nan
False
>>> np.isnan(np.nan)  # `df.isnull` in DataFrame
True
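
The comment above points at the pandas counterpart of `np.isnan`. A minimal sketch, assuming a small made-up Series, of how `isnull` flags missing entries element by element:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in the middle.
s = pd.Series([1.0, np.nan, 3.0])

mask = s.isnull()            # boolean Series, True where the value is missing
n_missing = int(mask.sum())  # True counts as 1, so the sum is the number of gaps
print(mask.tolist(), n_missing)
```

The same method is available on a DataFrame, where it returns a boolean DataFrame of the same shape.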

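When building a Series you can pass the index explicitly. If you do not (for example, when creating it directly from a list), the index defaults to 0, 1, 2, and so on. If a label in the requested index is not a key in the dictionary, its value in the resulting Series is NaN: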

>>> import pandas as pd
>>> s = pd.Series(['Tiger', 'Bear', 'Moose'], index=['India', 'America', 'Canada'])
>>> s
India      Tiger
America     Bear
Canada     Moose
dtype: object

>>> sports = {'Archery': 'Bhutan',
              'Golf': 'Scotland',
              'Sumo': 'Japan',
              'Taekwondo': 'South Korea'}
>>> s = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])
>>> s
Golf      Scotland
Sumo         Japan
Hockey         NaN
dtype: object

Querying a Series

output:

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object
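
The code that produced the output above is not included in these notes. A minimal sketch, assuming the Series was built from the same `sports` dictionary used earlier, of the two usual ways to query it: `.iloc` selects by integer position and `.loc` selects by index label.

```python
import pandas as pd

sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)

by_position = s.iloc[3]    # fourth entry, selected by integer position
by_label = s.loc['Golf']   # entry whose index label is 'Golf'
print(by_position, '|', by_label)
```

Spelling out `.iloc` or `.loc` avoids ambiguity when the index labels are themselves integers.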

The DataFrame Data Structure

import pandas as pd
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
df

output:

         Cost Item Purchased   Name
Store 1  22.5       Dog Food  Chris
Store 1   2.5   Kitty Litter  Kevyn
Store 2   5.0      Bird Seed  Vinod

Example queries:

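df.loc['Store 2']
df.loc['Store 1', 'Cost']
df.loc['Store 1']['Cost']    # chained indexing; the single .loc call above is more efficient
df.T    # transpose
df.T.loc['Cost']    # selecting by row label requires .loc[]
df['Cost']    # selecting by column label does not need .loc[]
df.loc[:,['Name', 'Cost']]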
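
The queries above are listed without their results. A small sketch of what some of them return, rebuilding the same purchases table from a list of dictionaries (a shortcut assumed here for brevity, not the construction used in the notes):

```python
import pandas as pd

df = pd.DataFrame(
    [{'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50},
     {'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
     {'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00}],
    index=['Store 1', 'Store 1', 'Store 2'])

one_row = df.loc['Store 2']            # unique label: a Series holding one row
two_rows = df.loc['Store 1']           # repeated label: a DataFrame with two rows
costs = df.loc['Store 1', 'Cost']      # the 'Cost' values of both 'Store 1' rows
subset = df.loc[:, ['Name', 'Cost']]   # every row, two chosen columns

print(type(one_row).__name__, type(two_rows).__name__)
print(costs.tolist())
print(subset.shape)
```

Because 'Store 1' appears twice in the index, the return type of `.loc` depends on whether the label is unique.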

DataFrame Indexing and Loading

Querying a DataFrame

Indexing DataFrames

Missing Values

df.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)

method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
Method to use for filling holes in reindexed Series
pad / ffill: propagate last valid observation forward to next valid
backfill / bfill: use NEXT valid observation to fill gap
axis : {0, 1, 'index', 'columns'}
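
To make the fill directions concrete, here is a minimal sketch on a made-up Series with a two-element gap. It uses the `ffill()` and `bfill()` methods, which do the same as `method='ffill'` and `method='bfill'`. Recent pandas releases deprecate the `method` argument of `fillna`, so the dedicated methods are the safer spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()      # carry the last valid value forward
backward = s.bfill()     # pull the next valid value backward
constant = s.fillna(0)   # replace every gap with a fixed value

print(forward.tolist())
print(backward.tolist())
print(constant.tolist())
```

Forward filling suits ordered data such as time series, where the last known reading is a reasonable stand-in. A constant fill treats every gap the same regardless of its neighbours.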

Other querying
