
2018-06-15 Kaiwei Learns Data Series - Pandas Basics

2018-06-16  Kaiweio












The Series Data Structure


In [1]:
import pandas as pd
import numpy  as np
In [2]:
animals = ['Tiger', 'Bear', 'Moose','Bear']
pd.Series(animals)

Out[2]:      
0    Tiger
1     Bear
2    Moose
3     Bear
dtype: object
In [3]:
numbers = [1, 2, 3]
pd.Series(numbers)


Out[3]:
0    1
1    2
2    3
dtype: int64
In [4]:
animals = ['Tiger', 'Bear', None]
pd.Series(animals)


Out[4]:
0    Tiger
1     Bear
2     None
dtype: object
In [5]:
numbers = [1, 2, None]
pd.Series(numbers)


Out[5]:
0    1.0
1    2.0
2    NaN
dtype: float64


In [6]:
np.nan == None

Out[6]:
False
In [7]:
np.nan == np.nan

Out[7]:
False
In [8]:
np.isnan(np.nan)

Out[8]:
True
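
Since NaN never compares equal to anything, itself included, an equality test cannot locate missing values. The sketch below shows one common alternative; it assumes the `isna()` method is what you want here (it is not used in the notes above), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, None])             # None becomes NaN, dtype float64
animals = pd.Series(['Tiger', 'Bear', None])  # missing entry in a text Series

# Comparing against np.nan is False everywhere, even for the NaN entry
eq_mask = (numbers == np.nan).tolist()

# isna() flags the missing entry in both Series
num_missing = numbers.isna().tolist()
animal_missing = animals.isna().tolist()

print(eq_mask)
print(num_missing)
print(animal_missing)
```

The same check is available as the top-level function `pd.isna(...)`, which also accepts scalars.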
In [9]:
sports = {'Archery': 'Bhutan',
         'Golf': 'Scotland',
         'Sumo': 'Japan',
         'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s


Out[9]:
Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

In [10]:
s.index

Out[10]:
Index(['Archery', 'Golf', 'Sumo', 'Taekwondo'], dtype='object')

In [11]:
s = pd.Series(['Tiger', 'Bear', 'Moose'], index=['India', 'America', 'Canada'])
s


Out[11]:
India      Tiger
America     Bear
Canada     Moose
dtype: object
In [12]:
sports = {'Archery': 'Bhutan',
         'Golf': 'Scotland',
         'Sumo': 'Japan',
         'Taekwondo': 'South Korea'}
s = pd.Series(sports, index=['Golf', 'Sumo', 'Hockey'])
s


Out[12]:
Golf      Scotland
Sumo         Japan
Hockey         NaN
dtype: object

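So what happens if the list of values passed as index does not line up with the keys of the dictionary used to create the Series? As Out[12] shows, pandas keeps only the labels given in index: dictionary keys that are not in index (Archery, Taekwondo) are dropped, and index labels that have no dictionary key (Hockey) get NaN.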
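
To make the alignment rule concrete, here is a small sketch that lists which dictionary keys were ignored and which index labels ended up empty. The names `wanted`, `dropped` and `missing` are made up for this illustration.

```python
import pandas as pd

sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
wanted = ['Golf', 'Sumo', 'Hockey']

s = pd.Series(sports, index=wanted)

# Dictionary keys that did not make it into the Series
dropped = [key for key in sports if key not in s.index]

# Index labels that had no matching dictionary key (their value is NaN)
missing = s[s.isna()].index.tolist()

print(dropped)
print(missing)
```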












Querying a Series


sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s = pd.Series(sports)
s

>>>
Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object


s.iloc[3]
>>> 'South Korea'

s.loc['Golf']
>>> 'Scotland'

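s[3]   # falls back to position because the labels are strings; deprecated in recent pandas, prefer s.iloc[3]
>>> 'South Korea'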

s['Golf']
>>> 'Scotland'

sports = {99: 'Bhutan',
          100: 'Scotland',
          101: 'Japan',
          102: 'South Korea'}
s = pd.Series(sports)


s[0] 
# This won't call s.iloc[0] as one might expect. The labels are the integers 99..102,
# so s[0] is looked up as a label, no such label exists, and a KeyError is raised.
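
A short sketch of the safe way around this ambiguity: state explicitly whether you mean a position (`iloc`) or a label (`loc`). The variable names are only for illustration.

```python
import pandas as pd

s = pd.Series({99: 'Bhutan',
               100: 'Scotland',
               101: 'Japan',
               102: 'South Korea'})

first_by_position = s.iloc[0]   # position 0
first_by_label = s.loc[99]      # label 99

# Plain s[0] is treated as a label lookup on an integer index
try:
    s[0]
    raised = False
except KeyError:
    raised = True

print(first_by_position, first_by_label, raised)
```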
s = pd.Series([100.00, 120.00, 101.00, 3.00])
s

>>>
0    100.0
1    120.0
2    101.0
3      3.0
dtype: float64

total = 0
for item in s:
    total+=item
print(total)
>>> 324.0

total = np.sum(s)
print(total)
>>> 324.0

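Both approaches produce the same value, but which one is actually faster?
First, set up a large Series of random numbers.

The tool to use is the IPython magic timeit. As you may have guessed from the name, it runs our code several times to determine its typical running time.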


#this creates a big series of random numbers
s = pd.Series(np.random.randint(0,1000,10000))
s.head()


%%timeit -n 100
summary = 0
for item in s:
    summary+=item
>>> 100 loops, best of 3: 1.87 ms per loop

%%timeit -n 100
summary = np.sum(s)
>>> 100 loops, best of 3: 107 µs per loop

%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
for label, value in s.items():  # iteritems() was removed in pandas 2.0; items() is the replacement
    s.loc[label]= value+2
>>>
10 loops, best of 3: 1.65 s per loop

%%timeit -n 10
s = pd.Series(np.random.randint(0,1000,10000))
s+=2
>>>
10 loops, best of 3: 514 µs per loop
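
The `%%timeit` cells above only run inside IPython or Jupyter. As a rough equivalent for a plain Python script, the sketch below uses the standard-library `timeit` module; the helper names `loop_sum` and `vector_sum` are invented here, and the timings will differ from machine to machine, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.random.randint(0, 1000, 10000))

def loop_sum(series):
    """Add the values one at a time in a Python loop."""
    total = 0
    for item in series:
        total += item
    return total

def vector_sum(series):
    """Let NumPy add the values in one vectorized call."""
    return np.sum(series)

# Both approaches must agree before timing them is meaningful
same_result = loop_sum(s) == vector_sum(s)

# Total time in seconds for 20 runs of each approach
loop_time = timeit.timeit(lambda: loop_sum(s), number=20)
vector_time = timeit.timeit(lambda: vector_sum(s), number=20)

print(f"loop: {loop_time:.4f}s  vectorized: {vector_time:.4f}s")
```

The same idea applies to the update example: `s += 2` applies the addition to every element at once instead of assigning through `.loc` one label at a time.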
s = pd.Series([1, 2, 3])
s.loc['Animal'] = 'Bears'
s
>>>
0             1
1             2
2             3
Animal    Bears
dtype: object
original_sports = pd.Series({'Archery': 'Bhutan',
                             'Golf': 'Scotland',
                             'Sumo': 'Japan',
                             'Taekwondo': 'South Korea'})
cricket_loving_countries = pd.Series(['Australia',
                                      'Barbados',
                                      'Pakistan',
                                      'England'], 
                                   index=['Cricket',
                                          'Cricket',
                                          'Cricket',
                                          'Cricket'])
# Series.append was removed in pandas 2.0; pd.concat returns the same combined Series
all_countries = pd.concat([original_sports, cricket_loving_countries])

original_sports
Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object

cricket_loving_countries
Cricket    Australia
Cricket     Barbados
Cricket     Pakistan
Cricket      England
dtype: object

all_countries
Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
Cricket        Australia
Cricket         Barbados
Cricket         Pakistan
Cricket          England
dtype: object

all_countries.loc['Cricket']
Cricket    Australia
Cricket     Barbados
Cricket     Pakistan
Cricket      England
dtype: object
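
Two things are worth checking in the outputs above: combining the Series returns a new object and leaves the originals alone, and a label that occurs more than once makes `.loc` return a Series rather than a single value. A reduced sketch, assuming `pd.concat` for the combining step and shorter sample data than above:

```python
import pandas as pd

original_sports = pd.Series({'Archery': 'Bhutan', 'Golf': 'Scotland'})
cricket = pd.Series(['Australia', 'England'], index=['Cricket', 'Cricket'])

all_countries = pd.concat([original_sports, cricket])

unique_hit = all_countries.loc['Golf']        # one match: a plain value
repeated_hit = all_countries.loc['Cricket']   # two matches: a Series

print(type(unique_hit).__name__)
print(type(repeated_hit).__name__, len(repeated_hit))
print(len(original_sports), all_countries.index.is_unique)
```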












The DataFrame Data Structure


purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
df.head()

>>>
         Cost  Item Purchased   Name
Store 1  22.5        Dog Food  Chris
Store 1   2.5    Kitty Litter  Kevyn
Store 2   5.0       Bird Seed  Vinod
df.loc['Store 2']
>>>
Cost                      5
Item Purchased    Bird Seed
Name                  Vinod
Name: Store 2, dtype: object
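
As with a Series, what `.loc` returns on a DataFrame depends on how many rows carry the label. The sketch below rebuilds the same table from plain dictionaries (an assumption made to keep the example short) and also selects a single column for the repeated label.

```python
import pandas as pd

df = pd.DataFrame(
    [{'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50},
     {'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
     {'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00}],
    index=['Store 1', 'Store 1', 'Store 2'])

one_row = df.loc['Store 2']               # label appears once: a Series
two_rows = df.loc['Store 1']              # label appears twice: a DataFrame
store1_costs = df.loc['Store 1', 'Cost']  # rows by label, then one column

print(type(one_row).__name__)
print(type(two_rows).__name__, two_rows.shape)
print(store1_costs.tolist())
```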