Pandas

Notes on pandas usage and related functions, part 1

2019-01-23  筝韵徽
import pandas as pd
import numpy as np

Summary of pandas usage notes

Basic operations

  1. Check the pandas version
pd.__version__
'0.23.4'
  2. Show the versions of pandas and its related libraries
pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 2008ServerR2
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.8.0
pip: 18.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
data = {'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
        'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
        'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
  3. Create a DataFrame and inspect its data and basic structure
df = pd.DataFrame(data,index=labels)
df
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
animal      10 non-null object
age         8 non-null float64
visits      10 non-null int64
priority    10 non-null object
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes
df.describe()
  4. Return the first three rows
df.iloc[:3]
df.head(3)
  5. Select the animal and age columns for all rows
df.loc[:,['animal','age']]
df[['animal','age']]
  6. Select the rows at positions 3, 4 and 6 (0-based, i.e. labels d, e and g) and the animal and age columns
df.loc[df.index[[3,4,6]],['animal','age']]
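The selection above mixes positional row lookup with label-based column lookup. A minimal sketch of two equivalent spellings follows, assuming a small hypothetical frame (`toy` and its values are illustrative, not the article's data):

```python
import pandas as pd

# Hypothetical toy frame; labels and values are illustrative only.
toy = pd.DataFrame(
    {"animal": ["cat", "dog", "snake", "dog"],
     "age": [2.5, 3.0, 0.5, 5.0],
     "visits": [1, 3, 2, 3]},
    index=["a", "b", "c", "d"],
)

# Translate positions 1 and 3 into labels, then select by label.
by_label = toy.loc[toy.index[[1, 3]], ["animal", "age"]]

# Stay positional throughout: get_indexer turns column names into positions.
col_pos = toy.columns.get_indexer(["animal", "age"])
by_position = toy.iloc[[1, 3], col_pos]

print(by_label.equals(by_position))
```

Either form works; the `loc` version tends to read better when the column names are known, while the `iloc` version avoids label lookups entirely.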
  7. Find the rows where visits >= 3
df.loc[df['visits']>=3]
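The choice between `>` and `>=` matters for this data, because no row has more than 3 visits. A small sketch using the article's visits values (the variable names are illustrative):

```python
import pandas as pd

# The visits column from the data dict above, with the same row labels.
visits = pd.Series([1, 3, 2, 3, 2, 3, 1, 1, 2, 1], index=list("abcdefghij"), name="visits")

strictly_greater = visits[visits > 3]   # empty: the maximum is 3
at_least_three = visits[visits >= 3]    # rows b, d and f

print(len(strictly_greater))
print(list(at_least_three.index))
```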
  8. Find the rows where age is missing (NaN)
df.loc[df['age'].isnull()]
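A literal comparison with NaN does not work here, because NaN never compares equal to anything, including itself. A short sketch with the article's age values shows why `isnull()` is used (variable names are illustrative):

```python
import numpy as np
import pandas as pd

# The age column from the data dict above.
age = pd.Series([2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3], index=list("abcdefghij"))

wrong = age[age == np.nan]     # always empty: NaN != NaN
missing = age[age.isnull()]    # rows d and h
present = age[age.notnull()]   # the other 8 rows

print(len(wrong), list(missing.index), len(present))
```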
  9. Find the rows where animal is cat and age < 3
df.loc[(df['animal']=='cat')&(df['age']<3)]
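The parentheses around each comparison are required, since `&` binds more tightly than `==` and `<`. The sketch below, on a hypothetical frame `df_demo`, also shows `DataFrame.query` as an alternative spelling of the same filter:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; values are illustrative only.
df_demo = pd.DataFrame(
    {"animal": ["cat", "cat", "snake", "dog", "cat"],
     "age": [2.5, 3.0, 0.5, np.nan, 2.0]},
    index=list("abcde"),
)

# Combine two boolean masks element-wise; each comparison is parenthesised.
masked = df_demo.loc[(df_demo["animal"] == "cat") & (df_demo["age"] < 3)]

# The same condition written as a query string.
queried = df_demo.query("animal == 'cat' and age < 3")

print(list(masked.index))
```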
  10. Find the rows where age is between 2 and 4 (inclusive)
df.loc[df['age'].between(2,4)]
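By default `between` includes both endpoints and treats NaN as not matching. A sketch with the article's original age values (before row f is changed) compares it with the explicit form:

```python
import numpy as np
import pandas as pd

# The age column from the data dict above.
age = pd.Series([2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3], index=list("abcdefghij"))

closed_range = age[age.between(2, 4)]
explicit = age[(age >= 2) & (age <= 4)]

print(list(closed_range.index))
```

The behaviour at the endpoints can be changed through the `inclusive` argument, whose accepted values differ between pandas versions.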
  11. Set age in row f to 1.5
df.loc['f','age']=1.5
df
  12. Compute the total number of visits
df['visits'].sum()
19
  13. Compute the average age of each animal
df.groupby('animal').agg({'age':'mean'})
df.groupby('animal')['age'].mean()
animal
cat      2.333333
dog      5.000000
snake    2.500000
Name: age, dtype: float64
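The cat average of 2.333333 comes from three known ages, because `mean` skips NaN. A sketch on a hypothetical subset (`pets` is illustrative) makes the difference between counted and total rows visible:

```python
import numpy as np
import pandas as pd

# Hypothetical subset: four cats (one with unknown age) and two snakes.
pets = pd.DataFrame({
    "animal": ["cat", "cat", "cat", "cat", "snake", "snake"],
    "age": [2.5, 3.0, 1.5, np.nan, 0.5, 4.5],
})

# NaN is ignored, so the cat mean is (2.5 + 3.0 + 1.5) / 3.
mean_age = pets.groupby("animal")["age"].mean()

# count ignores NaN, size does not.
summary = pets.groupby("animal")["age"].agg(["mean", "count", "size"])

print(summary)
```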
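  14. Append a row
# The list is assigned by position in column order (animal, age, visits, priority), so as written
# 2.3 lands in animal and 'dog' in age; ['dog', 2.3, 3, 'no'] would match the columns.
# This leaves age and visits as object columns, which the type-conversion step further down repairs.
df.loc['k']=[2.3,'dog','no',3]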
df
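Assigning a plain list to a new row label fills the columns by position, so the values must follow the frame's column order. The sketch below contrasts that with a name-aligned append via `pd.concat`; the frame `base` and the row values are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the same column order as the article's df.
base = pd.DataFrame(
    {"animal": ["cat", "dog"], "age": [2.5, np.nan],
     "visits": [1, 3], "priority": ["yes", "no"]},
    index=["a", "b"],
)

# Positional list in the wrong order: 2.3 lands in animal, 'dog' in age.
misplaced = base.copy()
misplaced.loc["k"] = [2.3, "dog", "no", 3]

# A one-row frame keyed by column name; concat aligns on names, so key order is irrelevant.
new_row = pd.DataFrame(
    {"age": [2.3], "animal": ["dog"], "priority": ["no"], "visits": [3]},
    index=["k"],
)
appended = pd.concat([base, new_row])

print(appended.dtypes)
```

With the name-aligned version the numeric columns keep their numeric dtypes, so no later conversion is needed.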
  15. Delete a row
df=df.drop('k')
df
  16. Count the number of each animal
df.animal.value_counts()
cat      4
dog      4
snake    2
Name: animal, dtype: int64
  17. Sort by age descending, then by visits ascending
df.sort_values(by=['age','visits'],ascending=[False,True])
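With mixed sort directions, ties in the first key are broken by the second, and rows with a missing age go last unless `na_position` says otherwise. A small sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two rows tie on age, one row has no age.
toy = pd.DataFrame(
    {"age": [3.0, np.nan, 3.0, 7.0], "visits": [3, 1, 1, 2]},
    index=list("abcd"),
)

# age descending, visits ascending; NaN ages are placed last by default.
default_order = toy.sort_values(by=["age", "visits"], ascending=[False, True])

# Same sort, but with the NaN row moved to the front.
nan_first = toy.sort_values(by=["age", "visits"], ascending=[False, True], na_position="first")

print(list(default_order.index))
print(list(nan_first.index))
```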
  18. Convert the priority column values to True/False
df['priority']=df['priority'].map({'yes':True,'no':False})
df
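One thing to keep in mind with `map` and a dict: any value without an entry in the dict becomes NaN rather than being left alone. A minimal sketch (the value 'maybe' is a made-up example):

```python
import pandas as pd

# 'maybe' is a hypothetical value that the mapping does not cover.
priority = pd.Series(["yes", "no", "maybe"])

mapped = priority.map({"yes": True, "no": False})

print(mapped.tolist())
```

When unknown values should pass through unchanged, `replace` with the same dict is the usual alternative.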
  19. Replace snake with python in the animal column
df1=df.copy()
df1['animal']=df1['animal'].replace('snake','python')
df1
df
df2=df.copy()
df2.loc[df2['animal']=='snake','animal']='python'
df2
df
  20. Type conversion: convert age and visits to numeric types
df['age'] =pd.to_numeric(df['age'],errors='coerce')
df['visits']=pd.to_numeric(df['visits'],errors='coerce')
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
animal      10 non-null object
age         8 non-null float64
visits      10 non-null int64
priority    10 non-null bool
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 650.0+ bytes
df2.info()
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
animal      10 non-null object
age         8 non-null object
visits      10 non-null object
priority    10 non-null bool
dtypes: bool(1), object(3)
memory usage: 650.0+ bytes
df2['age']=df2['age'].astype('float')
df2['visits']=df2['visits'].astype('int')
df2.info()
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
animal      10 non-null object
age         8 non-null float64
visits      10 non-null int32
priority    10 non-null bool
dtypes: bool(1), float64(1), int32(1), object(1)
memory usage: 610.0+ bytes
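The two conversions above differ in how they treat values that cannot be parsed: `pd.to_numeric(..., errors='coerce')` turns them into NaN, while `astype` raises. A sketch on a hypothetical object column:

```python
import pandas as pd

# Hypothetical object column containing one unparseable entry.
raw = pd.Series(["1", "2.5", "abc"], dtype=object)

# 'abc' becomes NaN and the result is float64.
coerced = pd.to_numeric(raw, errors="coerce")

# astype refuses the same input.
try:
    raw.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

print(coerced.tolist(), astype_failed)
```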
  21. Select all numeric columns
df.select_dtypes('number')
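The first argument of `select_dtypes` is `include`; there is a matching `exclude`. Note that boolean columns do not count as 'number'. A sketch on a hypothetical frame with the same column types as the article's df:

```python
import pandas as pd

# Hypothetical frame: one text, one float, one int and one bool column.
mixed = pd.DataFrame({
    "animal": ["cat", "dog"],
    "age": [2.5, 5.0],
    "visits": [1, 3],
    "priority": [True, False],
})

numeric = mixed.select_dtypes(include="number")
non_numeric = mixed.select_dtypes(exclude="number")

print(list(numeric.columns), list(non_numeric.columns))
```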
  22. Pivot table: average age for each animal and each visits value
df.pivot_table(index='animal',columns='visits',values='age',aggfunc='mean')
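A pivot table with a single aggregation can also be read as a `groupby` followed by `unstack`. The sketch below uses a hypothetical five-row frame (`small`) to show both, with NaN marking combinations that have no rows:

```python
import pandas as pd

# Hypothetical frame; values are illustrative only.
small = pd.DataFrame({
    "animal": ["cat", "cat", "cat", "dog", "dog"],
    "visits": [1, 3, 3, 2, 2],
    "age": [2.5, 3.0, 1.5, 5.0, 7.0],
})

# Rows are animals, columns are visit counts, cells are mean ages.
pivot = small.pivot_table(index="animal", columns="visits", values="age", aggfunc="mean")

# The same table built step by step.
via_groupby = small.groupby(["animal", "visits"])["age"].mean().unstack()

print(pivot)
```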

Intermediate operations

df3=pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7],'B':[10, 20, 20, 30, 40, 50, 50, 50, 60, 70, 70]})
df3['A'].shift()
0     NaN
1     1.0
2     2.0
3     2.0
4     3.0
5     4.0
6     5.0
7     5.0
8     5.0
9     6.0
10    7.0
Name: A, dtype: float64
df3['A'].shift(1)
0     NaN
1     1.0
2     2.0
3     2.0
4     3.0
5     4.0
6     5.0
7     5.0
8     5.0
9     6.0
10    7.0
Name: A, dtype: float64
df3['A'].shift(2)
0     NaN
1     NaN
2     1.0
3     2.0
4     2.0
5     3.0
6     4.0
7     5.0
8     5.0
9     5.0
10    6.0
Name: A, dtype: float64
df3['A'].shift(-1)
0     2.0
1     2.0
2     3.0
3     4.0
4     5.0
5     5.0
6     5.0
7     6.0
8     7.0
9     7.0
10    NaN
Name: A, dtype: float64
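A common use of `shift` is comparing each row with the previous one. The sketch below, using the values of column A, computes the row-to-row change and checks it against the built-in `diff`:

```python
import pandas as pd

# Column A from df3 above.
a = pd.Series([1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7], name="A")

step = a - a.shift()      # change from the previous row; the first entry is NaN
same_as_diff = a.diff()   # shortcut for the same computation

print(step.tolist())
```

The result is float64 even though the input is integer, because the shifted-in gap has to be stored as NaN.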
df3.shift(-1,axis=1)
df3.shift(1,axis=1)
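With `axis=1` the values move between columns instead of between rows. A sketch on a hypothetical two-column frame shows which column ends up empty in each direction:

```python
import pandas as pd

# Hypothetical frame with the same two-column layout as df3.
wide = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

right = wide.shift(1, axis=1)    # B receives A's values, A becomes NaN
left = wide.shift(-1, axis=1)    # A receives B's values, B becomes NaN

print(right)
print(left)
```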

DataFrame.shift(periods=1, freq=None, axis=0)
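Parameters
periods: number of positions to shift; negative values shift in the opposite direction
freq: optional offset (for example a timedelta); when given, the index is shifted by periods * freq and the data stays in place
axis: 0 shifts along rows, 1 shifts along columns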

df4=pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7],'B':[10, 20, 20, 30, 40, 50, 50, 50, 60, 70, 70]},index=pd.date_range('20190101','20190111'))
df4
import datetime
df4.shift(periods=1,freq=datetime.timedelta(2))
df4.shift(periods=-1,freq=datetime.timedelta(2))
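When `freq` is passed, `shift` moves the index labels rather than the data, so no NaN appears. A sketch on a hypothetical three-day frame (`ts`) checks where the index lands:

```python
import datetime
import pandas as pd

# Hypothetical daily frame starting on 2019-01-01.
ts = pd.DataFrame({"A": [1, 2, 3]}, index=pd.date_range("20190101", periods=3))

# The index moves by periods * freq: +2 days and -2 days respectively.
forward = ts.shift(periods=1, freq=datetime.timedelta(days=2))
backward = ts.shift(periods=-1, freq=datetime.timedelta(days=2))

print(forward.index[0], backward.index[0])
```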
df3.loc[df3['A'].shift()!=df3['A']]
df3.drop_duplicates(subset='A')
df3
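On df3 the two approaches give the same rows because column A is already sorted. They differ when a value reappears later: the `shift` comparison only removes consecutive repeats, while `drop_duplicates` removes every later repeat. A sketch on a hypothetical column:

```python
import pandas as pd

# Hypothetical column where 1 reappears after a 2.
runs = pd.DataFrame({"A": [1, 1, 2, 1]})

# Keep a row when it differs from the row directly above it.
consecutive = runs.loc[runs["A"].shift() != runs["A"]]

# Keep only the first occurrence of each value.
global_unique = runs.drop_duplicates(subset="A")

print(list(consecutive.index), list(global_unique.index))
```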