Python_Data Analysis_pandas

The Pandas Library

2022-08-02  灵活胖子的进步之路



I. The Pandas Library

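Pandas is a tool built on top of NumPy, created to solve data analysis tasks.

Unlike NumPy, however, Pandas is better suited to tabular or heterogeneous data (NumPy is better suited to homogeneous numeric array data), and it provides a large number of mathematical functions and computation methods.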
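The homogeneous versus heterogeneous distinction can be seen directly in the dtypes. The following is a minimal sketch; the names `arr` and `df_demo` and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype: mixing numbers and text
# forces every element to become a string.
arr = np.array([80, 90.5, "A"])
print(arr.dtype)

# A DataFrame keeps one dtype per column, so numbers stay numbers.
# The column names below are hypothetical.
df_demo = pd.DataFrame({
    "score": [80, 90],
    "ratio": [0.5, 0.75],
    "grade": ["A", "B"],
})
print(df_demo.dtypes)
```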

import numpy as np
import pandas as pd

II. Pandas Data Structures: Series and DataFrame

1. Series: index and values

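a = pd.Series([1, 2, 3, 4, 5])  # default integer index 0..4
a
a.index
a.values
a = pd.Series([1, 2, 3, 4, 5], index = ['a', 'b', 'c', 'd', 'e'])  # explicit labels
a
a.index
# Passing an existing Series together with index= reorders it by label
a_reindex = pd.Series(a, index = ['e', 'b', 'c', 'd', 'a'])
a_reindex
# Labels that are missing from a ('f', 'g') are filled with NaN
a_reindex = pd.Series(a, index = ['e', 'b', 'c', 'd', 'a','f', 'g'])
a_reindex
a
b= a.reindex(['e', 'b', 'c', 'd', 'a','f', 'g'])  # same result via reindex; a is unchanged
b
a
a.rename(index={'a':'h','b':'i','c':'j','d':'k','e':'l'})  # returns a new Series; a is unchanged
a
a.index = ['e', 'b', 'c', 'd', 'a']  # relabels in place by position; the values do not move
a
b = np.array(a)  # drop the index and keep only the values
b
c = pd.Series(b)  # back to a Series with a default integer index
c
data = {'yuwen': 80, 'yingyu': 90, 'shuxue': 80}
data
type(data)
data_ = pd.Series(data,index = ['yingyu','yuwen','shuxue'])  # dict keys become labels; index= sets their order
data_1 = pd.Series(data)  # keeps the dict's own key order
data_
data_1
data_.index
data_.name = 'Score'  # name of the Series itself
data_.index.name = 'Course'  # name of the index
data_
data_.index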

2. DataFrame: index, columns, and values

data = np.array([[95, 96, 97], 
                [80, 85, 86], 
                [56, 65, 70]])
data
data1 = np.array(1)  # a zero-dimensional array holding a single scalar
data1
frame = pd.DataFrame(data)  # default integer row and column labels
frame
frame = pd.DataFrame(data, index=['xiaoming', 'xiaohong', 'xiaohei'],
                      columns=['yuwen', 'yingyu', 'shuxue'])
frame
# Passing an existing DataFrame reorders its rows and columns by label
frame_ = pd.DataFrame(frame, index=[ 'xiaohong', 'xiaoming','xiaohei'],
                      columns=['yingyu','yuwen',  'shuxue'])
frame_
# Labels that are missing from frame ('xiaobai', 'tiyu') are filled with NaN
frame__ = pd.DataFrame(frame, index=[ 'xiaohong', 'xiaoming','xiaohei','xiaobai'],
                      columns=['yingyu',
                      'yuwen', 'shuxue', 'tiyu'])
frame__
# reindex returns a new DataFrame; frame_ itself is unchanged
frame_.reindex(index=[ 'xiaohong', 'xiaoming','xiaohei','xiaobai'],
               columns=['yingyu','yuwen', 'shuxue', 'tiyu'])
frame_
# rename also returns a new DataFrame; frame_ itself is unchanged
frame_.rename(index={"xiaohong":"damao","xiaoming":"ermao","xiaohei":"Nicolas Cage"},
              columns={"yingyu":"English", "yuwen":"Literature", "shuxue":"Maths"})
frame_
frame_.index = ['damao','ermao','Nicolas Cage']  # relabel rows in place, by position
frame_.columns = ['English', 'Literature', 'Maths']  # relabel columns in place, by position
frame_
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
type(data)
df = pd.DataFrame(data)  # dict keys become column names
df
df = pd.DataFrame(data, index = ["alpha", "beta","theta"])
df
df.index
df.columns
df.name = 'Score'  # unlike Series, DataFrame has no name property; this only sets a plain Python attribute
df.index.name = 'Person'
df.columns.name = 'Course'
df

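Recap:

1. Structure of Series and DataFrame

2. Ways to set or change an index

index, columns: set the labels at creation; if the data already has labels, it is reordered to match them

reindex: build a new index or reorder by the given labels

rename: change labels

Series.index = []

DataFrame.columns = []
df.info()
str(df)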
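The three approaches in the recap behave differently, which is easy to mix up. A minimal sketch with a throwaway Series (the variable names are illustrative only):

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# reindex: reorder by label; labels that do not exist yet get NaN.
reordered = s.reindex(["c", "a", "z"])

# rename: relabel selected entries; returns a new object, s is unchanged.
renamed = s.rename(index={"a": "x"})

# Assigning to .index relabels in place, strictly by position,
# so the value 1 now carries the label "c".
t = s.copy()
t.index = ["c", "b", "a"]

print(reordered)
print(renamed)
print(t)
```

Note that `reindex` moves values to follow their labels, while assigning to `.index` leaves the values where they are and only swaps the labels.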

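Notes:

  1. Tuple: a fixed-length, immutable sequence of Python objects

  2. List: a variable-length sequence whose contents can be modified

  3. ndarray: an efficient multidimensional container for same-typed data, with convenient arithmetic and broadcasting

  4. DataFrame: a heterogeneous two-dimensional table; each column can hold a different value type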
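The four containers in the notes can be compared side by side. This is only a small sketch; all variable and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

point = (1, 2)                 # tuple: fixed length, immutable
try:
    point[0] = 10
except TypeError as exc:
    print("tuple:", exc)

scores = [80, 90]              # list: mutable and growable
scores.append(70)

arr = np.array(scores)         # ndarray: one dtype for all elements
boosted = arr + 5              # the scalar 5 is broadcast to every element

# DataFrame: each column has its own type (numbers next to strings)
table = pd.DataFrame({"score": scores, "grade": ["B", "A", "C"]})
print(boosted)
print(table.dtypes)
```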

III. Series and DataFrame Operations

1. Basic arithmetic

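s1 = pd.Series([1, 2, 3],
              index = ['a','b','c'])
s1
s1 - 1  # the scalar is applied to every element
s2 = pd.Series([4, 5, 6],
              index = ['b','c','e'])
s2
s1 + s2  # aligned by label; labels found in only one Series give NaN
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df
df * 2  # numbers are doubled; the strings in Music are repeated
data1 = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],}
df1 = pd.DataFrame(data1,index = ["alpha", "beta","theta"])
df1
df + df1  # aligned on both axes; Music exists only in df, so it becomes NaN
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)  # mixes numbers and a string
add_
df1
add_
df1 + add_  # the labels of add_ are matched against the columns of df1
add1_ = {'alpha':10,'beta':10,'theta':20,}
add1_ = pd.Series(add1_)
add1_
df1+add1_  # the row labels match no column, so every cell is NaN
df1
df1.add(add1_,axis='index')  # match on the row index instead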
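The difference between column-wise and row-wise matching is easier to see on a small, purely numeric example. A sketch with made-up data (`scores`, `per_column`, and `per_row` are illustrative names):

```python
import pandas as pd

scores = pd.DataFrame(
    {"English": [80, 70], "Maths": [80, 90]},
    index=["alpha", "beta"],
)
per_column = pd.Series({"English": 10, "Maths": 20})
per_row = pd.Series({"alpha": 1, "beta": 2})

# Default: the Series labels are matched against the columns.
by_column = scores + per_column

# Row labels matched against columns share nothing, so all cells are NaN.
mismatched = scores + per_row

# axis="index" matches the Series labels against the row index instead.
by_row = scores.add(per_row, axis="index")

print(by_column, mismatched, by_row, sep="\n\n")
```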

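2. Matrix operations and universal functions

df
df.T  # transpose: rows become columns
df
np.square(df)  # raises an error, because the Music column holds strings
np.square(df1)  # works: df1 is entirely numeric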

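3. Basic statistical methods

df.max(axis=0)  # maximum of each column
df.mean(axis=1, numeric_only=True)  # mean of each row; numeric_only skips the string column Music
df.describe()  # summary statistics for the numeric columns
df.info()  # info is a method, so it needs parentheses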
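The `axis` argument decides which direction is collapsed. A minimal sketch, assuming a small frame like the one used above (`scores` is an illustrative name):

```python
import pandas as pd

scores = pd.DataFrame(
    {"English": [80, 70, 60], "Maths": [80, 90, 50], "Music": ["A", "B", "C"]},
    index=["alpha", "beta", "theta"],
)

# axis=0 collapses the rows: one result per column.
col_max = scores.max(axis=0, numeric_only=True)

# axis=1 collapses the columns: one result per row.
# numeric_only=True leaves the string column out of the calculation.
row_mean = scores.mean(axis=1, numeric_only=True)

print(col_max)
print(row_mean)
```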

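IV. Indexing and Slicing Series and DataFrame

1. Series indexing

add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_
add_['Maths']  # by label
add_['Maths':'Literature']  # label slice; both ends are included
add_.iloc[2]  # by position; plain add_[2] on a labelled Series is deprecated in recent pandas
add_[:2]  # an integer slice is positional and excludes the end
add_[[True, False, True, False]]  # boolean mask, one flag per element
add_.English  # attribute access works when the label is a valid identifier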

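2. DataFrame indexing

2.1 Indexing by label

df.dtypes
df
df['Maths']  # one column, returned as a Series
df[['Maths','English']]  # several columns, returned as a DataFrame
df.Maths  # attribute access to a column
df['alpha']  # KeyError: [] looks up column names, and 'alpha' is a row label
df.loc['alpha']  # rows are selected by label with .loc
df.loc['alpha':'theta']  # label slice; both ends are included
df.alpha  # AttributeError: attribute access only reaches columns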

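2.2 Indexing by integer position

df.iloc[2]  # third row
df.iloc[:,2]  # third column
df.iloc[:2,:2]  # first two rows and first two columns
df[:2]  # special case: a plain integer slice selects rows
df[2]  # KeyError: a single integer is looked up as a column name
df[:2,:2]  # raises an error: two-dimensional slicing needs .iloc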
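Since plain `[]` behaves inconsistently, it often helps to spell out `.loc` (labels) or `.iloc` (positions) explicitly. A small sketch on made-up data:

```python
import pandas as pd

scores = pd.DataFrame(
    {"English": [80, 70, 60], "Maths": [80, 90, 50]},
    index=["alpha", "beta", "theta"],
)

# .loc works with labels; a label slice includes its end point.
by_label = scores.loc["alpha":"beta", "Maths"]

# .iloc works with positions; a position slice excludes its end point.
by_position = scores.iloc[0:2, 1]

# .at fetches a single cell by row and column label.
single = scores.at["theta", "English"]

print(by_label)
print(by_position)
print(single)
```

Both slices select the same two cells here, even though one is written with an inclusive end and the other with an exclusive end.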

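2.3 Boolean indexing

df1 >70  # element-wise comparison gives a DataFrame of booleans
df1[df1>70]  # same shape; cells that fail the test become NaN
df1[df1['Maths']>70]  # keep only the rows where Maths > 70
df1['Maths']>70  # the row mask itself, a boolean Series
df1[df1['Maths']>70] = 70  # sets every column of the matching rows to 70, not just Maths
df1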
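Assigning through a row mask overwrites whole rows. If the intent is to change a single column only, `.loc` with a mask and a column label is one way to do it. A hedged sketch on made-up data (the capping scenario is an assumption, not something stated in the original notebook):

```python
import pandas as pd

scores = pd.DataFrame(
    {"English": [80, 70, 60], "Maths": [80, 90, 50]},
    index=["alpha", "beta", "theta"],
)
mask = scores["Maths"] > 70

# Row mask alone: every column in the matching rows is overwritten.
overwritten = scores.copy()
overwritten[mask] = 70

# Row mask plus column label: only Maths is changed.
capped = scores.copy()
capped.loc[mask, "Maths"] = 70

print(overwritten)
print(capped)
```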

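V. Deleting from Series and DataFrame

1. Deleting from a Series

add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_
add_.pop('Maths')  # removes the entry in place and returns its value
add_
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_
add_.drop('Maths')  # returns a new Series; add_ is unchanged
add_
add_.drop('Maths',inplace=True)  # inplace=True modifies add_ itself
add_
del add_['English']  # del also removes in place
add_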

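2. Deleting from a DataFrame

data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df
df.pop("Music")  # removes the column in place and returns it
df
df.drop('alpha')  # drops a row; returns a new DataFrame, df is unchanged
df.drop('Maths',axis=1)  # axis=1 drops a column; again df is unchanged
df
del df['Maths']  # del removes a column in place
df
# del works on columns only: `del df.loc['alpha']` raises an error, so drop the row instead
df.drop('alpha', inplace=True)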
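Which of these calls modify the original and which return a copy is summarised in the sketch below, using a small made-up frame:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"English": [80, 70], "Maths": [80, 90], "Music": ["A", "B"]},
    index=["alpha", "beta"],
)

# pop: columns only, in place, returns the removed column.
music = df_demo.pop("Music")

# drop: rows or columns, returns a new frame by default.
without_alpha = df_demo.drop("alpha")
without_maths = df_demo.drop(columns="Maths")

# del: columns only, in place.
del df_demo["Maths"]

print(music.tolist())
print(without_alpha)
print(without_maths)
print(df_demo)
```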

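VI. Combining Series and DataFrame

1. Combining Series

s1 = pd.Series([1, 2, 3],
              index = ['a','b','c'])
s1
s2 = pd.Series([4, 5, 6],
              index = ['b','c','e'])
s2
pd.concat((s1,s2))  # stack end to end; duplicate labels are kept
pd.concat((s1,s2),axis =1)  # side by side as two columns, aligned by label
s1.combine_first(s2)  # keep s1's values and fill missing labels from s2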
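The three results differ in shape and in how overlapping labels are treated. A minimal sketch that repeats the two Series from above under new, illustrative names:

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([4, 5, 6], index=["b", "c", "e"])

# Stacking keeps all six values, so "b" and "c" appear twice.
stacked = pd.concat([left, right])

# Side by side gives a DataFrame over the union of labels,
# with NaN where one side has no value.
side_by_side = pd.concat([left, right], axis=1)

# combine_first prefers left and only takes "e" from right.
patched = left.combine_first(right)

print(stacked, side_by_side, patched, sep="\n\n")
```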

2. Combining DataFrames

data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df
data1 = {"English":[80,70,60], 
        "Maths":[80,90,50],
        "Literature":[70,70,85],}
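df1 = pd.DataFrame(data1,index = ["beta","alpha","theta"])  # same row labels as df, in a different order
df1
pd.concat((df,df1))  # stack rows; Music is missing from df1, so those cells are NaN
pd.concat((df,df1),axis=1)  # side by side, rows aligned by label
df1.combine_first(df)  # keep df1's values and fill whatever is missing from df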
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"],
        "ID":[1001,1002,1003]}
df = pd.DataFrame(data)
df
data1 = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "ID":[1004,1002,1003]}
df1 = pd.DataFrame(data1)
df1
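pd.concat((df,df1),axis=1)  # aligns on the default 0..2 row index, not on ID
pd.merge(df,df1,on='ID')  # inner join: only the IDs present in both frames
pd.merge(df,df1,on='ID',how="outer")  # outer join: every ID from either frame
df.set_index('ID', inplace=True)  # make ID the row index
df
df1.set_index('ID', inplace=True)
df1
df.join(df1, how='outer', lsuffix='df', rsuffix='df1')  # join on the index; the suffixes tell shared column names apart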
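How `how=` changes the result of a key-based merge can be checked on a pared-down pair of frames. The sketch below uses made-up frames `left` and `right`; the `indicator=True` option is not used in the original and is added here only to show where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1001, 1002, 1003], "English": [80, 70, 60]})
right = pd.DataFrame({"ID": [1004, 1002, 1003], "Maths": [80, 90, 50]})

# Inner join (the default): only IDs found in both frames.
inner = pd.merge(left, right, on="ID")

# Outer join: all IDs; indicator=True adds a _merge column
# saying whether a row came from the left, the right, or both.
outer = pd.merge(left, right, on="ID", how="outer", indicator=True)

# join does the same kind of match, but on the index.
joined = left.set_index("ID").join(right.set_index("ID"), how="outer")

print(inner, outer, joined, sep="\n\n")
```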

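VII. Other Common Pandas Functions and Methods

df3 = pd.concat((df,df1),axis=0)  # stack rows; IDs 1002 and 1003 now appear twice
df3
df3.head()  # first five rows
df3.info()  # column types and non-null counts
df3.describe()  # summary statistics for the numeric columns
df3.sort_index(axis=0)  # sort rows by index label
df3.sort_values(by=['Maths'])  # sort rows by the values in Maths
df3.index.is_unique  # False, because of the repeated IDs
df3['English'].is_unique  # False, because scores repeat
df3.index.value_counts()  # how often each ID occurs
df3.Music.value_counts()  # NaN entries are not counted by default
df3['Maths'].rank()  # tied values share their average rank
df3['Maths'].rank(method = 'first')  # tied values are ranked in order of appearance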
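The effect of `method=` on tied values is easiest to see on a short Series with one tie. A minimal sketch with made-up numbers:

```python
import pandas as pd

maths = pd.Series([80, 90, 50, 80])

# Default (method="average"): the two 80s would hold ranks 2 and 3,
# so each of them gets 2.5.
average_rank = maths.rank()

# method="first": ties are broken by order of appearance.
first_rank = maths.rank(method="first")

# value_counts shows how often each value occurs.
counts = maths.value_counts()

print(average_rank.tolist())
print(first_rank.tolist())
print(counts)
```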

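Summary

I. The Pandas Library

II. Pandas Data Structures: Series and DataFrame

1. Series: index and values

2. DataFrame: index, columns, and values

Ways to set or change an index

At creation: pass index and columns to set the labels; if the data already has labels, it is reordered to match them

After creation:

reindex: build a new index or reorder by the given labels

rename: change labels

Series.index = []
DataFrame.columns = []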

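III. Series and DataFrame Operations

1. Basic arithmetic

Operands are aligned by index label, not by position

When a DataFrame and a Series are "added", the Series labels are matched against the DataFrame's columns by default

2. Matrix operations and universal functions

3. Basic statistical methods: axis selects the axis to operate along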

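IV. Indexing and Slicing Series and DataFrame

1. Series indexing and slicing: by label / by integer position / by boolean mask

2. DataFrame indexing and slicing

By label: columns with df['Maths'], rows with df.loc['alpha']

By integer position: df.iloc[]; as a special case, rows can be sliced directly with an integer slice such as df[:2]

By boolean mask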

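V. Deleting from Series and DataFrame

1. Deleting from a Series: pop / drop / del

2. Deleting from a DataFrame: pop / drop / del (pop and del remove columns only; drop removes rows or columns)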

VI. Combining Series and DataFrame

1. Combining Series

pd.concat() combine_first()

2. Combining DataFrames

pd.concat() combine_first() 

pd.merge()  join()

VII. Other Common Pandas Functions and Methods

head()  info()  describe()  

sort_index()    sort_values()

is_unique   value_counts()

rank()