pandas二刷（2）

2023-02-12 本文已影响0人山猪打不过家猪

1.8字符串处理

使用方法：先获取Series的str属性，然后在属性上调用函数；
只能在字符串列上使用，不能数字列上使用；
Dataframe上没有str属性和处理方法
Series.str并不是Python原生字符串，而是自己的一套方法，不过大部分和原生str很相似；

1.8.1 startwith开头

查询字符串以2018-03开头的，等同于查询月份数据

condition = df["ymd"].str.startswith("2018-03")

1.8.2 链式调用清洗字符串

将字符串2018-01-02转换成为201801

df["ymd"].str.replace("-", "").str[0:6]

1.8.3 正则表达式

contains

找出有晴天的所有数据

df[df['tianqi'].str.contains("晴")]

re表达式

提取所有数字

df['ymd'] = df['ymd'].str.replace(re.compile("-"), "")

image.png

1.10 Merge Syntax

Merge相当于sql的Join语法，将不同的Key关联到一个表

Table 1：

df_ratings = pd.read_csv(
    "./datas/movielens-1m/ratings.dat", 
    sep="::",
    engine='python', 
    names="UserID::MovieID::Rating::Timestamp".split("::")
)

image.png

Table 2：

df_users = pd.read_csv(
    "./datas/movielens-1m/users.dat", 
    sep="::",
    engine='python', 
    names="UserID::Gender::Age::Occupation::Zip-code".split("::")
)

image.png

Table 3：

df_movies = pd.read_csv(
    "./datas/movielens-1m/movies.dat", 
    sep="::",
    engine='python', 
    names="MovieID::Title::Genres".split("::")
)

image.png

1.10.1 inner join

All entries from the left side will appear in the result, and if there's no match from the right side, it will be shown as Null.

df_ratings_users = pd.merge(
   df_ratings, df_users, left_on="UserID", right_on="UserID", how="inner"
)

equal to sql

select * from df_ratings a inner join df_users  b on a.UserID= b.UserID

image.png

1.10.2 right join

All entries from the left side will appear in the result, and if there's no match from the right side, it will be shown as Null

df_ratings_users = pd.merge(
   df_ratings, df_users, left_on="UserID", right_on="UserID", how="left "
)

equal to sql

select * from df_ratings a left join df_users  b on a.UserID= b.UserID

1.11 Concat

concat is used to combine excel with the same format

table 1

image.png
table 2

image.png

1.11.1 use pandas.concat to combine data

use default param

pd.concat([table1,table2])

image.png

user ignore_index =True to neglect the primary index

pd.concat([df1,df2], ignore_index=True)

image.png

1.12 Batch merge and split excel

1.12.1 split to multiple equal excel

work_dir="./course_datas/c15_excel_split_merge"
splits_dir=f"{work_dir}/splits"

import os
if not os.path.exists(splits_dir):
    os.mkdir(splits_dir)

pandas二刷（2）

1.8字符串处理

1.8.1 startwith开头

1.8.2 链式调用清洗字符串

1.8.3 正则表达式

contains

re表达式

1.10 Merge Syntax

1.10.1 inner join

1.10.2 right join

1.11 Concat

1.11.1 use pandas.concat to combine data

1.12 Batch merge and split excel

1.12.1 split to multiple equal excel

猜你喜欢

热点阅读