pandas, second pass (2)

2023-02-12  山猪打不过家猪

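1.8 String processing

  1. Usage: get the Series' str attribute first, then call methods on it;
  2. It works only on string columns, not on numeric columns;
  3. DataFrame has no str attribute or string methods;
  4. Series.str is not Python's built-in str but pandas' own set of methods, although most of them closely mirror the built-in ones.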
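
The four points above can be checked on a tiny frame. This is a minimal sketch: the column names and values are made up for illustration and are not the dataset used in these notes.

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-03-15"],
    "temp": [3, 12],
})

# String column: .str exposes vectorized string methods
lengths = df["ymd"].str.len()

# Numeric column: accessing .str raises AttributeError
try:
    df["temp"].str
    numeric_has_str = True
except AttributeError:
    numeric_has_str = False

# The DataFrame itself has no .str accessor
frame_has_str = hasattr(df, "str")

# A numeric column can be converted first if string methods are needed
temp_as_text = df["temp"].astype(str).str.zfill(3)
```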
1.8.1 startswith: matching a prefix

Select the rows whose string starts with 2018-03, which is equivalent to selecting that month's data

condition = df["ymd"].str.startswith("2018-03")
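
The condition is a boolean Series, so it can be used directly as a row filter. A minimal sketch, assuming a small made-up frame with a ymd column:

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02", "2018-04-01"],
    "temp": [5, 8, 9, 15],
})

# One True/False value per row
condition = df["ymd"].str.startswith("2018-03")

# Keep only the rows where the condition is True
march = df[condition]
```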
1.8.2 Cleaning strings with chained calls

Convert the string 2018-01-02 to 201801

df["ymd"].str.replace("-", "").str[0:6]
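
Each string method returns a new Series, which is why .str has to be written again before the next step. The chain can be sketched step by step on made-up values:

```python
import pandas as pd

# Hypothetical sample values, only for illustration
s = pd.Series(["2018-01-02", "2018-03-15"])

# Step 1: drop the dashes (regex=False treats "-" as a literal string)
no_dash = s.str.replace("-", "", regex=False)

# Step 2: the result is a Series again, so .str is needed once more to slice
year_month = no_dash.str[0:6]
```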
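1.8.3 Regular expressions
contains
# 晴 means "sunny"
df[df['tianqi'].str.contains("晴")]
Compiled re pattern
import re
# pandas 2.0+ defaults to regex=False, which rejects a compiled pattern
df['ymd'] = df['ymd'].str.replace(re.compile("-"), "", regex=True)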
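
Both calls become more useful once the pattern is an actual regular expression. The following is a sketch on made-up data (the mixed date separators are an assumption for illustration): contains interprets its pattern as a regex by default, and replace accepts a compiled pattern when regex=True is passed.

```python
import re
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018/01/02", "2018-01-03"],
    "tianqi": ["晴", "小雨", "多云"],
})

# "|" means "or" because the pattern is treated as a regex by default
dry_days = df[df["tianqi"].str.contains("晴|多云")]

# A character class removes either separator in one pass
cleaned = df["ymd"].str.replace(re.compile(r"[-/]"), "", regex=True)
```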

1.10 Merge Syntax

merge is the pandas counterpart of SQL's JOIN: it combines tables by matching rows on key columns

import pandas as pd

df_ratings = pd.read_csv(
    "./datas/movielens-1m/ratings.dat", 
    sep="::",
    engine='python', 
    names="UserID::MovieID::Rating::Timestamp".split("::")
)
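
A note on why engine='python' appears here: a separator longer than one character is interpreted as a regular expression, which the default C parser does not support. The sketch below reads the same "::" layout from an in-memory string; the two rows are illustrative values, not a claim about the file's contents.

```python
import io
import pandas as pd

# Illustrative rows in the "::"-separated layout, held in memory
raw = "1::1193::5::978300760\n1::661::3::978302109\n"

df_demo = pd.read_csv(
    io.StringIO(raw),
    sep="::",            # multi-character separator, parsed as a regex
    engine="python",     # the python engine supports regex separators
    names="UserID::MovieID::Rating::Timestamp".split("::"),
)
```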
df_users = pd.read_csv(
    "./datas/movielens-1m/users.dat", 
    sep="::",
    engine='python', 
    names="UserID::Gender::Age::Occupation::Zip-code".split("::")
)
df_movies = pd.read_csv(
    "./datas/movielens-1m/movies.dat", 
    sep="::",
    engine='python', 
    names="MovieID::Title::Genres".split("::")
)
1.10.1 inner join
df_ratings_users = pd.merge(
   df_ratings, df_users, left_on="UserID", right_on="UserID", how="inner"
)
Equivalent SQL:
select * from df_ratings a inner join df_users b on a.UserID = b.UserID
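
The effect of an inner join is easier to see on a few rows than on the full MovieLens tables. A minimal sketch with made-up frames (ratings and users are hypothetical stand-ins for df_ratings and df_users); since the key has the same name on both sides, on="UserID" is a shorthand for the left_on/right_on pair.

```python
import pandas as pd

# Hypothetical miniature tables, only for illustration
ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 4],
    "MovieID": [10, 20, 10, 30],
    "Rating": [5, 3, 4, 2],
})
users = pd.DataFrame({
    "UserID": [1, 2, 3],
    "Gender": ["F", "M", "M"],
})

# Only keys present in BOTH tables survive: user 4 (no profile)
# and user 3 (no ratings) are dropped
inner = pd.merge(ratings, users, on="UserID", how="inner")
```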
1.10.2 left join
df_ratings_users = pd.merge(
   df_ratings, df_users, left_on="UserID", right_on="UserID", how="left"
)
Equivalent SQL:
select * from df_ratings a left join df_users b on a.UserID = b.UserID
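
To make the difference between the join directions concrete, here is a sketch that runs how="left" and how="right" on the same made-up frames (ratings and users are hypothetical stand-ins for the real tables). The indicator=True option adds a _merge column showing where each row came from.

```python
import pandas as pd

# Hypothetical miniature tables, only for illustration
ratings = pd.DataFrame({
    "UserID": [1, 1, 2, 4],
    "MovieID": [10, 20, 10, 30],
    "Rating": [5, 3, 4, 2],
})
users = pd.DataFrame({
    "UserID": [1, 2, 3],
    "Gender": ["F", "M", "M"],
})

# Left join: every rating is kept; user 4 has no profile, so Gender is NaN
left = pd.merge(ratings, users, on="UserID", how="left", indicator=True)

# Right join: every user is kept; user 3 has no ratings, so MovieID is NaN
right = pd.merge(ratings, users, on="UserID", how="right")
```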

1.11 Concat

concat is used to combine Excel tables that share the same format

1.11.1 Using pandas.concat to combine data
pd.concat([table1,table2])
pd.concat([df1,df2], ignore_index=True)
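
The two calls differ only in how the row index is handled. A minimal sketch with made-up frames df1 and df2 that share the same columns:

```python
import pandas as pd

# Hypothetical frames with identical columns, only for illustration
df1 = pd.DataFrame({"A": ["a0", "a1"], "B": ["b0", "b1"]})
df2 = pd.DataFrame({"A": ["a2", "a3"], "B": ["b2", "b3"]})

# Default: each frame keeps its own index, so labels repeat
kept = pd.concat([df1, df2])

# ignore_index=True: the result gets a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], ignore_index=True)
```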

1.12 Batch merging and splitting Excel files

1.12.1 Splitting into multiple equal-sized Excel files
import os

work_dir = "./course_datas/c15_excel_split_merge"
splits_dir = f"{work_dir}/splits"

# create the output directory for the split files if it does not exist yet
if not os.path.exists(splits_dir):
    os.mkdir(splits_dir)
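
The notes stop after creating the output directory. One possible way to do the actual split is sketched below; split_equal is a hypothetical helper, not something from the original notes, and it cuts the rows into consecutive chunks of near-equal size with iloc.

```python
import math
import pandas as pd

def split_equal(df, n_parts):
    """Hypothetical helper: cut df into n_parts consecutive row chunks."""
    # Rows per chunk, rounded up so that no row is left over
    size = math.ceil(len(df) / n_parts)
    return [df.iloc[i * size:(i + 1) * size] for i in range(n_parts)]

# Made-up data standing in for the sheet to be split
df_source = pd.DataFrame({"id": range(10)})
parts = split_equal(df_source, 3)
```

Each chunk could then be saved with DataFrame.to_excel, for example under splits_dir with a file name of your choosing and index=False; writing .xlsx files requires an Excel engine such as openpyxl to be installed.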