Pandas - 2. Selecting Rows and Columns

2022-04-30 · 陈天睡懒觉

import pandas as pd
# gapminder.tsv is tab-separated, so pass sep='\t'
df = pd.read_csv('data/gapminder.tsv', sep='\t')
print(df.head())

       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106
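If data/gapminder.tsv is not at hand, the same call can be tried on a tiny inline sample. A minimal sketch, assuming three rows copied from the output above and wrapped in io.StringIO so that read_csv can treat the string as a file:

```python
import io

import pandas as pd

# Three rows copied from the gapminder output above, tab-separated like the .tsv file
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.445314\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.853030\n"
    "Afghanistan\tAsia\t1962\t31.997\t10267083\t853.100710\n"
)

# sep="\t" tells read_csv to split fields on tabs instead of the default comma
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df.head())
```

Without sep="\t", read_csv would look for commas, find none, and put each whole line into a single column.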

Check the type of each column with df.dtypes or df.info()

pandas dtype -- Python type -- what it holds
object -- str -- text (strings)
int64 -- int -- integers
float64 -- float -- floating-point numbers
datetime64 -- datetime -- dates and times

print(df.dtypes)

country       object
continent     object
year           int64
lifeExp      float64
pop            int64
gdpPercap    float64
dtype: object
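The gapminder data has no datetime64 column. A minimal sketch on a small hypothetical frame, reusing two Afghanistan rows from the output above plus a made-up "recorded" date column, showing how each dtype in the table arises; pd.to_datetime turns the date strings into datetime64:

```python
import pandas as pd

df = pd.DataFrame({
    # text -> object (pandas versions with the new string dtype enabled show str instead)
    "country": ["Afghanistan", "Afghanistan"],
    "year": [1952, 1957],            # Python int -> int64
    "lifeExp": [28.801, 30.332],     # Python float -> float64
    # Hypothetical date column, only for illustrating datetime64
    "recorded": pd.to_datetime(["1952-01-01", "1957-01-01"]),
})

print(df.dtypes)
# info() prints the dtypes together with the row count and non-null counts
df.info()
```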

Check the number of rows and columns

# shape is an attribute, not a method: calling df.shape() raises a TypeError
print(df.shape)  # (number of rows, number of columns)

(1704, 6)
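Because shape is a plain tuple, it can be unpacked directly. A minimal sketch on a hypothetical 3-row, 2-column frame built from the year and pop values shown earlier:

```python
import pandas as pd

df = pd.DataFrame({"year": [1952, 1957, 1962], "pop": [8425333, 9240934, 10267083]})

# shape is a (rows, columns) tuple, so it can be unpacked
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 3 2

# len(df) gives just the number of rows
print(len(df))  # 3

# Calling the attribute like a method fails: a tuple is not callable
try:
    df.shape()
except TypeError as err:
    print("TypeError:", err)
```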

Get the column names and the row index

# df.columns: the column names
print(df.columns)
# df.index: the row index (row labels)
print(df.index)
# The first 10 row labels as a plain list
print(list(df.index)[:10])

Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')
RangeIndex(start=0, stop=1704, step=1)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
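Both attributes return pandas Index objects rather than lists. A minimal sketch on a two-row toy frame, converting them with tolist() (an alternative to the list(...) call above):

```python
import pandas as pd

df = pd.DataFrame({"country": ["Afghanistan", "Afghanistan"], "year": [1952, 1957]})

# df.columns is an Index object; tolist() turns it into a plain Python list
col_names = df.columns.tolist()
print(col_names)  # ['country', 'year']

# The default row index is a RangeIndex: 0, 1, 2, ... up to len(df) - 1
print(isinstance(df.index, pd.RangeIndex))  # True
print(df.index.tolist())  # [0, 1]
```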

Selecting a subset of columns

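# Single column
# Attribute access only works when the column name is a valid Python identifier
# (no spaces, etc.) and does not clash with an existing DataFrame attribute
continent = df.continent
continent = df['continent']  # bracket access works for any column name
print(continent[:5])
# Multiple columns: pass a list of names (note the double brackets)
year_continent = df[['year', 'continent']]
print(year_continent[:5])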

0    Asia
1    Asia
2    Asia
3    Asia
4    Asia
Name: continent, dtype: object
   year continent
0  1952      Asia
1  1957      Asia
2  1962      Asia
3  1967      Asia
4  1972      Asia
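The output above hints at a rule worth making explicit: one name in single brackets returns a Series, while a list of names returns a DataFrame. A minimal sketch on a hypothetical frame whose "life exp" column deliberately contains a space (values made up), so attribute access is impossible there:

```python
import pandas as pd

# Hypothetical data; "life exp" is not a valid Python identifier
df = pd.DataFrame({"continent": ["Asia", "Europe"], "life exp": [28.801, 55.23]})

# Single brackets with one name -> Series
one_col = df["continent"]
print(type(one_col).__name__)  # Series

# Double brackets (a list of names) -> DataFrame, even for a single column
one_col_frame = df[["continent"]]
print(type(one_col_frame).__name__)  # DataFrame

# df.life exp cannot be written, but bracket access still works
print(df["life exp"].tolist())  # [28.801, 55.23]
```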

Selecting a subset of rows

By row label (loc)
By row position (iloc)

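# Select one row by label; a single row comes back as a Series
sample = df.loc[0]
print(sample)
# Select several rows by passing a list of labels
samples = df.loc[[0, 100, 200]]
print(samples)
# df.loc[-1] raises a KeyError because no row is labeled -1

# Select one row by position (also returns a Series)
sample = df.iloc[0]

# Select several rows by position
samples = df.iloc[[0, 100, 200]]

# iloc accepts negative positions: -1 is the last row
sample = df.iloc[-1]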

country      Afghanistan
continent           Asia
year                1952
lifeExp           28.801
pop              8425333
gdpPercap        779.445
Name: 0, dtype: object
          country continent  year  lifeExp       pop   gdpPercap
0     Afghanistan      Asia  1952   28.801   8425333  779.445314
100    Bangladesh      Asia  1972   45.252  70759295  630.233627
200  Burkina Faso    Africa  1992   50.260   8878303  931.752773
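With the default RangeIndex, labels and positions coincide, so df.loc[0] and df.iloc[0] return the same row. The difference only shows once the index is something else. A minimal sketch, assuming a toy frame (three Afghanistan rows from above) whose index is set to the year column with set_index:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1952, 1957, 1962],
    "lifeExp": [28.801, 30.332, 31.997],
}).set_index("year")  # row labels are now 1952, 1957, 1962

# loc looks rows up by label
print(df.loc[1957]["lifeExp"])  # 30.332

# iloc looks rows up by position: 0 is the first row, -1 the last
print(df.iloc[0]["lifeExp"])   # 28.801
print(df.iloc[-1]["lifeExp"])  # 31.997

# 0 is a position, not a label in this index, so loc raises KeyError
try:
    df.loc[0]
    loc_zero_failed = False
except KeyError:
    loc_zero_failed = True
    print("KeyError: no row labeled 0")
```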

Combining both: selecting a subset of rows and columns

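In loc[rows, columns] and iloc[rows, columns], the part left of the comma selects rows and the part right of the comma selects columns.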

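# All rows (':'), selected columns
subset = df.loc[:, ['year', 'pop']]
subset = df.iloc[:, [1, 3, -1]]  # columns by position; -1 is the last column
subset = df.iloc[:, 3:6]  # positions 3, 4 and 5 (the slice end is excluded)
subset = df.iloc[:, :3]  # the first three columns

# Selected rows and selected columns
subset = df.loc[[1, 10, 20], ['year', 'pop']]
subset = df.iloc[[1, 10, 20], [1, -1]]
print(subset)  # shows the result of the last (iloc) selection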

   continent    gdpPercap
1       Asia   820.853030
10      Asia   726.734055
20    Europe  2497.437901
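One detail these examples leave implicit: slicing behaves differently in the two indexers. iloc slices are half-open like ordinary Python slices, while loc slices by label include the end label. A minimal sketch on a one-row frame copied from the first gapminder row, assuming the same column order:

```python
import pandas as pd

df = pd.DataFrame(
    [["Afghanistan", "Asia", 1952, 28.801, 8425333, 779.445314]],
    columns=["country", "continent", "year", "lifeExp", "pop", "gdpPercap"],
)

# iloc: positions 3, 4, 5 (position 6 is excluded)
print(df.iloc[:, 3:6].columns.tolist())  # ['lifeExp', 'pop', 'gdpPercap']

# loc: label slice from "year" to "pop", and "pop" IS included
print(df.loc[:, "year":"pop"].columns.tolist())  # ['year', 'lifeExp', 'pop']
```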
