Pandas Tips: Converting a Series to a DataFrame

2019-04-28  周军科

Before we start

In day-to-day work, a situation like the following forces you to convert a Series into a DataFrame.

sns.catplot(x=None, y=None, hue=None, data=None, row=None, col=None,
            col_wrap=None, estimator=<function mean>, ci=95, n_boot=1000,
            units=None, order=None, hue_order=None, row_order=None,
            col_order=None, kind='strip', height=5, aspect=1, orient=None,
            color=None, palette=None, legend=True, legend_out=True,
            sharex=True, sharey=True, margin_titles=False, facet_kws=None,
            **kwargs)

# Note that data must be a DataFrame. From the docstring:
# data : DataFrame
# Long-form (tidy) dataset for plotting. Each column should correspond
# to a variable, and each row should correspond to an observation.
df.groupby(by=None, axis=0, level=None, as_index=True, sort=True,
           group_keys=True, squeeze=False, observed=False, **kwargs)
#Group series using mapper (dict or key function, apply given function
#to group, return result as series) or by a series of columns.
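The mismatch between the two signatures can be reproduced on a tiny table. The sketch below uses made-up rows (only the column names are borrowed from the example later in this post) and shows that aggregating a single selected column after groupby yields a Series whose group keys sit in the index, not in columns that a plotting call could refer to by name.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the post.
df = pd.DataFrame({
    "Cell ID": ["1", "1", "2", "2", "2"],
    "is_overlap": ["0", "1", "0", "0", "1"],
})

# Aggregating one selected column returns a Series, not a DataFrame.
counts = df.groupby(["Cell ID", "is_overlap"])["is_overlap"].count()

print(type(counts).__name__)   # Series
# The grouping keys are now levels of a MultiIndex rather than columns.
print(list(counts.index.names))
```

Because `Cell ID` and `is_overlap` are index levels here, they cannot be passed as `x=` or `hue=` until they are moved back into columns.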

How to convert Series -> DataFrame

group.reset_index(level=None, drop=False, name=None, inplace=False)
#Generate a new DataFrame or Series with the index reset.

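As the signature shows, reset_index is enough to turn a Series into a DataFrame: with the default drop=False, the old index levels become ordinary columns.
This is especially handy after an aggregation that groups by several columns.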
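As a minimal sketch of the multi-column case, assume a small table with two grouping columns and one numeric column. The names `cell`, `band`, and `rsrp` and all values are illustrative, not taken from the author's data.

```python
import pandas as pd

# Hypothetical measurements; names and values are illustrative only.
df = pd.DataFrame({
    "cell": ["A", "A", "B", "B"],
    "band": ["x", "y", "x", "x"],
    "rsrp": [-90.0, -95.0, -100.0, -110.0],
})

# Grouping by two columns gives a Series with a two-level MultiIndex.
mean_rsrp = df.groupby(["cell", "band"])["rsrp"].mean()

# reset_index() moves both index levels back into ordinary columns,
# so the result is a DataFrame with the columns cell, band, rsrp.
flat = mean_rsrp.reset_index()

# With drop=True the index is discarded and the result stays a Series.
still_series = mean_rsrp.reset_index(drop=True)

print(type(flat).__name__, list(flat.columns))
print(type(still_series).__name__)
```

Only the default `drop=False` performs the conversion. Passing `drop=True` throws the group keys away, which is rarely what you want before plotting.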


Example

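import pandas as pd

# Read every column as a string; the file is GBK-encoded.
df = pd.read_csv('d:/out.csv', encoding='gbk', dtype=str)
# Drop rows that lack a position or a cell identifier.
df.dropna(subset=['Lon', 'Lat', 'ECI', 'Cell ID'], inplace=True)
df.info()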
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18333 entries, 153 to 19082
Data columns (total 29 columns):
NO.            18333 non-null object
UETime         18333 non-null object
PCTime         18333 non-null object
Lon            18333 non-null object
Lat            18333 non-null object
ECI            18333 non-null object
Cell ID        18333 non-null object
earfcn_1       18317 non-null object
pci_1          18317 non-null object
rsrp_1         18317 non-null object
earfcn_2       17533 non-null object
pci_2          17533 non-null object
rsrp_2         17533 non-null object
earfcn_3       17433 non-null object
pci_3          17433 non-null object
rsrp_3         17433 non-null object
earfcn_4       15276 non-null object
pci_4          15276 non-null object
rsrp_4         15276 non-null object
earfcn_5       14829 non-null object
pci_5          14829 non-null object
rsrp_5         14829 non-null object
earfcn_6       12166 non-null object
pci_6          12166 non-null object
rsrp_6         12166 non-null object
is_overlap     18333 non-null object
overlap_pci    18333 non-null object
is_mod         18333 non-null object
mod_pci        18333 non-null object
dtypes: object(29)
memory usage: 4.2+ MB

As the output above shows, the data that pandas read in is a DataFrame.

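# Count the rows for each (Cell ID, is_overlap) pair.
group = df.groupby(['Cell ID', 'is_overlap'])['is_overlap'].count()
type(group)
pandas.core.series.Series

As the output shows, the aggregated result is a Series.

# name='count' sets the name of the value column in the resulting DataFrame.
new_group = group.reset_index(name='count')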
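The name argument matters here. The counted Series is itself called is_overlap, which is also the name of one of its index levels, so a plain reset_index() would try to create two columns with the same name. The sketch below reproduces this on made-up rows (an assumption, the real file is not available) and shows that passing name='count' avoids the clash.

```python
import pandas as pd

# Made-up rows standing in for the real CSV.
df = pd.DataFrame({
    "Cell ID": ["1", "1", "2", "2", "2"],
    "is_overlap": ["0", "1", "0", "0", "1"],
})
group = df.groupby(["Cell ID", "is_overlap"])["is_overlap"].count()

# The Series is named "is_overlap", and so is one of its index levels,
# so resetting the index without renaming the values raises ValueError.
try:
    group.reset_index()
    collided = False
except ValueError:
    collided = True

# Renaming the value column avoids the clash.
new_group = group.reset_index(name="count")

print(collided)
print(list(new_group.columns))
```

The result has one row per group and three plain columns, which is the long-form layout that sns.catplot expects for its data argument.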

Plotting

import seaborn as sns
sns.catplot(x='Cell ID', y='count', data=new_group, kind='bar', hue='is_overlap', aspect=15)