Pandas Notes

2019-03-20  生煎小包

Python Data Analysis Notebook

SublimeText File

Data Frame

Iterators

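 # iterating over iterables
 word = 'DA'
 it = iter(word)
 next(it)    # returns one item per call, here 'D'
 print(*it)  # unpacks the remaining items in one go, here A

 # zip() returns an iterator of tuples
 zipped = zip(lst1, lst2)   # lst1, lst2: any two lists
 list(zip(lst1, lst2))      # materialize the pairs as a list
 print(*zipped)             # or unpack the zip object directly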
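
A small self-contained sketch of the two points above: an iterator is used up as it is read, and zip() pairs items lazily. The sample lists `names` and `scores` are made up for illustration.

```python
# An iterator hands out each item once; after that it is exhausted.
word = 'DA'
it = iter(word)
first = next(it)        # the first item
rest = list(it)         # everything that was left
leftover = list(it)     # nothing remains at this point

# zip() pairs items from several iterables and is itself an iterator.
names = ['a', 'b', 'c']  # made-up sample data
scores = [1, 2, 3]       # made-up sample data
pairs = list(zip(names, scores))

# zip(*pairs) undoes the pairing ("unzip").
unzipped_names, unzipped_scores = zip(*pairs)

print(first, rest, leftover)
print(pairs)
print(unzipped_names, unzipped_scores)
```

Because a zip object is an iterator too, it can only be unpacked or looped over once; wrap it in list() if the pairs are needed again.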

Importing Data in python

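 file = open('file.txt', mode='r')  # 'r' opens the file read-only
 # mode='w' opens it for writing
 text = file.read()
 file.close()
 # context manager: the file is closed automatically on leaving the block
 with open('datacamp.csv', 'r') as datacamp:
     print(datacamp.readline())  # read one line at a time
 # IPython/Jupyter only: `! ls` displays the contents of the current directory
 ! ls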
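
The read pattern above can be tried without any existing file. This is a minimal sketch that first writes a throwaway file; the file name `sample.txt` and its contents are assumptions made for the example.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on 'file.txt'.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')  # hypothetical file name
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

# The context manager closes the file when the block ends,
# even if reading raises an error.
with open(path, 'r') as f:
    first_line = f.readline()  # reads up to and including the newline
    remaining = f.read()       # reads whatever is left

print(repr(first_line), repr(remaining), f.closed)
```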

enumerate # returns each item together with its position

 test = [1, 2, 3, 4, 5]
 for i, num in enumerate(test):
     print(i, num)
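
A short sketch of what enumerate yields: the position comes first, then the item, and the optional `start` argument shifts the first position.

```python
test = [1, 2, 3, 4, 5]

# enumerate yields (position, item) tuples, counting from 0 by default.
pairs = list(enumerate(test))

# start=1 makes the counter begin at 1 instead.
pairs_from_one = list(enumerate(test, start=1))

print(pairs)
print(pairs_from_one)
```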

Clean Data

1. Notes collected from Sublime

    # Print the head of df
    print(df.head())
    # Print the tail of df
    print(df.tail())
    # Print the shape of df
    print(df.shape)
    # Print the columns of df
    print(df.columns)
    # Rename the columns
    gapminder_melt.columns = ['country', 'year', 'life_expectancy']
    # Print the info of df: column dtypes, non-null counts and memory usage
    df.info()  # info() prints by itself; wrapping it in print() adds a stray None
    # Print the value_counts for 'State'
    print(df.State.value_counts(dropna=False))
    # Print the value counts for 'Site Fill'
    print(df['Site Fill'].value_counts(dropna=False))
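
To see why `dropna=False` matters, here is a minimal sketch on a made-up DataFrame. The column names follow the snippet above, but the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
df = pd.DataFrame({
    'State': ['NY', 'NY', 'CA', np.nan],
    'Site Fill': ['USE', None, 'USE', 'USE'],
})

print(df.shape)  # (number of rows, number of columns)

# By default value_counts() skips missing values;
# dropna=False adds a separate entry for them.
without_nan = df['State'].value_counts()
with_nan = df['State'].value_counts(dropna=False)

print(without_nan)
print(with_nan)
```

Bracket access (`df['Site Fill']`) is needed whenever a column name contains a space; attribute access (`df.State`) only works for names that are valid Python identifiers.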

2. Functions

df.sample() usage reference

 df.sample(frac=0.5,replace=True,random_state=123) # sample a fraction of the rows, with or without replacement, with a fixed random seed
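
A minimal sketch of how these three arguments behave, using a made-up ten-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'x': range(10)})  # made-up data

# frac=0.5 draws half of the rows; replace=True allows the same row twice.
half = df.sample(frac=0.5, replace=True, random_state=123)

# The same random_state reproduces exactly the same draw.
again = df.sample(frac=0.5, replace=True, random_state=123)

# Without replacement (the default) every sampled row is distinct.
distinct = df.sample(n=3, random_state=0)

print(half)
print(distinct)
```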

df.isnull()

 df.isnull().values.any() # .values returns an array, .any() returns True or False
 df.isnull().sum() # number of nulls in each column
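
The two checks can be sketched on a made-up DataFrame with one missing value per column:

```python
import numpy as np
import pandas as pd

# Made-up data: NaN in a float column, None in an object column.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', None]})

has_null = df.isnull().values.any()  # is there any missing value at all?
per_column = df.isnull().sum()       # missing values counted per column
total = df.isnull().sum().sum()      # missing values in the whole frame

print(has_null)
print(per_column)
print(total)
```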

df.groupby()

    # groupby is followed by an aggregation function (my guess: it only applies to numeric columns)
    df.groupby('col_name').sum()
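
A minimal groupby sketch on made-up data. Selecting the numeric column explicitly before aggregating sidesteps the question of what happens to non-numeric columns; the column names `team`, `points` and `note` are assumptions for the example.

```python
import pandas as pd

# Made-up data with one numeric and one text column.
df = pd.DataFrame({
    'team': ['a', 'b', 'a', 'b'],
    'points': [1, 2, 3, 4],
    'note': ['x', 'y', 'z', 'w'],
})

# Group by 'team', pick the numeric column, then aggregate.
totals = df.groupby('team')['points'].sum()
means = df.groupby('team')['points'].mean()

print(totals)
print(means)
```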

Pandas Foundations

1. Inspecting Data

  import numpy as np  # needed for np.nan below
  import pandas as pd
  type(df)
  type(df.columns)
  type(df.index)
  df.describe()
  df.shape
  df.columns
  df.index
  df.iloc[:5,:]
  df.corr() # numeric columns only
  # broadcasting: assigning a scalar value to a column slice broadcasts it to every selected row
  df.iloc[::3,-1]=np.nan
  #Series
  low=df['Low']
  type(low)
  low.head()
  lows=low.values
  type(lows)
  #View the first few and last few rows of a DataFrame
  df.head()
  df.tail()
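
The broadcasting and Series steps above can be sketched on a small made-up DataFrame. The column names `High` and `Low` echo the snippet, while the values are invented.

```python
import numpy as np
import pandas as pd

# Made-up data with seven rows.
df = pd.DataFrame({
    'High': [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
    'Low': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# Every third row of the last column gets NaN: the scalar is broadcast.
df.iloc[::3, -1] = np.nan

low = df['Low']    # a single column is a Series
lows = low.values  # its underlying data is a NumPy array

print(df)
print(type(low), type(lows))
```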

count values

 df['col_name'].value_counts()
 df.col_name.value_counts()

show unique values

 df['col_name'].unique()
 df.col_name.unique()
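
A short sketch of both calls on made-up data, also showing that bracket access and attribute access return the same column:

```python
import pandas as pd

df = pd.DataFrame({'col_name': ['a', 'b', 'a', 'c', 'a']})  # made-up data

counts = df['col_name'].value_counts()  # most frequent value first
uniques = df['col_name'].unique()       # values in order of first appearance

# Both access styles give the same Series.
same = df.col_name.equals(df['col_name'])

print(counts)
print(uniques)
print(same)
```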

2. Numpy and Pandas Together

  import numpy as np
  #.values to represent a DataFrame df as a NumPy array.
  np_vals = df.values
  # np.log10() computes the base-10 logarithm; it accepts both arrays and DataFrames
  np_vals_log10 = np.log10(np_vals)
  df_log10 = np.log10(df)
  for name, obj in [('np_vals', np_vals), ('np_vals_log10', np_vals_log10), ('df', df), ('df_log10', df_log10)]:
      print(name, 'has type', type(obj))
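
A minimal runnable version of the comparison above, assuming a made-up single-column DataFrame. It shows that a NumPy function applied to an array gives an array, while the same function applied to a DataFrame gives a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'population': [10.0, 100.0, 1000.0]})  # made-up data

np_vals = df.values                # DataFrame as a NumPy array
np_vals_log10 = np.log10(np_vals)  # array in, array out
df_log10 = np.log10(df)            # DataFrame in, DataFrame out

for name, obj in [('np_vals_log10', np_vals_log10), ('df_log10', df_log10)]:
    print(name, 'has type', type(obj))
```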

3. Zip list to build a df

  #Zip the 2 lists together into one list of (key,value) tuples: zipped
  zipped = list(zip(list_keys,list_values))
  print(zipped)
  data = dict(zipped)
  df = pd.DataFrame(data)
  print(df)
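
The snippet above leaves `list_keys` and `list_values` undefined. A minimal sketch with made-up values: the keys become column names and each inner list becomes a column.

```python
import pandas as pd

# Made-up inputs: column names and one list of values per column.
list_keys = ['country', 'total']
list_values = [['A', 'B', 'C'], [10, 20, 30]]

zipped = list(zip(list_keys, list_values))  # [(key, values), ...]
data = dict(zipped)                         # {key: values, ...}
df = pd.DataFrame(data)

print(df)
```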

4. Reading & Saving

5. Plot

6. Statistical exploratory data analysis

7. Time Series

Input        Description
'min', 'T'   minute
'H'          hour
'D'          day
'B'          business day
'W'          week
'M'          month
'Q'          quarter
'A'          year
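
A minimal sketch of how these frequency strings are used, on a made-up daily series. It sticks to 'D', 'B' and 'W' because, as far as I know, recent pandas releases (2.2 and later) deprecate some of the other aliases listed here in favour of new spellings such as 'ME', 'QE', 'YE', 'h' and 'min'.

```python
import pandas as pd

# Made-up daily series: 14 consecutive days starting on a Friday.
idx = pd.date_range('2019-03-01', periods=14, freq='D')
s = pd.Series(range(14), index=idx)

# Downsample to weekly sums; 'W' bins end on Sunday by default.
weekly = s.resample('W').sum()

# 'B' skips weekends when generating dates.
bdays = pd.date_range('2019-03-01', periods=3, freq='B')

print(weekly)
print(bdays)
```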

8. Anti-copying notice

Seaborn reference

Practices
