Course 1: Introduction to Data Science

2018-01-03, by 英天

Week 1: Python Fundamentals

Extract "Christopher" from the string

x = 'Dr. Christopher Brooks'
print(x[4:15])
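
The slice works because "Christopher" occupies positions 4 through 14 and the end index 15 is excluded. As a small sketch, assuming the string always has the form "title, first name, last name", the same result can be had without counting characters by hand:

```python
x = 'Dr. Christopher Brooks'

# Slice by position: start at index 4, stop before index 15.
by_slice = x[4:15]

# Split on whitespace and take the second token instead of counting indices.
# Assumption: the first name is always the second whitespace-separated token.
by_split = x.split()[1]

print(by_slice, by_split)
```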

Keep only the title (Dr.) and the last name, using a function and map

people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    title = person.split()[0]
    lastname = person.split()[-1]
    return '{} {}'.format(title, lastname)

list(map(split_title_and_name, people))
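
For comparison, here is a sketch of the same transformation written with a list comprehension in place of map. It splits each string once and reuses the parts; that is a small restructuring for illustration, not the course's reference solution.

```python
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson',
          'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    # Split once, then keep the first token (title) and the last token (last name).
    parts = person.split()
    return '{} {}'.format(parts[0], parts[-1])

via_map = list(map(split_title_and_name, people))
via_comprehension = [split_title_and_name(p) for p in people]

print(via_map)
# ['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']
```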

List comprehension

def times_tables():
    lst = []
    for i in range(10):
        for j in range(10):
            lst.append(i*j)
    return lst

times_tables() == [j*i for i in range(10) for j in range(10)]
# The list comprehension builds the same list as times_tables(), so the comparison evaluates to True

Week 2: Basic Data Processing with Pandas

The DataFrame Data Structure

Build a table (DataFrame) from several Series

import pandas as pd
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})
df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])
df.head()
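
Because the index label 'Store 1' is repeated, label-based lookups return differently shaped results. The sketch below rebuilds the same table from plain dicts (a shortcut for brevity, not the form used in the notes) and shows what .loc returns for a repeated and a unique label.

```python
import pandas as pd

# Same purchases as above, built from dicts rather than Series.
df = pd.DataFrame(
    [
        {'Name': 'Chris', 'Item Purchased': 'Dog Food', 'Cost': 22.50},
        {'Name': 'Kevyn', 'Item Purchased': 'Kitty Litter', 'Cost': 2.50},
        {'Name': 'Vinod', 'Item Purchased': 'Bird Seed', 'Cost': 5.00},
    ],
    index=['Store 1', 'Store 1', 'Store 2'],
)

store_1 = df.loc['Store 1']   # repeated label: a DataFrame with two rows
store_2 = df.loc['Store 2']   # unique label: a single row returned as a Series
costs = df['Cost']            # one column returned as a Series

print(store_1)
print(store_2)
```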

Modify the values in one column of the table

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])


df['Cost'] *= 0.8
print(df)
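
The in-place `*=` overwrites the original prices. If both values are needed, one option is to write the discount to a new column. This is a minimal sketch; the column name 'Discounted Cost' is hypothetical and does not appear in the notes.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Kevyn', 'Vinod'],
     'Cost': [22.50, 2.50, 5.00]},
    index=['Store 1', 'Store 1', 'Store 2'],
)

# Multiplying a Series by a scalar applies to every row (vectorized),
# and assigning to a new column name leaves 'Cost' unchanged.
df['Discounted Cost'] = df['Cost'] * 0.8

print(df)
```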

Read a CSV file

import pandas as pd
df = pd.read_csv('olympics.csv')
df.head()
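
olympics.csv is not reproduced in these notes, so the sketch below reads a small made-up CSV from memory to show how read_csv behaves. The column names and values are invented for illustration; `index_col=0` uses the first column as the row index.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk.
csv_text = """country,gold,silver
A,3,1
B,0,2
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.head())
```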

Select the names of purchases whose cost is greater than 3

purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])


df['Name'][df['Cost']>3]
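
The expression above first builds a boolean mask and then indexes the Name column with it. A sketch of the same selection written with .loc, which picks rows and column in one step and is generally the safer form when the result will later be assigned to:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Chris', 'Kevyn', 'Vinod'],
     'Cost': [22.50, 2.50, 5.00]},
    index=['Store 1', 'Store 1', 'Store 2'],
)

mask = df['Cost'] > 3              # boolean Series: True, False, True

names_chained = df['Name'][mask]   # the form used in the notes
names_loc = df.loc[mask, 'Name']   # equivalent selection in a single .loc call

print(names_loc)
```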

Missing values

Read from CSV

import pandas as pd
df = pd.read_csv('log.csv')
df

Set the time column as the index and sort by it

df = df.set_index('time')
df = df.sort_index()
df

Set a two-level index: time and user

df = df.reset_index()
df = df.set_index(['time', 'user'])
df

Fill missing values

df = df.ffill()  # forward fill; fillna(method='ffill') is deprecated in recent pandas
df.head()
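
log.csv is not included here, so the following sketch replays the steps above on a tiny invented log. The column names time and user come from the notes; the volume column and all values are assumptions. Sorting before a forward fill matters, because each gap is filled with the last value seen in row order.

```python
import numpy as np
import pandas as pd

# Hypothetical log data; the 'volume' column and all values are made up.
df = pd.DataFrame({
    'time': [3, 1, 2, 1],
    'user': ['bob', 'cheryl', 'cheryl', 'bob'],
    'volume': [np.nan, 10.0, np.nan, 5.0],
})

# Two-level index, sorted by time and then by user.
df = df.set_index(['time', 'user']).sort_index()

# Forward fill: each NaN takes the most recent non-missing value above it.
filled = df.ffill()

print(filled)
```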

