Udacity Data Analysis: Analyzing One-Dimensional Arrays with NumPy and Pandas

2017-09-11 平平又无奇

Finding the position of the maximum value in an array

Import NumPy and create the countries and employment arrays:

import numpy as np
countries = np.array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
    'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',
    'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
    'Belize', 'Benin', 'Bhutan', 'Bolivia',
    'Bosnia and Herzegovina'])
employment = np.array([
    55.70000076,  51.40000153,  50.5       ,  75.69999695,
    58.40000153,  40.09999847,  61.5       ,  57.09999847,
    60.90000153,  66.59999847,  60.40000153,  68.09999847,
    66.90000153,  53.40000153,  48.59999847,  56.79999924,
    71.59999847,  58.40000153,  70.40000153,  41.20000076
])

Define the functions:

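# Function 1: loop over both arrays, keeping track of the largest value seen so far
def max_employment(countries, employment):
    max_country = None
    max_value = 0  # employment rates are non-negative, so 0 is a safe starting value
    for i in range(len(countries)):
        country = countries[i]
        country_employment = employment[i]
        if country_employment > max_value:
            max_value = country_employment
            max_country = country
    return (max_country, max_value)

# Function 2: let NumPy find the index of the maximum with argmax()
def max_employment2(countries, employment):
    i = employment.argmax()  # index of the first occurrence of the maximum
    return (countries[i], employment[i])

print(max_employment(countries, employment))
print(max_employment2(countries, employment))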

Both max_employment and max_employment2 return:

('Angola', 75.699996949999999)
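argmax() returns only the index of the first occurrence of the maximum, and the loop in Function 1 also keeps the first one because it compares with a strict >. If several countries could tie for the top value, a boolean mask returns all of them. A minimal sketch; the helper name all_max_countries and the tie data are hypothetical, made up for illustration:

```python
import numpy as np

def all_max_countries(countries, values):
    """Hypothetical helper: return every country whose value equals the maximum."""
    mask = values == values.max()  # boolean array, True wherever the maximum occurs
    return countries[mask]         # boolean indexing keeps only the matching entries

# Made-up data containing a tie, for illustration only
demo_countries = np.array(['A', 'B', 'C'])
demo_values = np.array([70.0, 75.0, 75.0])

print(demo_values.argmax())                            # 1: only the first maximum
print(all_max_countries(demo_countries, demo_values))  # ['B' 'C']
```

Comparing against values.max() with == is safe here because the maximum is taken from the same array, so there is no floating-point rounding between the two sides.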

Splitting first and last names and reversing their order

import pandas as pd

# Change False to True to see what the following block of code does

# Example pandas apply() usage (although this could have been done
# without apply() using vectorized operations)
if False:
    s = pd.Series([1, 2, 3, 4, 5])
    def add_one(x):
        return x + 1
    print(s.apply(add_one))

names = pd.Series([
    'Andre Agassi',
    'Barry Bonds',
    'Christopher Columbus',
    'Daniel Defoe',
    'Emilio Estevez',
    'Fred Flintstone',
    'Greta Garbo',
    'Humbert Humbert',
    'Ivan Ilych',
    'James Joyce',
    'Keira Knightley',
    'Lois Lane',
    'Mike Myers',
    'Nick Nolte',
    'Ozzy Osbourne',
    'Pablo Picasso',
    'Quirinus Quirrell',
    'Rachael Ray',
    'Susan Sarandon',
    'Tina Turner',
    'Ugueth Urbina',
    'Vince Vaughn',
    'Woodrow Wilson',
    'Yoji Yamada',
    'Zinedine Zidane'
])
def reverse_name(name):
    split_name = name.split(" ")  # split the string at the space
    first_name = split_name[0]
    last_name = split_name[1]
    return last_name + ", " + first_name  # "Lastname, Firstname", as the docstring below asks

def reverse_names(names):
    '''
    Fill in this function to return a new series where each name
    in the input series has been transformed from the format
    "Firstname Lastname" to "Lastname, Firstname".
    
    Try to use the Pandas apply() function rather than a loop.
    '''
    return names.apply(reverse_name)

print(reverse_names(names))

Output:

0 Agassi, Andre
1 Bonds, Barry
2 Columbus, Christopher
3 Defoe, Daniel
4 Estevez, Emilio
5 Flintstone, Fred
6 Garbo, Greta
7 Humbert, Humbert
8 Ilych, Ivan
9 Joyce, James
10 Knightley, Keira
11 Lane, Lois
12 Myers, Mike
13 Nolte, Nick
14 Osbourne, Ozzy
15 Picasso, Pablo
16 Quirrell, Quirinus
17 Ray, Rachael
18 Sarandon, Susan
19 Turner, Tina
20 Urbina, Ugueth
21 Vaughn, Vince
22 Wilson, Woodrow
23 Yamada, Yoji
24 Zidane, Zinedine
dtype: object
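As the comment in the apply() example points out, many element-wise jobs can be done without apply(). A minimal sketch of a vectorized alternative using the pandas .str accessor, assuming every name has the form "Firstname Lastname"; reverse_names_vectorized is a hypothetical name, not part of the exercise:

```python
import pandas as pd

def reverse_names_vectorized(names):
    # Split at the first space only; expand=True returns a DataFrame whose
    # column 0 holds the first names and column 1 the rest of each name
    parts = names.str.split(' ', n=1, expand=True)
    # Element-wise string concatenation of two Series
    return parts[1] + ', ' + parts[0]

names = pd.Series(['Andre Agassi', 'Barry Bonds'])
print(reverse_names_vectorized(names))
```

Splitting with n=1 keeps everything after the first space together, so a multi-word surname is not truncated the way indexing split_name[1] would truncate it.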

Adding two Series with different indexes and getting rid of NaN

import pandas as pd
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['c', 'd', 'e', 'f'])
print(s1 + s2)

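The two Series overlap only at labels c and d. Addition aligns values by index label, so every label that appears in only one Series becomes NaN, and the code above prints: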

a NaN
b NaN
c 13.0
d 24.0
e NaN
f NaN
dtype: float64
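If the goal is literally to drop the NaN entries, keeping only the labels present in both Series, calling dropna() on the sum does that. A minimal sketch using the same two Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['c', 'd', 'e', 'f'])

# Addition aligns on labels; dropna() then discards labels that were not in both Series
result = (s1 + s2).dropna()
print(result)  # keeps only c (13.0) and d (24.0); dtype stays float64
```

This keeps the intersection of the two indexes, whereas fill_value keeps their union and treats a missing side as the fill value.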

Improved code: Series.add() with fill_value=0 treats a label missing from one Series as 0 before adding:

import pandas as pd
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([10, 20, 30, 40], index=['c', 'd', 'e', 'f'])
print(s1.add(s2,fill_value=0))

Output:

a 1.0
b 2.0
c 13.0
d 24.0
e 30.0
f 40.0
dtype: float64
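fill_value only fills a position where exactly one side is missing. If both Series lack a value at a label (for example, absent from one and NaN in the other), the result at that label is still NaN. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s3 = pd.Series([1.0, np.nan], index=['a', 'b'])  # 'b' holds NaN
s4 = pd.Series([10.0], index=['a'])              # 'b' is absent

result = s3.add(s4, fill_value=0)
print(result)  # 'a' -> 11.0; 'b' stays NaN because both sides are missing there
```

The same fill_value parameter is also accepted by Series.sub(), Series.mul() and Series.div().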
