
List Comprehensions

2018-03-15  钊钖

Learn about list comprehensions and the None type while finding common names for U.S. legislators.

1. The Data Set

In the previous mission, we worked with legislators.csv, which contains information on every person who has served in the U.S. Congress. We cleaned up some missing data and added a column for birth year.

We'll continue to work with the same data set in this mission. Here's a preview of it in CSV format:


last_name,first_name,birthday,gender,type,state,party,birth_year
Bassett,Richard,1745-04-02,M,sen,DE,Anti-Administration,1745
Bland,Theodorick,1742-03-21,M,rep,VA,,1742
Burke,Aedanus,1743-06-16,M,rep,SC,,1743
Carroll,Daniel,1730-07-22,M,rep,MD,,1730

In this mission, we'll use the data to find the most common names among U.S. legislators of each gender. Before diving into this, we'll explore some critical concepts, such as enumeration.


2. Enumerate

There are many situations where we'll need to iterate over multiple lists in tandem, such as this one:

animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for animal in animals:
    print("Animal")
    print(animal)
    print("Viciousness")

In the example above, we have two lists. The second list describes the viciousness of the animals in the first list. A Dog has a viciousness level of 1, and a SuperLion has a viciousness level of 10. We want to retrieve the position of the item in animals the loop is currently on, so we can use it to look up the corresponding value in the viciousness list.

Unfortunately, looping through animals alone gives us no way to reach the matching item in the second list. Python has an enumerate() function that can help us with this, though. The enumerate() function allows us to have two variables in the body of a for loop: an index, and the value at that index.


for i, animal in enumerate(animals):
    print('animal index')
    print(i)
    print('animal')
    print(animal)

On every iteration of the loop, i takes the index in animals that corresponds to that iteration, and animal takes the value in animals at index i.
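To make the pairing concrete, here is a minimal sketch that collects what enumerate() produces into a list. The start argument in the second call is part of the built-in function but is not used elsewhere in this mission; it appears here only as an illustration.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# enumerate() yields (index, value) tuples; list() collects them.
pairs = list(enumerate(animals))
print(pairs[:2])

# The optional start argument changes the first index (the default is 0).
numbered = list(enumerate(animals, start=1))
print(numbered[0])
```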

Here's another example of how we can use the enumerate() function to iterate over multiple lists in tandem:

animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for i, animal in enumerate(animals):
    print("Animal")
    print(animal)
    print("Viciousness")
    print(viciousness[i])

In this example, we use the index variable i to index the viciousness list, and print the viciousness value that corresponds to the same index in animals.


ships = ["Andrea Doria", "Titanic", "Lusitania"]
cars = ["Ford Edsel", "Ford Pinto", "Yugo"]

for i, ship in enumerate(ships):
    print(ship)
    print(cars[i])
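
As an aside, Python's built-in zip() function is another way to walk two lists in tandem. This mission uses enumerate(), so the sketch below is only an alternative for comparison, not the approach the exercises expect.

```python
ships = ["Andrea Doria", "Titanic", "Lusitania"]
cars = ["Ford Edsel", "Ford Pinto", "Yugo"]

# zip() pairs up the items that share the same position in each list.
pairs = list(zip(ships, cars))

for ship, car in pairs:
    print(ship)
    print(car)
```

One difference worth knowing: zip() stops at the end of the shorter list, whereas cars[i] raises an IndexError if cars is shorter than ships.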


3. Adding Columns

We can even use the enumerate() function to add columns to lists of lists. For example, here's some starter code:


door_count = [4, 4]
cars = [
        ["black", "honda", "accord"],
        ["red", "toyota", "corolla"]
       ]

We can add a column to cars by appending a value to each inner list:


for i, car in enumerate(cars):
    car.append(door_count[i])

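In the code above, we loop through cars with enumerate(), which gives us the index i and the inner list car on each iteration. We use i to look up the corresponding value in door_count, then append that value to car. When the loop finishes, each inner list in cars has a new column.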

Let's reinforce what we've learned by completing an exercise.



things = [
    ["apple", "monkey"],
    ["orange", "dog"],
    ["banana", "cat"]
]

trees = ["cedar", "maple", "fig"]

for i, thing in enumerate(things):
    thing.append(trees[i])

print(things)
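
The same pattern can be wrapped in a small helper. The function name add_column and its length check are assumptions made for this sketch; they are not part of the mission.

```python
def add_column(rows, values):
    """Append values[i] to rows[i] in place (hypothetical helper)."""
    # Refuse mismatched lengths instead of failing partway through.
    if len(rows) != len(values):
        raise ValueError("rows and values must have the same length")
    for i, row in enumerate(rows):
        row.append(values[i])
    return rows


cars = [
    ["black", "honda", "accord"],
    ["red", "toyota", "corolla"],
]
door_count = [4, 4]

add_column(cars, door_count)
print(cars)
```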

* * *

4. List Comprehensions

We've written many short for loops to manipulate lists. Here's an example:


animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

animal_lengths = []
for animal in animals:
    animal_lengths.append(len(animal))

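We can condense this loop into a single line with a list comprehension:

animal_lengths = [len(animal) for animal in animals]

This comprehension consists of the list operation len(animal), the loop variable animal, and the list that we're iterating over, animals.

Logically, the list comprehension loops through each element of animals, assigns it to animal, computes len(animal), and collects the results in a new list, animal_lengths.

List comprehensions are a much more compact notation, and can save space when you need to write multiple for loops.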
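
As a quick check, the sketch below builds the lengths both ways and confirms that the results match. The variable names lengths_loop and lengths_comp are chosen here for illustration.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# Version with an explicit for loop.
lengths_loop = []
for animal in animals:
    lengths_loop.append(len(animal))

# Version with a list comprehension.
lengths_comp = [len(animal) for animal in animals]

print(lengths_comp)
print(lengths_loop == lengths_comp)
```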


apple_prices = [100, 101, 102, 105]

apple_prices_doubled = [price * 2 for price in apple_prices]
apple_prices_lowered = [price - 100 for price in apple_prices]
print(apple_prices_doubled)
print(apple_prices_lowered)


5. Counting Female Names

Let's count how many times each female first name occurs in legislators. To limit our count to names from the modern era, we'll only look at legislators born after 1940. While names like Theodorick were common prior to 1940, they're rare today.

Here's a preview of what the resulting dictionary, name_counts, will look like:


{
    'Nancy': 1, 
    'Sandy': 1, 
    'Carolyn': 1, 
    'Melissa': 2, 
    'Jo Ann': 2,
    ...
}

Now, let's work on creating it!




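import csv

# Load every row of the CSV file into a list of lists.
with open('legislators_add_year.csv') as f:
    legislators = list(csv.reader(f))

name_counts = {}

for row in legislators:
    # row[1] is the first name, row[3] the gender, row[7] the birth year.
    if row[3] == 'F' and int(row[7]) > 1940:
        name = row[1]
        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1
print(name_counts)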
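
Because the CSV file may not be at hand, here is a self-contained sketch of the same counting logic run on a few made-up rows (the sample rows are invented for illustration and are not taken from the data set). It also uses dict.get() with a default of 0 as a shorter alternative to the if/else branch.

```python
# Made-up rows in the same column order as the data set:
# last_name, first_name, birthday, gender, type, state, party, birth_year
sample_rows = [
    ["Doe", "Melissa", "1950-01-01", "F", "rep", "XX", "", "1950"],
    ["Roe", "Melissa", "1962-05-09", "F", "sen", "XX", "", "1962"],
    ["Poe", "Nancy", "1941-03-12", "F", "rep", "XX", "", "1941"],
    ["Moe", "Ann", "1920-07-30", "F", "rep", "XX", "", "1920"],
    ["Low", "John", "1955-11-02", "M", "rep", "XX", "", "1955"],
]

sample_counts = {}
for row in sample_rows:
    if row[3] == "F" and int(row[7]) > 1940:
        name = row[1]
        # get() returns 0 when the name has not been seen yet.
        sample_counts[name] = sample_counts.get(name, 0) + 1

print(sample_counts)
```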

6. None

Let's say we're trying to find the maximum value in a list. We might write some code that looks like this:

values = [50, 60, 70]
max_value = 0
for i in values:
    if i > max_value:
        max_value = i

We set max_value to a low value so that every item in the list is greater than it. But what if we changed the values list slightly?


values = [-50, -80, -100]
max_value = 0
for i in values:
    if i > max_value:
        max_value = i

In the above scenario, max_value is 0 when the loop finishes. This is wrong, because 0 isn't in values; it's just a placeholder we used to initialize max_value.

We can resolve this kind of issue using the None object, which has a special data type called NoneType.

The None object indicates that the variable has no value. Rather than using the normal double equals sign (==) to check whether a value equals None, we use the variable is None syntax.

The is comparison operator checks for object identity (whether two names refer to the very same object) rather than equality. Using is instead of == prevents custom classes that override equality from resolving to True when compared with None. We'll explore how to use operators with the None object in greater depth during a later mission. For now, let's see what the variable is None syntax looks like:

values = [-50, -80, -100]
max_value = None
for i in values:
    if max_value is None or i > max_value:
        max_value = i

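In the example above, we initialize max_value to None, loop through each item in values, and assign the item to max_value whenever max_value is None or the item is greater than max_value. When the loop finishes, max_value is -50, the largest value in the list.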
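
To see why is is the safer check for None, consider a class that overrides equality. The class AlwaysEqual below is a contrived example invented for this sketch; its instances claim to be equal to everything, including None.

```python
class AlwaysEqual:
    """Contrived class whose instances compare equal to anything."""

    def __eq__(self, other):
        return True


thing = AlwaysEqual()

# == calls thing.__eq__(None), which returns True here.
equals_none = (thing == None)

# is compares identity, so a custom __eq__ cannot affect it.
is_none = (thing is None)

print(equals_none)
print(is_none)
```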


7. Comparing with None

Comparing a value to None with an ordering operator such as > or < will generate an error (a TypeError in Python 3). This is actually helpful when we're writing code, because an unexpected None surfaces immediately instead of silently producing a wrong result. For example, this code will cause an error:


a = None
a > 10

Therefore, when a value could potentially be None, and we want to compare it to another value, we should always include code that checks whether it actually is None first.

We can use two Boolean statements joined by or to do this. Here's an example:

max_value is None or i > max_value

The Python interpreter will evaluate the two statements in order. If the first statement is True, it won't evaluate the second one. This saves time, since when one statement is True, the whole or conditional is True.

The following code will assign True to b if a is None, or if a is greater than 10:

a = None
b = a is None or a > 10

The same logic applies to an and statement. Because both conditions have to be True, if the first one is False, the Python interpreter won't evaluate the second one. The example below shows how to write an and statement involving None that won't return an error. It will assign True to b if a is not None and a is greater than 10:


a = None
b = a is not None and a > 10
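
The short-circuit behavior can be observed directly. In the sketch below, the bare comparison raises a TypeError (the Python 3 behavior), while the guarded versions never reach the comparison when a is None. The names raised, b_or, and b_and are chosen here for illustration.

```python
a = None

# Unguarded: Python 3 refuses to order None against an int.
try:
    a > 10
    raised = False
except TypeError:
    raised = True

# Guarded: the right-hand side is skipped once the left side decides the result.
b_or = a is None or a > 10
b_and = a is not None and a > 10

print(raised, b_or, b_and)
```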

Let's give this a try in our next exercise!


values = [None, 10, 20, 30, None, 50]

checks = [a is not None and a > 30 for a in values]
print(checks)


8. Highest Female Name Count

name_counts is a dictionary where the keys are female first names from legislators, and the values are the number of times each name occurs among legislators born after 1940.

In order to extract the most common names from this dictionary, we need to determine the highest totals in name_counts. Once we know the totals, we can find the keys for them.

We can iterate through all of the keys in a dictionary like this:

fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}

for fruit in fruits:
    rating = fruits[fruit]

In the loop above, we iterate through each key in fruits. We can access the corresponding value using fruits[fruit].

Let's identify the highest totals in the next exercise.


max_value = None
for name in name_counts:
    count = name_counts[name]
    if max_value is None or count > max_value:
        max_value = count

print(max_value)
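
For comparison, the built-in max() function applied to a dictionary's values gives the same answer as the loop. This is an alternative to the mission's approach, shown here on the fruits dictionary so that the sketch runs on its own.

```python
fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}

# Loop version, using None as the starting value.
max_value = None
for fruit in fruits:
    rating = fruits[fruit]
    if max_value is None or rating > max_value:
        max_value = rating

# Built-in version: values() exposes the dictionary's values.
builtin_max = max(fruits.values())

print(max_value, builtin_max)
```

The two differ on an empty dictionary: max() raises a ValueError, whereas the loop simply leaves max_value as None.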



9. The Items Method

The code we used on the previous screen to access the keys and values in a dictionary was slightly awkward. We can simplify this process with the items() method, which allows us to iterate through keys and values at the same time.


fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}

for fruit, rating in fruits.items():
    print(rating)

The items() method makes our code clearer and more compact.
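
Here is a minimal sketch of what items() yields: each element is a (key, value) tuple, which the for loop unpacks into two variables.

```python
fruits = {
    "apple": 2,
    "orange": 5,
    "melon": 10
}

# Each element yielded by items() is a (key, value) tuple.
pairs = list(fruits.items())
print(pairs)

# Unpacking one tuple by hand gives the same two variables the loop uses.
fruit, rating = pairs[0]
print(fruit, rating)
```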



plant_types = {"orchid": "flower", "cedar": "tree", "maple": "tree"}

for plant, types in plant_types.items():
    print(types)
    print(plant)
    print(plant + '!')

10. Finding the Most Common Female Names

As we learned on a previous screen, the most common female names occur two times in name_counts. Therefore, we want to extract any keys in name_counts that have the value 2.


# Method 1: a for loop
top_female_names = []

for name, count in name_counts.items():
    if count == 2:
        top_female_names.append(name)
print(top_female_names)

# Method 2: a list comprehension with a condition
top_female_names_1 = [
    name
    for name, count in name_counts.items()
    if count == 2
]

print(top_female_names_1)

11. Finding the Most Common Male Names

Now that we know how to find the most common female names, we can repeat the same process for male names.


male_name_counts = {}
top_male_names = []

# Count first names of male legislators born after 1940.
for row in legislators:
    if row[3] == 'M' and int(row[7]) > 1940:
        name = row[1]
        if name in male_name_counts:
            male_name_counts[name] += 1
        else:
            male_name_counts[name] = 1

# Find the highest count.
highest_male_count = None
for name, count in male_name_counts.items():
    if highest_male_count is None or count > highest_male_count:
        highest_male_count = count

# Collect every name that reaches the highest count.
for name, count in male_name_counts.items():
    if count == highest_male_count:
        top_male_names.append(name)

print(male_name_counts)
print(top_male_names)
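
Since the female and male versions differ only in the gender code, the whole process could be folded into one function. The sketch below is one possible refactoring, not part of the mission; the function name top_names, its parameters, and the sample rows are all invented for illustration.

```python
def top_names(rows, gender, after_year):
    """Return (names, count) for the most common first names.

    Hypothetical helper: rows use the mission's column order, with the
    first name at index 1, gender at index 3, and birth year at index 7.
    """
    counts = {}
    for row in rows:
        if row[3] == gender and int(row[7]) > after_year:
            counts[row[1]] = counts.get(row[1], 0) + 1

    # None marks "no count seen yet", as in the None section above.
    highest = None
    for count in counts.values():
        if highest is None or count > highest:
            highest = count

    names = [name for name, count in counts.items() if count == highest]
    return names, highest


# Made-up rows, not taken from the data set.
sample_rows = [
    ["A", "John", "1950-01-01", "M", "rep", "XX", "", "1950"],
    ["B", "John", "1961-02-02", "M", "sen", "XX", "", "1961"],
    ["C", "David", "1945-03-03", "M", "rep", "XX", "", "1945"],
    ["D", "Mary", "1970-04-04", "F", "rep", "XX", "", "1970"],
]

print(top_names(sample_rows, "M", 1940))
print(top_names(sample_rows, "F", 1940))
```

When no row matches, the function returns an empty list and None, so callers can check the count with is None before comparing it.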
