Lambda functions are often a convenient way to write throw-away functions on the fly. If you need to write a more complicated function, you may still need the more formal def statement, but lambda functions provide a quick and concise way to write simple ones.
You will be able to:
- Describe the purpose of lambda functions, when they should be employed, and their limitations
- Create lambda functions to use as arguments of other functions
- Use the .map() or .apply() method to apply a function to a pandas Series or DataFrame
Let's say you want to count the number of words in each Yelp review.
import pandas as pd
df = pd.read_csv('Yelp_Reviews.csv', index_col=0)
df.head(2)
| | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pomGBqfbxcqPv14c3XH-ZQ | 0 | 2012-11-13 | 0 | dDl8zu1vWPdKGihJrwQbpw | 5 | I love this place! My fiance And I go here atl... | 0 | msQe1u7Z_XuqjGoqhB0J5g |
| 2 | jtQARsP6P-LbkyjbO1qNGg | 1 | 2014-10-23 | 1 | LZp4UX5zK3e-c5ZGSeo3kA | 1 | Terrible. Dry corn bread. Rib tips were all fa... | 3 | msQe1u7Z_XuqjGoqhB0J5g |
df['text'].map(lambda x: len(x.split())).head()
1 58
2 30
4 30
5 82
10 32
Name: text, dtype: int64
As with function parameters in general, or the loop variable in a for loop, the name you give the variable after the lambda keyword does not matter:
df['text'].map(lambda review_text: len(review_text.split())).head()
1 58
2 30
4 30
5 82
10 32
Name: text, dtype: int64
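To make the equivalence concrete, here is a minimal sketch that compares a lambda with a named function written with def. It uses plain Python lists and the built-in map() rather than the Yelp DataFrame; the sample reviews and the name count_words are made up for illustration.

```python
# Hypothetical sample reviews (not taken from the Yelp data set)
reviews = ['I love this place!', 'Terrible. Dry corn bread.']


def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())


# The same logic three ways; the lambda parameter name is arbitrary
with_def = list(map(count_words, reviews))
with_lambda_x = list(map(lambda x: len(x.split()), reviews))
with_lambda_review = list(map(lambda review_text: len(review_text.split()), reviews))

print(with_def)                                         # [4, 4]
print(with_def == with_lambda_x == with_lambda_review)  # True
```

All three versions return the same counts, so the choice between def and lambda here is mostly about readability and whether the function will be reused.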
Lambda functions can also contain conditional expressions, which can be combined with other constructs such as a list comprehension:
df['text'].map(lambda x: 'Good' if any([word in x.lower() for word in ['awesome', 'love', 'good', 'great']]) else 'Bad').head()
1 Good
2 Bad
4 Good
5 Bad
10 Bad
Name: text, dtype: object
The above is terribly poor style and in no way represents PEP 8 or Pythonic style. (For example, PEP 8 limits lines of code to 79 characters; the previous line was 127 characters.) That said, it is an interesting demonstration of chaining a conditional expression, the any() function, and a list comprehension all inside a lambda function!
Shew!
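For comparison, the same logic could be written as a named function, which keeps every line short. This is only a sketch: the names POSITIVE_WORDS and label_review are hypothetical, and the sample strings are made up rather than drawn from the data set.

```python
# Keywords copied from the one-liner above
POSITIVE_WORDS = ['awesome', 'love', 'good', 'great']


def label_review(text):
    """Return 'Good' if any positive keyword appears in text, else 'Bad'."""
    lowered = text.lower()
    # Substring match, exactly like the lambda: 'goodbye' would also count
    if any(word in lowered for word in POSITIVE_WORDS):
        return 'Good'
    return 'Bad'


print(label_review('I LOVE this place!'))         # Good
print(label_review('Terrible. Dry corn bread.'))  # Bad
```

A function like this can be passed to .map() in place of the lambda, for example df['text'].map(label_review).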
Perhaps we want to naively select the year from the date string rather than convert it to a datetime object.
df.date.map(lambda x: x[:4]).head()
1 2012
2 2014
4 2014
5 2011
10 2016
Name: date, dtype: object
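The trade-off between slicing and parsing can be sketched with the standard library alone. The example below assumes the dates are strings in YYYY-MM-DD format, as in the table above, and uses a plain list instead of the DataFrame column.

```python
from datetime import datetime

# Example date strings in the same YYYY-MM-DD format as the date column
dates = ['2012-11-13', '2014-10-23']

# Naive approach: slice the first four characters (each result is a string)
years_sliced = list(map(lambda x: x[:4], dates))

# Parsing approach: build a datetime first (each result is an integer)
years_parsed = list(map(lambda x: datetime.strptime(x, '%Y-%m-%d').year, dates))

print(years_sliced)  # ['2012', '2014']
print(years_parsed)  # [2012, 2014]
```

Slicing is quick but silently returns garbage if a date is malformed, whereas strptime raises a ValueError for a string that does not match the format.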
Lambda functions are also useful as the key argument of sorted().
# Without a key
names = ['Miriam Marks','Sidney Baird','Elaine Barrera','Eddie Reeves','Marley Beard',
'Jaiden Liu','Bethany Martin','Stephen Rios','Audrey Mayer','Kameron Davidson',
'Carter Wong','Teagan Bennett']
sorted(names)
['Audrey Mayer',
'Bethany Martin',
'Carter Wong',
'Eddie Reeves',
'Elaine Barrera',
'Jaiden Liu',
'Kameron Davidson',
'Marley Beard',
'Miriam Marks',
'Sidney Baird',
'Stephen Rios',
'Teagan Bennett']
# Sorting by last name
names = ['Miriam Marks','Sidney Baird','Elaine Barrera','Eddie Reeves','Marley Beard',
'Jaiden Liu','Bethany Martin','Stephen Rios','Audrey Mayer','Kameron Davidson',
'Teagan Bennett']
sorted(names, key=lambda x: x.split()[1])
['Sidney Baird',
'Elaine Barrera',
'Marley Beard',
'Teagan Bennett',
'Kameron Davidson',
'Jaiden Liu',
'Miriam Marks',
'Bethany Martin',
'Audrey Mayer',
'Eddie Reeves',
'Stephen Rios']
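A key function may also return a tuple, which is one way to break ties. The sketch below sorts by last name and then by first name; the name 'Adam Baird' is a hypothetical addition so that two people share a last name.

```python
# 'Adam Baird' is a made-up entry added to create a tie on last name
names = ['Miriam Marks', 'Sidney Baird', 'Marley Beard', 'Adam Baird']

# Tuples compare element by element: last name first, then first name
by_last_then_first = sorted(names, key=lambda x: (x.split()[1], x.split()[0]))

print(by_last_then_first)
# ['Adam Baird', 'Sidney Baird', 'Marley Beard', 'Miriam Marks']
```

Note that x.split()[1] assumes every name has exactly a first and a last name; a single-word name would raise an IndexError.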
Above, we've covered a lot of the syntax of lambda functions, but the thought process for writing these complex transformations was not transparent. Let's take a minute to discuss some approaches to tackling these problems.
Before trying to write a function to apply to an entire series, it's typically easier to attempt to solve for an individual case. For example, if we're trying to determine the number of words in a review, we can try and do this for a single review first.
First, choose an example field that you'll be applying the function to.
example = df['text'].iloc[0]
example
'I love this place! My fiance And I go here atleast once a week. The portions are huge! Food is amazing. I love their carne asada. They have great lunch specials... Leticia is super nice and cares about what you think of her restaurant. You have to try their cheese enchiladas too the sauce is different And amazing!!!'
Then start writing the function for that example. For example, if we need to count the number of words, it's natural to first divide the review into words. A natural way to do this is with the str.split() method.
example.split()
['I',
'love',
'this',
'place!',
'My',
'fiance',
'And',
'I',
'go',
'here',
'atleast',
'once',
'a',
'week.',
'The',
'portions',
'are',
'huge!',
'Food',
'is',
'amazing.',
'I',
'love',
'their',
'carne',
'asada.',
'They',
'have',
'great',
'lunch',
'specials...',
'Leticia',
'is',
'super',
'nice',
'and',
'cares',
'about',
'what',
'you',
'think',
'of',
'her',
'restaurant.',
'You',
'have',
'to',
'try',
'their',
'cheese',
'enchiladas',
'too',
'the',
'sauce',
'is',
'different',
'And',
'amazing!!!']
Then we just need to count this!
len(example.split())
58
df['text'].map(lambda x: len(x.split())).head()
1 58
2 30
4 30
5 82
10 32
Name: text, dtype: int64
When generalizing from a single case to all cases, it's important to consider exceptions or edge cases. For example, in the above example, you might wonder whether extra spaces or punctuation affect the output.
'this is a     weird test!!!Can we break it??'.split()
['this', 'is', 'a', 'weird', 'test!!!Can', 'we', 'break', 'it??']
As you can see, extra spaces won't break our function, but missing a space after punctuation will. Perhaps this is a rare enough event that we don't worry further, but exceptions are always something to consider when writing functions.
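If missing spaces after punctuation did matter, one possible workaround is to count runs of word characters with a regular expression instead of splitting on whitespace. This is a sketch of an alternative technique, not the approach used in this lesson, and it has its own edge cases.

```python
import re

text = 'this is a     weird test!!!Can we break it??'

# Whitespace splitting treats 'test!!!Can' as one word
split_count = len(text.split())

# r'\w+' matches runs of letters, digits, and underscores
regex_count = len(re.findall(r'\w+', text))

print(split_count)  # 8
print(regex_count)  # 9
```

The regular expression separates 'test' and 'Can', but it would also split a contraction such as "don't" into two tokens, so neither approach is perfect.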
Another common pattern that you may find very useful is the modulus or remainder operator (%), as well as the floor division operator (//). These are both very useful when you want behavior such as 'every fourth element' or 'groups of three consecutive elements'. Let's investigate a couple of examples.
The modulus operator is useful for queries such as 'every other element' or 'every fifth element'.
# Try a single example
3%2
1
2%2
0
# Generalize the pattern: every other
for i in range(10):
    print('i: {}, i%2: {}'.format(i, i%2))
i: 0, i%2: 0
i: 1, i%2: 1
i: 2, i%2: 0
i: 3, i%2: 1
i: 4, i%2: 0
i: 5, i%2: 1
i: 6, i%2: 0
i: 7, i%2: 1
i: 8, i%2: 0
i: 9, i%2: 1
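As a small illustration of how this pattern selects elements, the sketch below picks every other and every third item from a made-up list, and uses a lambda with the built-in filter() to keep the even numbers.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']  # hypothetical sample data

# Keep the items whose index leaves no remainder when divided by 2 (or 3)
every_other = [x for i, x in enumerate(letters) if i % 2 == 0]
every_third = [x for i, x in enumerate(letters) if i % 3 == 0]

# The same test inside a lambda, passed to filter()
evens = list(filter(lambda n: n % 2 == 0, range(10)))

print(every_other)  # ['a', 'c', 'e', 'g']
print(every_third)  # ['a', 'd', 'g']
print(evens)        # [0, 2, 4, 6, 8]
```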
Floor division is useful for creating groups of a set size, for example groups of ten or groups of seven.
# Try a single example
9//3
3
5//3
1
# Generalize the pattern: groups of three
for i in range(10):
    print('i: {}, i//3: {}'.format(i, i//3))
i: 0, i//3: 0
i: 1, i//3: 0
i: 2, i//3: 0
i: 3, i//3: 1
i: 4, i//3: 1
i: 5, i//3: 1
i: 6, i//3: 2
i: 7, i//3: 2
i: 8, i//3: 2
i: 9, i//3: 3
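To show how the floor-division result can act as a group label, here is a minimal sketch that collects a made-up list into groups of three. The names items, group_size, and groups are chosen for this example only.

```python
items = ['a', 'b', 'c', 'd', 'e', 'f', 'g']  # hypothetical sample data
group_size = 3

groups = {}
for i, item in enumerate(items):
    # Indices 0-2 map to group 0, indices 3-5 to group 1, and so on
    groups.setdefault(i // group_size, []).append(item)

print(groups)
# {0: ['a', 'b', 'c'], 1: ['d', 'e', 'f'], 2: ['g']}
```

The last group is simply shorter when the number of items is not a multiple of the group size.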
Combining the two can be very useful, such as when creating subplots! Below we iterate through 12 elements arranging them into 3 rows and 4 columns.
for i in range(12):
    print('i: {}, Row: {} Column: {}'.format(i, i//4, i%4))
i: 0, Row: 0 Column: 0
i: 1, Row: 0 Column: 1
i: 2, Row: 0 Column: 2
i: 3, Row: 0 Column: 3
i: 4, Row: 1 Column: 0
i: 5, Row: 1 Column: 1
i: 6, Row: 1 Column: 2
i: 7, Row: 1 Column: 3
i: 8, Row: 2 Column: 0
i: 9, Row: 2 Column: 1
i: 10, Row: 2 Column: 2
i: 11, Row: 2 Column: 3
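Python's built-in divmod() returns the floor-division quotient and the remainder in a single call, so the row and column can be computed together. A short sketch, with n_cols and positions as illustrative names:

```python
n_cols = 4  # number of columns in the grid

# divmod(i, n_cols) is equivalent to (i // n_cols, i % n_cols)
positions = [divmod(i, n_cols) for i in range(12)]

print(positions[:5])   # [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0)]
print(positions[-1])   # (2, 3)
```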
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(10,10))
x = np.linspace(start=-10, stop=10, num=10*83)
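for i in range(12):
    # Map the flat index i onto the 3x4 grid of subplots
    row = i//4
    col = i%4
    ax = axes[row, col]
    # Plot x raised to the i-th power in its own panel
    ax.scatter(x, x**i)
    ax.set_title('Plot of x^{}'.format(i))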
plt.show()
Lambda functions can be a convenient way to write "throw away" functions that you want to declare inline. In the next lesson we'll give you some practice with creating them!