Pandas Basics (Part I: Indexing, Selecting & Assigning)
8bitzz commented
Indexing, Selecting, Assigning
Choosing between loc and iloc
- iloc selects by integer position and uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So 0:10 will select entries 0,...,9.
- loc, meanwhile, selects by index label and indexes inclusively. So 0:10 will select entries 0,...,10.
- This is particularly confusing when the DataFrame index is a simple numerical list, e.g. 0,...,1000.
- In this case
df.iloc[0:1000]
will return 1000 entries, while
df.loc[0:1000]
will return 1001 of them!
- To get 1000 elements using loc, you will need to go one lower and ask for
df.loc[0:999]
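A minimal sketch of the off-by-one difference, assuming a hypothetical frame whose index is the default RangeIndex 0,...,1000 (the column name value is made up for illustration):

```python
import pandas as pd

# Hypothetical frame with the default index 0, 1, ..., 1000 (1001 rows)
df = pd.DataFrame({"value": range(1001)})

by_position = df.iloc[0:1000]  # positions 0..999, end excluded
by_label = df.loc[0:1000]      # labels 0..1000, end included
by_label_999 = df.loc[0:999]   # labels 0..999, end included

print(len(by_position), len(by_label), len(by_label_999))  # 1000 1001 1000
```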
Manipulating the index
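- The set_index() method replaces the index with an existing column (here title). It returns a new DataFrame rather than changing reviews, so assign the result if you want to keep it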
reviews.set_index("title")
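A short sketch of what set_index() does, using a tiny hypothetical stand-in for reviews (the titles and points are invented). Once title is the index, loc looks rows up by title, and the original frame keeps its numeric index:

```python
import pandas as pd

# Tiny hypothetical stand-in for the reviews DataFrame
reviews = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [87, 95, 90],
})

# set_index returns a new DataFrame; reviews itself is left unchanged
by_title = reviews.set_index("title")

print(by_title.loc["Wine B", "points"])  # 95
print(reviews.index.tolist())            # [0, 1, 2]
```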
Assigning data: a constant value is applied to every row, while an iterable such as a list or range must supply exactly one value per row
reviews['critic'] = 'everyone'
reviews['index_backwards'] = range(len(reviews), 0, -1)
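The two assignments above, run against a hypothetical three-row stand-in for reviews, show the scalar being broadcast and the range filling one value per row:

```python
import pandas as pd

# Hypothetical three-row stand-in for reviews
reviews = pd.DataFrame({"points": [87, 95, 90]})

reviews["critic"] = "everyone"  # scalar: broadcast to every row
reviews["index_backwards"] = range(len(reviews), 0, -1)  # one value per row

print(reviews["critic"].tolist())           # ['everyone', 'everyone', 'everyone']
print(reviews["index_backwards"].tolist())  # [3, 2, 1]
```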
Example
- Select the description column from reviews and assign the result to the variable desc
desc = reviews.description
desc = reviews['description']
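Both forms return the same Series, but attribute access only reaches a column when its name does not clash with a DataFrame attribute. A sketch with a hypothetical frame that has a column named count, which is shadowed by the DataFrame.count() method:

```python
import pandas as pd

# Hypothetical frame; "count" collides with the DataFrame.count() method
reviews = pd.DataFrame({"description": ["Crisp", "Oaky"], "count": [3, 5]})

desc_attr = reviews.description     # attribute access
desc_item = reviews["description"]  # bracket access, always reaches the column
print(desc_attr.equals(desc_item))  # True

# reviews.count is the method, so only brackets reach the column
print(callable(reviews.count))      # True
print(reviews["count"].tolist())    # [3, 5]
```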
- Select the first value from the description column of reviews, assigning it to variable first_description
first_description = reviews.description[0]
first_description = reviews.description.loc[0]
first_description = reviews.description.iloc[0]
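On the default index the three lines above agree, because label 0 is also position 0. A sketch with a hypothetical shuffled integer index shows where loc[0] and iloc[0] part ways:

```python
import pandas as pd

# Hypothetical Series whose integer labels are not in positional order
desc = pd.Series(["first", "second", "third"], index=[2, 0, 1])

print(desc.loc[0])   # second  (the row labelled 0)
print(desc.iloc[0])  # first   (the row at position 0)
```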
- Select the first row of data (the first record) from reviews, assigning it to the variable first_row
first_row = reviews.loc[0]
first_row = reviews.iloc[0]
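Selecting a single row returns a Series whose index is the column names. A sketch with a hypothetical two-row stand-in for reviews:

```python
import pandas as pd

# Hypothetical two-row stand-in for reviews
reviews = pd.DataFrame({"country": ["Italy", "Portugal"], "points": [87, 87]})

first_row = reviews.iloc[0]
print(type(first_row).__name__)  # Series
print(first_row.index.tolist())  # ['country', 'points']
print(first_row["country"])      # Italy
```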
- Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions
first_descriptions = reviews.loc[:9, 'description']
first_descriptions = reviews.description.iloc[:10]
first_descriptions = desc.head(10)
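A quick check that the three approaches return the same ten values, assuming a hypothetical reviews frame with the default index (note the inclusive :9 for loc versus :10 for iloc):

```python
import pandas as pd

# Hypothetical reviews frame with 25 rows and the default index
reviews = pd.DataFrame({"description": [f"review {i}" for i in range(25)]})
desc = reviews["description"]

a = reviews.loc[:9, "description"]    # labels 0..9, end included
b = reviews["description"].iloc[:10]  # positions 0..9, end excluded
c = desc.head(10)                     # first 10 rows

print(a.equals(b) and b.equals(c))    # True
```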
- Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100
indices = [0,1,10,100]
cols = ['country', 'province', 'region_1', 'region_2']
df = reviews.loc[indices, cols]
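Passing lists to loc selects rows and columns at the same time. A sketch with a hypothetical 101-row frame (all values invented) that also carries an extra points column, which the selection leaves out:

```python
import pandas as pd

n = 101  # enough rows for label 100 to exist
# Hypothetical stand-in for reviews; "points" is not in cols
reviews = pd.DataFrame({
    "country": ["Italy"] * n,
    "province": ["Sicily"] * n,
    "region_1": ["Etna"] * n,
    "region_2": [None] * n,
    "points": range(80, 80 + n),
})

indices = [0, 1, 10, 100]
cols = ["country", "province", "region_1", "region_2"]
df = reviews.loc[indices, cols]

print(df.shape)           # (4, 4)
print(df.index.tolist())  # [0, 1, 10, 100]
```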
- Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand
top_oceania_wines = reviews.loc[(reviews.points >= 95) & (reviews.country.isin(['Australia', 'New Zealand']))]
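A sketch of the same boolean filter on a hypothetical four-row frame. Each comparison sits in parentheses because & binds more tightly than >=:

```python
import pandas as pd

# Hypothetical stand-in for reviews
reviews = pd.DataFrame({
    "country": ["Australia", "New Zealand", "France", "Australia"],
    "points": [96, 94, 98, 95],
})

# Both conditions must hold: at least 95 points AND an Oceania country
mask = (reviews.points >= 95) & (reviews.country.isin(["Australia", "New Zealand"]))
top_oceania_wines = reviews.loc[mask]

print(top_oceania_wines.index.tolist())  # [0, 3]
```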