Pandas Basic (Part I: Indexing, Selecting & Assigning)

Question

Opened this issue 4 years ago · 1 comment

iloc follows Python's own slicing convention: the first element of the range is included and the last one is excluded, so 0:10 selects entries 0,...,9.
loc, meanwhile, slices by label and includes both endpoints, so 0:10 selects entries 0,...,10.
This is particularly confusing when the DataFrame index is a simple run of integers, e.g. 0,...,1000.
In that case df.iloc[0:1000] returns 1000 entries, while df.loc[0:1000] returns 1001 of them!
To get 1000 elements using loc, you need to go one lower and ask for df.loc[0:999].
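
A quick way to check the off-by-one behaviour described above. This is only a sketch: the frame `df` and its `value` column are made up for illustration and are not part of the wine reviews data.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..1000 (1001 rows).
df = pd.DataFrame({"value": range(1001)})

by_position = df.iloc[0:1000]     # positions 0..999, end excluded
by_label = df.loc[0:1000]         # labels 0..1000, end included
by_label_trimmed = df.loc[0:999]  # labels 0..999

print(len(by_position), len(by_label), len(by_label_trimmed))  # 1000 1001 1000
```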

The set_index() method can be used to replace the index with any column we see fit. It returns a new DataFrame rather than modifying reviews in place.

reviews.set_index("title")
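
A small sketch of what set_index() does, assuming a toy `reviews` frame (the titles and points here are invented). The call returns a new DataFrame, so the result has to be kept in a variable if you want to use it.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data.
reviews = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C"],
    "points": [87, 92, 95],
})

# set_index returns a new frame; reviews itself keeps its 0, 1, 2 index.
by_title = reviews.set_index("title")

print(reviews.index.tolist())            # [0, 1, 2]
print(by_title.index.tolist())           # ['Wine A', 'Wine B', 'Wine C']
print(by_title.loc["Wine B", "points"])  # 92
```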

reviews['critic'] = 'everyone'
reviews['index_backwards'] = range(len(reviews), 0, -1)
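
To see what the two assignments above produce, here is a minimal sketch on a three-row toy frame (the `points` values are invented). A scalar is broadcast to every row, while an iterable must have exactly one value per row.

```python
import pandas as pd

# Hypothetical three-row stand-in for reviews.
reviews = pd.DataFrame({"points": [87, 92, 95]})

# A constant is repeated for every row.
reviews["critic"] = "everyone"

# An iterable of matching length is assigned row by row: 3, 2, 1.
reviews["index_backwards"] = range(len(reviews), 0, -1)

print(reviews)
```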

Select the description column from reviews and assign the result to the variable desc

desc = reviews.description
desc = reviews['description']

Select the first value from the description column of reviews, assigning it to variable first_description

first_description = reviews.description[0]
first_description = reviews.description.loc[0]
first_description = reviews.description.iloc[0]
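
The three forms above agree only because reviews has the default 0, 1, 2, ... index. A sketch with a made-up Series whose integer labels are out of order shows where they part ways: plain [0] and .loc[0] look up the label 0, while .iloc[0] takes the first position.

```python
import pandas as pd

# Hypothetical Series with integer labels that do not match positions.
description = pd.Series(["first", "second", "third"], index=[10, 20, 0])

by_brackets = description[0]       # label lookup on an integer index
by_label = description.loc[0]      # label 0
by_position = description.iloc[0]  # position 0

print(by_brackets, by_label, by_position)  # third third first
```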

Select the first row of data (the first record) from reviews, assigning it to the variable first_row

first_row = reviews.loc[0]
first_row = reviews.iloc[0]

Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions

first_descriptions = reviews.loc[:9, 'description']
first_descriptions = reviews.description.iloc[:10]
first_descriptions = desc.head(10)
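
A sketch confirming that the three spellings above return the same ten values, assuming a toy `reviews` frame with the default index (the descriptions are placeholders). Note the :9 for loc versus :10 for iloc.

```python
import pandas as pd

# Hypothetical stand-in for reviews: 25 rows with the default index 0..24.
reviews = pd.DataFrame({"description": [f"note {i}" for i in range(25)]})
desc = reviews.description

via_loc = reviews.loc[:9, "description"]  # labels 0..9, end label included
via_iloc = reviews.description.iloc[:10]  # positions 0..9, end excluded
via_head = desc.head(10)                  # first 10 rows by position

print(via_loc.equals(via_iloc) and via_iloc.equals(via_head))  # True
```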

Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100

indices = [0,1,10,100]
cols = ['country', 'province', 'region_1', 'region_2']
df = reviews.loc[indices, cols]
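
The same selection on a self-contained toy frame, as a sketch. The 101 placeholder rows and the extra `points` column are assumptions made so that label 100 exists and so that one column is visibly left out.

```python
import pandas as pd

# Hypothetical stand-in for reviews with index labels 0..100.
n = 101
reviews = pd.DataFrame({
    "country": [f"country_{i}" for i in range(n)],
    "province": [f"province_{i}" for i in range(n)],
    "region_1": [f"region_1_{i}" for i in range(n)],
    "region_2": [f"region_2_{i}" for i in range(n)],
    "points": list(range(n)),
})

indices = [0, 1, 10, 100]
cols = ["country", "province", "region_1", "region_2"]

# loc takes a list of row labels and a list of column labels.
df = reviews.loc[indices, cols]

print(df.shape)  # (4, 4)
```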

Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand

top_oceania_wines = reviews.loc[(reviews.points >= 95) & (reviews.country.isin(['Australia', 'New Zealand']))]
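
The one-liner above can be easier to read when the two conditions are named first. A sketch on five invented rows; the parentheses in the original matter because & binds more tightly than >=.

```python
import pandas as pd

# Hypothetical stand-in for reviews.
reviews = pd.DataFrame({
    "country": ["Australia", "New Zealand", "Italy", "Australia", "France"],
    "points": [96, 95, 97, 90, 95],
})

# Each condition is a boolean Series aligned with the rows of reviews.
high_score = reviews.points >= 95
from_oceania = reviews.country.isin(["Australia", "New Zealand"])

# & combines the masks element-wise; loc keeps rows where both are True.
top_oceania_wines = reviews.loc[high_score & from_oceania]

print(top_oceania_wines)
```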

Answer 1 · 2021-10-21T02:58:02.000Z

iloc follows Python's own slicing convention: the first element of the range is included and the last one is excluded, so 0:10 selects entries 0,...,9.
loc, meanwhile, slices by label and includes both endpoints, so 0:10 selects entries 0,...,10.
This is particularly confusing when the DataFrame index is a simple run of integers, e.g. 0,...,1000.
In that case df.iloc[0:1000] returns 1000 entries, while df.loc[0:1000] returns 1001 of them!
To get 1000 elements using loc, you need to go one lower and ask for df.loc[0:999].

The set_index() method can be used to replace the index with any column we see fit. It returns a new DataFrame rather than modifying reviews in place.

reviews.set_index("title")

reviews['critic'] = 'everyone'
reviews['index_backwards'] = range(len(reviews), 0, -1)

Select the description column from reviews and assign the result to the variable desc

desc = reviews.description
desc = reviews['description']

Select the first value from the description column of reviews, assigning it to variable first_description

first_description = reviews.description[0]
first_description = reviews.description.loc[0]
first_description = reviews.description.iloc[0]

Select the first row of data (the first record) from reviews, assigning it to the variable first_row

first_row = reviews.loc[0]
first_row = reviews.iloc[0]

Select the first 10 values from the description column in reviews, assigning the result to variable first_descriptions

first_descriptions = reviews.loc[:9, 'description']
first_descriptions = reviews.description.iloc[:10]
first_descriptions = desc.head(10)

Create a variable df containing the country, province, region_1, and region_2 columns of the records with the index labels 0, 1, 10, and 100

indices = [0,1,10,100]
cols = ['country', 'province', 'region_1', 'region_2']
df = reviews.loc[indices, cols]

Create a DataFrame top_oceania_wines containing all reviews with at least 95 points (out of 100) for wines from Australia or New Zealand

top_oceania_wines = reviews.loc[(reviews.points >= 95) & (reviews.country.isin(['Australia', 'New Zealand']))]