DataFrame.count: Count number of non-NA/null observations.
DataFrame.max: Maximum of the values in the object.
DataFrame.min: Minimum of the values in the object.
DataFrame.mean: Mean of the values.
DataFrame.std: Standard deviation of the observations.
DataFrame.25: Lower percentile - 75% of the data is above this value.
DataFrame.75: Upper percentile - 25% of the data is above this value.
DataFrame.50: The 50 percentile is the same as the median.
Dataframe Correlation Matrix
Compute pairwise correlation of columns, excluding NA/null values. Generate values between -1 and 1 to represent the negative or positive correlation between two values:
wine_dataset_excel.corr()
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
fixed acidity
1.000000
-0.022697
0.289181
0.089021
0.023086
-0.049396
0.091070
0.265331
-0.425858
-0.017143
-0.120881
-0.113663
volatile acidity
-0.022697
1.000000
-0.149472
0.064286
0.070512
-0.097012
0.089261
0.027114
-0.031915
-0.035728
0.067718
-0.194723
citric acid
0.289181
-0.149472
1.000000
0.094212
0.114364
0.094077
0.121131
0.149503
-0.163748
0.062331
-0.075729
-0.009209
residual sugar
0.089021
0.064286
0.094212
1.000000
0.088685
0.299098
0.401439
0.838966
-0.194133
-0.026664
-0.450631
-0.097577
chlorides
0.023086
0.070512
0.114364
0.088685
1.000000
0.101392
0.198910
0.257211
-0.090439
0.016763
-0.360189
-0.209934
free sulfur dioxide
-0.049396
-0.097012
0.094077
0.299098
0.101392
1.000000
0.615501
0.294210
-0.000618
0.059217
-0.250104
0.008158
total sulfur dioxide
0.091070
0.089261
0.121131
0.401439
0.198910
0.615501
1.000000
0.529881
0.002321
0.134562
-0.448892
-0.174737
density
0.265331
0.027114
0.149503
0.838966
0.257211
0.294210
0.529881
1.000000
-0.093591
0.074493
-0.780138
-0.307123
pH
-0.425858
-0.031915
-0.163748
-0.194133
-0.090439
-0.000618
0.002321
-0.093591
1.000000
0.155951
0.121432
0.099427
sulphates
-0.017143
-0.035728
0.062331
-0.026664
0.016763
0.059217
0.134562
0.074493
0.155951
1.000000
-0.017433
0.053678
alcohol
-0.120881
0.067718
-0.075729
-0.450631
-0.360189
-0.250104
-0.448892
-0.780138
0.121432
-0.017433
1.000000
0.435575
quality
-0.113663
-0.194723
-0.009209
-0.097577
-0.209934
0.008158
-0.174737
-0.307123
0.099427
0.053678
0.435575
1.000000
Dataframes Columns
Value Counts for Categorical Values
Count the number of entries in a categorical column for each categroy:
wine_dataset_excel['quality'].value_counts()
# there are 5 wines with quality 9 but 2198 with quality 6
print(wine_dataset_excel['quality'].unique())
# array([6, 5, 7, 8, 4, 3, 9])# we don't have any wines with quality 1,2 or 10print(wine_dataset_excel['quality'].nunique())
# 7# in total there are 7 classes
Selecting Columns
# Select a column to return a PandaSeriestype(wine_dataset_excel['pH'])
wine_dataset_excel['pH']
0
3.00
1
3.30
2
3.26
3
3.19
4
3.19
...
4893
3.27
4894
3.15
4895
2.99
4896
3.34
4897
3.26
Name: pH, Length: 4898, dtype: float64
# select only wines with a pH of 2.72wine_dataset_excel[wine_dataset_excel['pH'] ==2.72]
# return max value in columnwine_dataset_excel['total sulfur dioxide'].max()
# 440.0
# return max value location in columnmax_value_location=wine_dataset_excel['total sulfur dioxide'].idxmax()
print(max_value_location)
# 4745wine_dataset_excel.iloc[max_value_location]
fixed acidity
6.10000
volatile acidity
0.26000
citric acid
0.25000
residual sugar
2.90000
chlorides
0.04700
free sulfur dioxide
289.00000
total sulfur dioxide
440.00000
density
0.99314
pH
3.44000
sulphates
0.64000
alcohol
10.50000
quality
3.00000
Name: 4745, dtype: float64
# return min value location in columnmin_value_location=wine_dataset_excel['total sulfur dioxide'].idxmin()
print(min_value_location)
# 3710wine_dataset_excel.iloc[min_value_location]
# resetting the index turns index into a regular column againwine_dataset_excel_indexed.reset_index()
Selecting Rows
# select first rows by integer locationwine_dataset_excel_indexed.iloc[0]
fixed acidity
7.000
volatile acidity
0.270
citric acid
0.360
residual sugar
20.700
chlorides
0.045
free sulfur dioxide
45.000
total sulfur dioxide
170.000
density
1.001
pH
3.000
sulphates
0.450
alcohol
8.800
quality
6.000
# select first row by labelled indexwine_dataset_excel_indexed.loc[1]
fixed acidity
7.000
volatile acidity
0.270
citric acid
0.360
residual sugar
20.700
chlorides
0.045
free sulfur dioxide
45.000
total sulfur dioxide
170.000
density
1.001
pH
3.000
sulphates
0.450
alcohol
8.800
quality
6.000
# select multiple rows by ilocwine_dataset_excel_indexed.iloc[:3]
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
1
7.0
0.27
0.36
20.7
0.045
45.0
170.0
1.0010
3.00
0.45
8.8
6
2
6.3
0.30
0.34
1.6
0.049
14.0
132.0
0.9940
3.30
0.49
9.5
6
3
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
# select multiple rows by locwine_dataset_excel_indexed.loc[[111,222,333]]
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
111
6.5
0.170
0.54
8.5
0.082
64.0
163.0
0.9959
2.89
0.39
8.8
6
222
7.2
0.685
0.21
9.5
0.070
33.0
172.0
0.9971
3.00
0.55
9.1
6
333
6.3
0.230
0.30
1.8
0.033
16.0
91.0
0.9906
3.28
0.40
11.8
6
Removing Rows
wine_dataset_excel_indexed.drop(1, axis=0)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
2
6.3
0.30
0.34
1.6
0.049
14.0
132.0
0.9940
3.30
0.49
9.5
6
3
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
...
Inserting Rows
# copy a rowrow_copy=wine_dataset_excel_indexed.iloc[0]
wine_dataset_excel_indexed.append(row_copy)
# FutureWarning: The frame.append method is deprecated and will be removed# from pandas in a future version. Use pandas.concat instead.
# test for duplicated rowswine_dataset_excel.duplicated()
The 4th row contains a duplicated entry:
0
False
1
False
2
False
3
False
4
True
...
4893
False
4894
False
4895
False
4896
False
4897
False
Length: 4898, dtype: bool
# remove duplicateswine_dataset_excel_deduped=wine_dataset_excel.drop_duplicates()
wine_dataset_excel_deduped.shape# (3961, 12)# We are down from 4898 to 3961 dropping 937 rows!
# Use the randomizer in sample() to shuffle the entire datasetrandom_wines_shuffle=wine_dataset_excel.sample(frac=1)
random_wines_shuffle
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
total acidity
2203
7.8
0.240
0.38
2.1
0.058
14.0
167.0
0.99400
3.21
0.55
9.9
5
False
8.040
4294
7.4
0.220
0.28
9.0
0.046
22.0
121.0
0.99468
3.10
0.55
10.8
5
False
7.620
1363
6.9
0.320
0.16
1.4
0.051
15.0
96.0
0.99400
3.22
0.38
9.5
4
False
7.220
3310
6.3
0.300
0.29
2.1
0.048
33.0
142.0
0.98956
3.22
0.46
12.9
7
False
6.600
97
8.6
0.265
0.36
1.2
0.034
15.0
80.0
0.99130
2.95
0.36
11.4
7
True
8.865
...
2294
7.0
0.320
0.31
6.4
0.031
38.0
115.0
0.99235
3.38
0.58
12.2
7
False
7.320
405
6.8
0.270
0.12
1.3
0.040
87.0
168.0
0.99200
3.18
0.41
10.0
5
False
7.070
719
7.4
0.290
0.50
1.8
0.042
35.0
127.0
0.99370
3.45
0.50
10.2
7
False
7.690
2785
6.4
0.240
0.25
20.2
0.083
35.0
157.0
0.99976
3.17
0.50
9.1
5
False
6.640
1666
7.8
0.445
0.56
1.0
0.040
8.0
84.0
0.99380
3.25
0.43
10.8
5
False
8.245
Conditional Filtering
# mark all wines with a pH below 3 as sourwine_dataset_excel_indexed['really sour'] =wine_dataset_excel_indexed['pH'] <3.wine_dataset_excel_indexed.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
1
7.0
0.27
0.36
20.7
0.045
45.0
170.0
1.0010
3.00
0.45
8.8
6
False
2
6.3
0.30
0.34
1.6
0.049
14.0
132.0
0.9940
3.30
0.49
9.5
6
False
3
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
False
4
7.2
0.23
0.32
8.5
0.058
47.0
186.0
0.9956
3.19
0.40
9.9
6
False
5
7.2
0.23
0.32
8.5
0.058
47.0
186.0
0.9956
3.19
0.40
9.9
6
False
Filter by Value
# select only rows with a pH value below 3sour_wines=wine_dataset_excel_indexed[wine_dataset_excel_indexed['pH'] <3.]
print(sour_wines.shape)
# there are 437 sour wines - (437, 13)sour_wines.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
11
8.1
0.270
0.41
1.45
0.033
11.0
63.0
0.9908
2.99
0.56
12.0
5
True
15
8.3
0.420
0.62
19.25
0.040
41.0
172.0
1.0002
2.98
0.67
9.7
5
True
74
8.6
0.230
0.46
1.00
0.054
9.0
72.0
0.9941
2.95
0.49
9.1
6
True
79
7.4
0.180
0.30
8.80
0.064
26.0
103.0
0.9961
2.94
0.56
9.3
5
True
98
8.6
0.265
0.36
1.20
0.034
15.0
80.0
0.9913
2.95
0.36
11.4
7
True
# select only rows that are marked as not sournot_so_sour_wines=wine_dataset_excel_indexed[wine_dataset_excel_indexed['really sour'] ==False]
print(not_so_sour_wines.shape)
# there are 4461 not so sour wines - (4461, 13)not_so_sour_wines.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
1
7.0
0.27
0.36
20.7
0.045
45.0
170.0
1.0010
3.00
0.45
8.8
6
False
2
6.3
0.30
0.34
1.6
0.049
14.0
132.0
0.9940
3.30
0.49
9.5
6
False
3
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
False
4
7.2
0.23
0.32
8.5
0.058
47.0
186.0
0.9956
3.19
0.40
9.9
6
False
5
7.2
0.23
0.32
8.5
0.058
47.0
186.0
0.9956
3.19
0.40
9.9
6
False
Filter by Multiple Values
# select only rows that are marked as not sour AND have a high sugar levelsweet_wines=wine_dataset_excel_indexed[
(
wine_dataset_excel_indexed['really sour'] ==False
) & (
wine_dataset_excel_indexed['residual sugar'] >20
)
]
print(sweet_wines.shape)
# there are 15 sweet wines - (15, 13)sweet_wines.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
1
7.0
0.27
0.36
20.7
0.045
45.0
170.0
1.0010
3.00
0.45
8.8
6
False
8
7.0
0.27
0.36
20.7
0.045
45.0
170.0
1.0010
3.00
0.45
8.8
6
False
445
6.9
0.24
0.36
20.8
0.031
40.0
139.0
0.9975
3.20
0.33
11.0
6
False
1654
7.9
0.33
0.28
31.6
0.053
35.0
176.0
1.0103
3.15
0.38
8.8
6
False
1664
7.9
0.33
0.28
31.6
0.053
35.0
176.0
1.0103
3.15
0.38
8.8
6
False
# select only rows that have extreme total SO2 values OR sugar levelsselected_wines=wine_dataset_excel_indexed[
(
wine_dataset_excel_indexed['total sulfur dioxide'] >250.
) | (
wine_dataset_excel_indexed['residual sugar'] >30.
)
]
print(selected_wines.shape)
# (27, 13)selected_wines.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
112
7.2
0.27
0.46
18.75
0.052
45.0
255.0
1.0000
3.04
0.52
8.9
5
False
228
7.1
0.25
0.32
10.30
0.041
66.0
272.0
0.9969
3.17
0.52
9.1
6
False
326
7.5
0.27
0.31
5.80
0.057
131.0
313.0
0.9946
3.18
0.59
10.5
5
False
388
6.3
0.39
0.35
5.90
0.040
82.5
260.0
0.9941
3.12
0.66
10.1
5
False
404
7.1
0.27
0.31
18.20
0.046
55.0
252.0
1.0000
3.07
0.56
8.7
5
False
Comparing Columns with Crosstab
wine_dataset_excel['really sour'] =wine_dataset_excel['pH'] <3.# how many wines that are 'really sour' are in each quality classpd.crosstab(wine_dataset_excel['quality'], wine_dataset_excel['really sour'])
# select wines with a density of 1.0010 OR 0.9956 inside the datasetoptions= [1.0010, 0.9956]
selected_wines_with_selected_density=wine_dataset_excel_indexed[
wine_dataset_excel_indexed['density'].isin(options)
]
print(selected_wines_with_selected_density.shape)
# (46, 13)selected_wines_with_selected_density.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
1
7.0
0.27
0.36
20.7
0.045
45.0
170.0
1.0010
3.00
0.45
8.8
6
False
4
7.2
0.23
0.32
8.5
0.058
47.0
186.0
0.9956
3.19
0.40
9.9
6
False
5
7.2
0.23
0.32
8.5
0.058
47.0
186.0
0.9956
3.19
0.40
9.9
6
False
8
7.0
0.27
0.36
20.7
0.045
45.0
170.0
1.0010
3.00
0.45
8.8
6
False
71
6.2
0.27
0.43
7.8
0.056
48.0
244.0
0.9956
3.10
0.51
9.0
6
False
# select wines that DO NOT have a density of 1.0010 OR 0.9956 inside the datasetoptions= [1.0010, 0.9956]
selected_wines_with_selected_density=wine_dataset_excel_indexed[
~wine_dataset_excel_indexed['density'].isin(options)
]
print(selected_wines_with_selected_density.shape)
# (4852, 13)selected_wines_with_selected_density.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
2
6.3
0.30
0.34
1.6
0.049
14.0
132.0
0.9940
3.30
0.49
9.5
6
False
3
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
False
6
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
False
7
6.2
0.32
0.16
7.0
0.045
30.0
136.0
0.9949
3.18
0.47
9.6
6
False
9
6.3
0.30
0.34
1.6
0.049
14.0
132.0
0.9940
3.30
0.49
9.5
6
False
Filter with between()
# select wines that have a density inbetween 0.9949 OR 0.9951 inside the datasetselected_wines_with_selected_density=wine_dataset_excel_indexed[
wine_dataset_excel_indexed['density'].between(0.9949, 0.9951, inclusive='both')
]
print(selected_wines_with_selected_density.shape)
# (105, 13)selected_wines_with_selected_density.head(5)
index
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
quality
really sour
3
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
False
6
8.1
0.28
0.40
6.9
0.050
30.0
97.0
0.9951
3.26
0.44
10.1
6
False
7
6.2
0.32
0.16
7.0
0.045
30.0
136.0
0.9949
3.18
0.47
9.6
6
False
25
6.6
0.27
0.41
1.3
0.052
16.0
142.0
0.9951
3.42
0.47
10.0
6
False
50
6.9
0.19
0.35
5.0
0.067
32.0
150.0
0.9950
3.36
0.48
9.8
5
False
Apply
Apply a Custom Method to a Single Column (Pandas Series)
# custom function to round up a numberdefround_it_up(num):
returnnp.around(num, decimals=3)
# drop all rows that have less than 7 non-null valuesdf_missing.dropna(axis='rows', thresh=7)
# drop only rows that miss ALL data pointsdf_missing.dropna(how='all')
# only drop rows that have missing values in specified volumnsdf_missing.dropna(subset=['pH', 'alcohol'])
Fill Missing Data
# fill all missing values with `0`df_missing.fillna(value=0)
# fill all missing values with `0` in a specified rowdf_missing['pH'] =df_missing['pH'].fillna(value=0)
# use different fills per columnvalues= {
"fixed acidity": 0,
"volatile acidity": 0,
"citric acid": 0,
"residual sugar": 0,
"chlorides": 0,
"free sulfur dioxide": 0,
"total sulfur dioxide": 0,
"density": 0,
"pH": 0,
"sulphates": 0,
"alcohol": 0,
"quality": 'Not Evaluated',
}
df_missing.fillna(value=values)
# only drop rows that have missing values in specified volumnsdf_missing.dropna(subset=['pH', 'alcohol'])
# fill missing data with average valuesdf_missing['chlorides'] =df_missing['chlorides'].fillna(df_missing['chlorides'].mean())
GroupBy
# group all wines with the same quality# and display the mean valueswine_dataset.groupby('quality').mean()
quality
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol
3
7.600000
0.333250
0.336000
6.392500
0.054300
53.325000
170.600000
0.994884
3.187500
0.474500
10.345000
4
7.129448
0.381227
0.304233
4.628221
0.050098
23.358896
125.279141
0.994277
3.182883
0.476135
10.152454
5
6.933974
0.302011
0.337653
7.334969
0.051546
36.432052
150.904598
0.995263
3.168833
0.482203
9.808840
6
6.837671
0.260564
0.338025
6.441606
0.045217
35.650591
137.047316
0.993961
3.188599
0.491106
10.575372
7
6.734716
0.262767
0.325625
5.186477
0.038191
34.125568
125.114773
0.992452
3.213898
0.503102
11.367936
8
6.657143
0.277400
0.326514
5.671429
0.038314
36.720000
126.165714
0.992236
3.218686
0.486229
11.636000
9
7.420000
0.298000
0.386000
4.120000
0.027400
33.400000
116.000000
0.991460
3.308000
0.466000
12.180000
# only return the mean values of one colums as a pd serieswine_dataset.groupby('quality').mean()['total sulfur dioxide']
quality
3
170.600000
4
125.279141
5
150.904598
6
137.047316
7
125.114773
8
126.165714
9
116.000000
Name: total sulfur dioxide, dtype: float64
GroupBy Multi-Index
# group by multiple featuresquality_tsd=wine_dataset.groupby(['quality','total sulfur dioxide']).mean()
quality_tsd
# only show averages for wines of aquality of 3 and tsd average is 111.0quality_tsd.loc[(3, 111.)]
fixed acidity
6.2000
volatile acidity
0.2300
citric acid
0.3500
residual sugar
0.7000
chlorides
0.0510
free sulfur dioxide
24.0000
density
0.9916
pH
3.3700
sulphates
0.4300
alcohol
11.0000
Name: (3, 111.0), dtype: float64
Multi-Index Cross-Section
# cross-section - get all wines with quality 9quality_tsd.xs(key=9, level='quality')
total sulfur dioxide
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
density
pH
sulphates
alcohol
85.0
6.6
0.36
0.29
1.6
0.021
24.0
0.98965
3.41
0.61
12.4
113.0
7.1
0.26
0.49
2.2
0.032
31.0
0.99030
3.37
0.42
12.9
119.0
6.9
0.36
0.34
4.2
0.018
57.0
0.98980
3.28
0.36
12.7
124.0
9.1
0.27
0.45
10.6
0.035
28.0
0.99700
3.20
0.46
10.4
139.0
7.4
0.24
0.36
2.0
0.031
27.0
0.99055
3.28
0.48
12.5
# cross-section - get all wines with tsd = 124quality_tsd.xs(key=124., level='total sulfur dioxide')
quality
fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
density
pH
sulphates
alcohol
4
8.450000
0.330000
0.550000
3.700000
0.043500
12.500000
0.993890
3.045000
0.440000
10.550000
5
6.661538
0.274615
0.309231
9.638462
0.044769
23.538462
0.995772
3.161538
0.496923
10.015385
6
6.826087
0.284783
0.327826
5.917391
0.043826
28.565217
0.993708
3.163478
0.446522
10.534783
7
6.616667
0.285000
0.295833
5.333333
0.037500
36.750000
0.991735
3.165833
0.517500
11.841667
8
6.900000
0.290000
0.365000
4.800000
0.036000
34.500000
0.991015
3.135000
0.375000
12.300000
9
9.100000
0.270000
0.450000
10.600000
0.035000
28.000000
0.997000
3.200000
0.460000
10.400000
# pre-select only wines with a pH value between 3.0-3.1# and then group by quality and tsdwine_dataset[wine_dataset['pH'].isin([3.0, 3.1])].groupby(['quality','total sulfur dioxide']).mean()
# working with US datetime objectdate_series=pd.Series([
'Aug 27, 1989',
'2021-04-01',
'2020-10-25 02:00 +0800',
'19th of Mar 2003'
])
# generate datetime objectsdate_series=pd.to_datetime(date_series)
date_series
0
1989-08-27 00:00:00
1
2021-04-01 00:00:00
2
2020-10-25 02:00:00+08:00
3
2003-03-19 00:00:00
dtype: datetime64[ns]
date_series[1].month# 4
# working with European datetime objectdate_series_european=pd.Series(['31-03-2005', '01-04-2005', '28-04-2005'])
date_series_european=pd.to_datetime(date_series_european, dayfirst=True)
date_series_european
# working with time indicessales_time_index=sales_autoparse.set_index('DATE')
# group data by years and calculate the mean valuesales_time_index.resample('A').mean()
# get all worksheetsdf_excel_dictionary=pd.read_excel('datasets/wine-quality.xlsx', sheet_name=None)
# show all sheet keysprint(df_excel_dictionary.keys())
# dict_keys(['Sheet1'])# select sheet by keydf_excel_dictionary['Sheet1']
# write to exceldf_excel.to_excel('datasets/test.xlsx', index=False)