/airfoil-self-noise-prediction

Machine learning course SCS 3253

Primary LanguageJupyter NotebookGNU General Public License v3.0GPL-3.0

Airfoil Self-noise prediction

Website

Jypter Notebook

Submitted by :

  • Amarendra Sabat
  • Claudinei Daitx
  • Suwei (Stream) Qi

Introduction

  • The NASA data set comprises different size NACA 0012 airfoils at various wind tunnel speeds and angles of attack.

  • The span of the airfoil and the observer position were the same in all of the experiments.

  • The NASA data set was obtained from a series of aerodynamic and acoustic tests of two and three-dimensional airfoil blade sections conducted in an anechoic wind tunnel.

  • Relevant Papers:

    • T.F. Brooks, D.S. Pope, and A.M. Marcolini. Airfoil self-noise and prediction. Technical report, NASA RP-1218, July 1989.

    • K. Lau. A neural networks approach for aerofoil noise prediction. Master’s thesis, Department of Aeronautics. Imperial College of Science, Technology and Medicine (London, United Kingdom), 2006.

    • R. Lopez. Neural Networks for Variational Problems in Engineering. PhD Thesis, Technical University of Catalonia, 2008.

  • Citation :

  • Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Link: Airfoil Self-Noise Data Set

1. Required Python libraries

import os
import warnings
import pandas as pd
import seaborn as sns
import numpy as np

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
from IPython.core.display import HTML, display

# Machine learn packages
import tensorflow as tf
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, BayesianRidge
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler, QuantileTransformer, MaxAbsScaler
from sklearn.decomposition import PCA
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from mpl_toolkits.mplot3d import Axes3D
from tensorflow import keras

# Remove all warnings in this notebook
warnings.filterwarnings('ignore')
tf.logging.set_verbosity(tf.logging.ERROR)

# Same random seed state
np.random.seed(42)
random_state=42

2. Load airfloil dataset

2.1 Attribute Information

  1. Frequency (Hertzs)
  2. Angle of attack (degrees)
  3. Chord length (meters)
  4. Free-stream velocity (meters per second)
  5. Suction side displacement thickness (meters)

The only output is: 6. Scaled sound pressure level (decibels)

airfoil_dataset.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
frequency angle_of_attack chord_length free_stream_velocity suction_side_displacement_thickness scaled_sound_pressure_level
0 800 0.0 0.3048 71.3 0.002663 126.201
1 1000 0.0 0.3048 71.3 0.002663 125.201
2 1250 0.0 0.3048 71.3 0.002663 125.951
3 1600 0.0 0.3048 71.3 0.002663 127.591
4 2000 0.0 0.3048 71.3 0.002663 127.461
airfoil_dataset.tail()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
frequency angle_of_attack chord_length free_stream_velocity suction_side_displacement_thickness scaled_sound_pressure_level
1498 2500 15.6 0.1016 39.6 0.052849 110.264
1499 3150 15.6 0.1016 39.6 0.052849 109.254
1500 4000 15.6 0.1016 39.6 0.052849 106.604
1501 5000 15.6 0.1016 39.6 0.052849 106.224
1502 6300 15.6 0.1016 39.6 0.052849 104.204
airfoil_dataset.describe()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
frequency angle_of_attack chord_length free_stream_velocity suction_side_displacement_thickness scaled_sound_pressure_level
count 1503.000000 1503.000000 1503.000000 1503.000000 1503.000000 1503.000000
mean 2886.380572 6.782302 0.136548 50.860745 0.011140 124.835943
std 3152.573137 5.918128 0.093541 15.572784 0.013150 6.898657
min 200.000000 0.000000 0.025400 31.700000 0.000401 103.380000
25% 800.000000 2.000000 0.050800 39.600000 0.002535 120.191000
50% 1600.000000 5.400000 0.101600 39.600000 0.004957 125.721000
75% 4000.000000 9.900000 0.228600 71.300000 0.015576 129.995500
max 20000.000000 22.200000 0.304800 71.300000 0.058411 140.987000
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1503 entries, 0 to 1502
Data columns (total 6 columns):
frequency                              1503 non-null int64
angle_of_attack                        1503 non-null float64
chord_length                           1503 non-null float64
free_stream_velocity                   1503 non-null float64
suction_side_displacement_thickness    1503 non-null float64
scaled_sound_pressure_level            1503 non-null float64
dtypes: float64(5), int64(1)
memory usage: 70.5 KB

2.2 Insights

png

  • There are only four free stream velocities.

    • 31.7 m/s
    • 39.6 m/s
    • 55.5 m/s
    • 71.3 m/s
  • There are six chord lengths:

    • 2.5 cm
    • 5 cm
    • 10 cm
    • 15 cm
    • 22 cm
    • 30 cm

3. Examine dimensions in a graphic

You can use this concept to reduce the number of features in your dataset without having to lose much information and keep or improve the model’s performance. In this case, you can see two different dimensionality reductions PCA and T-SNE and, it shows that this dataset is non-linear.

3.1 Split features and targets

In this section, I will split the original dataset in two new datasets:

  1. Features: all columns except the target column
  2. Target: only the target column

3.2 Visualization using PCA

Using PCA is possible to see this dataset is non-linear and I can not get good results if I use a linear model such as Linear Regression.

png

3.3 Visualization using t-SNE

Using T-SNE you can see in this dataset all the clusters and detect that you need a non-linear model to get the best results for the prediction.

png

Choosing my model

4.1 Feature Engine

For this scenario I'm using three feature engines:

  • Quantile transformer: This method transforms the features to follow a uniform or a normal distribution. This method is good when you work with non-linear datasets
  • Max Abs scaler: Scale each feature by its maximum absolute value.
  • Standard scaler: Standardize features by removing the mean and scaling to unit variance.
train_set = QuantileTransformer(random_state=0).fit_transform(train_set)
train_set = MaxAbsScaler().fit_transform(train_set)
train_set = StandardScaler().fit_transform(train_set)

Machine learning models

I have the class ModelEstimator to train and predict all models. In this case, I am using linear and non-linear models to see the difference between them.

Machine learning models:

  • Lasso
  • Linear regression
  • Neural network
  • SVR
  • Bayesian Ridge
  • Ridge
  • ElasticNet

Now I am choosing a few models and parameters to test and get the best result for each model. In this case I do not need to choose a specific parameter because the ModelEstimator class is using GridSearchCV to decide the best estimator for each model.

4.3 Neural network

Another approach is to use neural networks to find best results. In this case, I am using five hidden layers and Ridge (L2) for the regularization.

Results

  • Using linear regressions such as LinearRegression the best model has an accuracy of nearby 45%.

  • Using non-linear regressions such as SVR the best model has an accuracy of nearby 79%. It is 34% more than linear models.

  • Using a neural network the accuracy is around 88%. It is around 10% more than SVR.

For this dataset using a neural network, you can get the best result but, it does not mean the SVR not fit well. It means for this dataset and this amount of data (1503 rows) a neural network fits better than an SVR model.

ModelR2 scoreMSE score
LinearRegression0.456806211066344226.531048242586873
Ridge0.4568691251730737626.52797534809731
Lasso0.4555153289981639426.594098400972552
ElasticNet0.4555153289981639426.594098400972552
BayesianRidge0.456866838195352826.52808705025128
SVR0.79813769260957919.859498994362172
Neural network0.88925808865402135.4089333351256155