Optimization-Methods-in-Intelligent-Systems

In this repository, I focus on implementing various optimization methods. The first section uses binary logistic regression to classify the MNIST dataset. The second section optimizes a non-convex cost function with a genetic algorithm. Lastly, the third section introduces support vector machines (SVM).

Part 1: Binary Logistic Regression

Consider the MNIST dataset, from which we aim to build a binary classifier that determines whether a digit is a 2 or a 7. To achieve this, we label the digits 7 in the dataset as 𝑌 = 1 and the digits 2 as 𝑌 = -1.
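As a point of reference, a minimal sketch of how such a two-class subset might be prepared is shown below. Using `fetch_openml` is one common way to obtain MNIST; the notebook may load the data differently.

```python
import numpy as np
from sklearn.datasets import fetch_openml

# One common way to fetch MNIST; the notebook may load it differently.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
y = y.astype(int)

# Keep only the digits 2 and 7, and relabel them: 7 -> +1, 2 -> -1.
mask = (y == 2) | (y == 7)
X, y = X[mask] / 255.0, np.where(y[mask] == 7, 1, -1)
```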

The following images depict the cost function value versus the number of iterations on both the training and testing datasets, for all three step sizes (learning rates).

*(figures: cost function vs. iterations on the training and testing sets)*

Results: As can be seen in the images above, the model learns the training data better, and its cost function decreases slightly further (to a lower value). The reason for this is the larger amount of training data. It is also observed that as the learning rate approaches 1, the cost function decreases more quickly.

*(figures: cost function curves for the different learning rates)*
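For reference, a minimal full-batch gradient descent sketch for this model follows, assuming the standard logistic loss for labels in {-1, +1}; the function names and the default learning rate are illustrative, not taken from the notebook.

```python
import numpy as np

def cost(w, b, X, y):
    """Average logistic loss with labels y in {-1, +1}."""
    z = y * (X @ w + b)
    return np.mean(np.logaddexp(0.0, -z))    # stable log(1 + exp(-z))

def grad(w, b, X, y):
    """Gradient of the average logistic loss w.r.t. w and b."""
    z = y * (X @ w + b)
    s = -y / (1.0 + np.exp(z))               # per-sample derivative w.r.t. w.x + b
    return X.T @ s / len(y), s.mean()

def gradient_descent(X, y, lr=0.5, iters=200):
    """Full-batch gradient descent; returns parameters and the cost history."""
    w, b = np.zeros(X.shape[1]), 0.0
    history = []
    for _ in range(iters):
        gw, gb = grad(w, b, X, y)
        w, b = w - lr * gw, b - lr * gb
        history.append(cost(w, b, X, y))
    return w, b, history
```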

Stochastic Gradient Descent: In this section, we perform the same steps as in parts B and C, with a change to the algorithm: in each iteration, we select a random batch of a fixed size from the entire training dataset and use it to update the parameters b and ω. We carry out this process for batch sizes of 1 and 100, and then measure the accuracies of the newly trained models as in part C.
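A mini-batch variant might look like the sketch below, reusing `grad` from the previous block; sampling without replacement within each iteration is an assumption, since the text only says the batch is drawn at random.

```python
import numpy as np

def sgd(X, y, batch_size=100, lr=0.5, iters=1000, seed=0):
    """Mini-batch SGD; batch_size=1 gives plain stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        # Draw a random batch and take one gradient step on it.
        idx = rng.choice(len(y), size=batch_size, replace=False)
        gw, gb = grad(w, b, X[idx], y[idx])
        w, b = w - lr * gw, b - lr * gb
    return w, b
```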

Batch size = 1

*(figures: results for batch size 1)*

Batch size = 100

*(figures: results for batch size 100)*

Results: As observed in the images, with a larger batch size each iteration uses more of the training data, so the model learns more reliably and the cost function decreases slightly further (to a lower value).

Part 2: Optimization in Non-Convex Functions

$f(x_1, x_2) = 2x_1^2 + 2x_2^2 - 17x_2\cos(0.2\pi x_1) - x_1 x_2$

Newton Method

We run Newton's method from starting points categorized by their distance (a sketch of the iteration follows the results figure below):

  • Close: the distance is less than 5.
  • Far: the distance is between 5 and 50.
  • Farther: the distance is greater than 50.

*(figure: Newton's method results)*
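For reference, here is a minimal sketch of the Newton iteration for this $f$, with the gradient and Hessian derived by hand; since $f$ is non-convex, the plain iteration may converge to a saddle point or diverge, depending on the starting point.

```python
import numpy as np

A = 0.2 * np.pi  # frequency of the cosine term in f

def f(x):
    x1, x2 = x
    return 2*x1**2 + 2*x2**2 - 17*x2*np.cos(A*x1) - x1*x2

def gradient(x):
    x1, x2 = x
    return np.array([4*x1 + 17*A*x2*np.sin(A*x1) - x2,
                     4*x2 - 17*np.cos(A*x1) - x1])

def hessian(x):
    x1, x2 = x
    h11 = 4 + 17*A**2*x2*np.cos(A*x1)
    h12 = 17*A*np.sin(A*x1) - 1
    return np.array([[h11, h12], [h12, 4.0]])

def newton(x0, iters=50, tol=1e-8):
    """Plain Newton iteration: x <- x - H(x)^{-1} grad f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        step = np.linalg.solve(hessian(x), gradient(x))
        x -= step
        if np.linalg.norm(step) < tol:
            break
    return x
```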

Metaheuristic Approach - Genetic Algorithm

In this section, we consider a population of N arbitrary points. In the selection phase, we take the best specimen (the one with the minimum objective value) and carry it into the new population. We select the remaining N-1 samples as follows: from two points chosen at random among the previous N, we keep the one with the lower objective value and add it to the new population. In the crossover stage, we represent each selected number as an 8-bit binary code.

For this purpose, we treat each of x_1 and x_2 as a grid of 255 values spanning -15 to +15, so each coordinate fits in 8 bits. We then shuffle the order of the population and treat adjacent members as pairs. For each pair, we choose a random number k from 1 to 7 and form a child by taking the k leftmost bits from one parent and the remaining bits from the other; repeating the process with the parents' roles swapped yields a second child. In this way, we generate a new population of the same size N.

In the mutation stage, with probability mutation_rate, we select mutation_select members of the population. We then randomly choose mutation_window bits in each selected member and flip them (0 becomes 1, 1 becomes 0). Finally, we decode the numbers back from their binary representation and return to the selection stage. We repeat all of these stages ITR times, with the following parameters (a compact sketch of the whole loop follows the parameter list):

```
ITR = 100
Pop_size = 1000
mutation_rate = 0.1
mutation_window = 1
mutation_select = 5
```
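The sketch below implements this loop under the parameters above; the tournament details, the clipping to grid index 254 after bit operations, and the random-number generator are my assumptions where the description is silent.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID = np.linspace(-15, 15, 255)   # 255 quantization levels per coordinate

ITR, POP_SIZE = 100, 1000
MUTATION_RATE, MUTATION_WINDOW, MUTATION_SELECT = 0.1, 1, 5

def fitness(pop):
    """Objective f evaluated at each individual's decoded (x1, x2)."""
    x1, x2 = GRID[pop[:, 0]], GRID[pop[:, 1]]
    return 2*x1**2 + 2*x2**2 - 17*x2*np.cos(0.2*np.pi*x1) - x1*x2

def crossover(a, b):
    """Single-point crossover on each coordinate's 8-bit code."""
    c1, c2 = a.copy(), b.copy()
    for j in range(2):
        k = rng.integers(1, 8)                  # take k leftmost bits, k in 1..7
        mask = (0xFF << (8 - k)) & 0xFF
        c1[j] = (a[j] & mask) | (b[j] & ~mask & 0xFF)
        c2[j] = (b[j] & mask) | (a[j] & ~mask & 0xFF)
    return np.minimum(c1, 254), np.minimum(c2, 254)   # stay on the 255-level grid

pop = rng.integers(0, 255, size=(POP_SIZE, 2))        # codes are grid indices 0..254
for _ in range(ITR):
    # Selection: elitism plus binary tournaments.
    fit = fitness(pop)
    new = [pop[fit.argmin()].copy()]
    while len(new) < POP_SIZE:
        i, j = rng.integers(0, POP_SIZE, 2)
        new.append(pop[i].copy() if fit[i] < fit[j] else pop[j].copy())
    pop = np.array(new)

    # Crossover: shuffle, then cross adjacent pairs.
    rng.shuffle(pop)
    children = []
    for i in range(0, POP_SIZE - 1, 2):
        children.extend(crossover(pop[i], pop[i + 1]))
    pop = np.array(children)

    # Mutation: occasionally flip random bits in a few individuals.
    if rng.random() < MUTATION_RATE:
        for i in rng.integers(0, POP_SIZE, MUTATION_SELECT):
            for _ in range(MUTATION_WINDOW):
                j, bit = rng.integers(0, 2), 1 << rng.integers(0, 8)
                pop[i, j] = min(pop[i, j] ^ bit, 254)

best = pop[fitness(pop).argmin()]
print("x* ≈", GRID[best], "f(x*) ≈", fitness(pop).min())
```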

*(figure: genetic algorithm results)*

Part 3: Support Vector Machine

In this section, we intend to perform classification on the iris dataset using a Support Vector Machine. First, load the dataset with the code snippet below.

```python
from sklearn import datasets

iris = datasets.load_iris()
data = iris.data[:, :2]   # first two features (sepal length and width)
label = iris.target       # class labels
```
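From here, a sketch of the rest of the pipeline might fit an SVC and report its test accuracy; the linear kernel, C value, and train/test split below are assumptions, not necessarily the notebook's settings.

```python
from sklearn import svm
from sklearn.model_selection import train_test_split

# Illustrative settings; the notebook's kernel, C, and split may differ.
X_train, X_test, y_train, y_test = train_test_split(
    data, label, test_size=0.3, random_state=0)

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```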

Results:

*(figures: SVM classification results)*