This repository features implementations of three essential machine learning algorithms: Naive Bayes, Linear Regression, and Logistic Regression. The code is structured into distinct classes for each algorithm, with a primary script showcasing their application for training and evaluation on various datasets.
To get started with this project, follow these steps:
1. Clone the repository and navigate to the project directory:

   ```bash
   git clone <repository_url>
   cd <repository_directory>
   ```

   Replace `<repository_url>` with the URL of your repository and `<repository_directory>` with the name of the directory where the repository is cloned.

2. Install the required Python libraries:

   ```bash
   pip install numpy pandas
   ```

3. Prepare your data: ensure your data files are in CSV format and located in the same directory as the script.

4. Execute the main script to train and evaluate the models:

   ```bash
   python main.py
   ```

5. Customization: adjust the `train_test_split` function or model parameters in the script according to your needs.
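The exact `train_test_split` used by the script lives in the repository's code; as a point of reference, a minimal shuffled split (the `test_size` and `seed` parameters here are illustrative assumptions, not the script's actual signature) might look like:

```python
import numpy as np

def train_test_split(X, y, test_size=0.2, seed=0):
    """Shuffle the rows, then split into train and test partitions.

    Hypothetical sketch -- the repository's own function may differ.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    cutoff = int(len(X) * (1 - test_size))
    train_idx, test_idx = indices[:cutoff], indices[cutoff:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```

Shuffling before splitting matters when the CSV rows are ordered (e.g. sorted by label), since a plain head/tail split would otherwise produce unrepresentative partitions.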
This class implements a basic Naive Bayes classifier. It estimates probabilities based on the frequency of feature values within each class.
- `fit(X, y)`: Trains the model with features `X` and labels `y`.
- `predict(prediction)`: Provides class predictions for new data.
- `score(X_test, y_test)`: Assesses model performance on test data.
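To make the frequency-based probability estimates concrete, here is a minimal sketch of a categorical Naive Bayes with these three methods. The log-probabilities, Laplace (add-one) smoothing, and the `1e-9` fallback for unseen feature values are implementation choices assumed here, not necessarily what the repository's class does:

```python
import numpy as np
import pandas as pd

class NaiveBayes:
    """Minimal categorical Naive Bayes based on per-class feature-value frequencies."""

    def fit(self, X, y):
        X, y = pd.DataFrame(X), pd.Series(y)
        self.classes_ = y.unique()
        # Prior P(class) = fraction of training rows with that label.
        self.priors_ = {c: (y == c).mean() for c in self.classes_}
        # Likelihood P(feature = value | class), with add-one (Laplace) smoothing.
        self.likelihoods_ = {
            c: {
                col: X.loc[y == c, col].value_counts().add(1)
                     .div((y == c).sum() + X[col].nunique())
                for col in X.columns
            }
            for c in self.classes_
        }

    def predict(self, prediction):
        prediction = pd.DataFrame(prediction)
        results = []
        for _, row in prediction.iterrows():
            scores = {}
            for c in self.classes_:
                # Sum of logs avoids underflow from multiplying many small probabilities.
                score = np.log(self.priors_[c])
                for col in prediction.columns:
                    # Tiny fallback probability for values never seen with this class.
                    score += np.log(self.likelihoods_[c][col].get(row[col], 1e-9))
                scores[c] = score
            results.append(max(scores, key=scores.get))  # highest posterior wins
        return results

    def score(self, X_test, y_test):
        return np.mean(np.array(self.predict(X_test)) == np.array(y_test))
```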
This class represents a straightforward linear regression model, which fits a linear relationship to the data to predict continuous values.
- `fit(X_train, y_train, iterations=1000, alpha=0.0001)`: Trains the model using gradient descent.
- `predict(X)`: Predicts target values based on input features `X`.
- `mean_squared_error(y_true, y_pred)`: Computes the mean squared error between actual and predicted values.
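A compact sketch of such a class follows. The prepended bias column and the batch-gradient update are assumptions about the implementation; only the method names and defaults come from the list above:

```python
import numpy as np

class LinearRegression:
    """Linear regression fit by batch gradient descent (illustrative sketch)."""

    def fit(self, X_train, y_train, iterations=1000, alpha=0.0001):
        Xb = np.c_[np.ones(len(X_train)), X_train]  # prepend a bias column of ones
        self.w = np.zeros(Xb.shape[1])
        for _ in range(iterations):
            error = Xb @ self.w - y_train
            # Gradient of MSE w.r.t. the weights, scaled by the learning rate alpha.
            self.w -= alpha * (2 / len(Xb)) * (Xb.T @ error)

    def predict(self, X):
        return np.c_[np.ones(len(X)), X] @ self.w

    @staticmethod
    def mean_squared_error(y_true, y_pred):
        return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
```

Note that the default `alpha=0.0001` is conservative; on small, well-scaled data a larger learning rate converges far faster.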
This class applies logistic regression for binary classification tasks.
- `fit(X, y, iterations=1000, alpha=0.0001)`: Trains the model using gradient descent.
- `predict(X)`: Predicts class labels for the provided features `X`.
- `score(X_test, y_test)`: Evaluates the model on the test set, calculating errors, accuracy, precision, and recall.
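As a rough sketch of the same interface (the bias column, the 0.5 decision threshold, and returning the metrics as a dict are assumptions of this example, not the repository's exact behavior):

```python
import numpy as np

class LogisticRegression:
    """Binary logistic regression trained with gradient descent (illustrative sketch)."""

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y, iterations=1000, alpha=0.0001):
        Xb = np.c_[np.ones(len(X)), X]  # prepend a bias column
        self.w = np.zeros(Xb.shape[1])
        for _ in range(iterations):
            p = self._sigmoid(Xb @ self.w)
            # Gradient of the average cross-entropy loss.
            self.w -= alpha * (Xb.T @ (p - y)) / len(Xb)

    def predict(self, X):
        p = self._sigmoid(np.c_[np.ones(len(X)), X] @ self.w)
        return (p >= 0.5).astype(int)  # threshold probabilities at 0.5

    def score(self, X_test, y_test):
        y_pred = self.predict(X_test)
        y_test = np.asarray(y_test)
        tp = np.sum((y_pred == 1) & (y_test == 1))
        fp = np.sum((y_pred == 1) & (y_test == 0))
        fn = np.sum((y_pred == 0) & (y_test == 1))
        return {
            "accuracy": np.mean(y_pred == y_test),
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
```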
- Naive Bayes:
  - Calculates conditional probabilities for each feature value given the class.
  - Classifies based on the highest posterior probability.
- Linear Regression:
  - Minimizes the mean squared error between predicted and actual values using gradient descent.
- Logistic Regression:
  - Applies the sigmoid function to convert predictions into probabilities.
  - Uses gradient descent to minimize cross-entropy loss.
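The two functions central to the logistic model can be written in a few lines (these standalone helpers are for illustration; the repository computes the same quantities inside its class):

```python
import numpy as np

def sigmoid(z):
    """Map a raw linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y_true, p):
    """Average negative log-likelihood of the true binary labels under probabilities p."""
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```

A score of 0 maps to probability 0.5, and cross-entropy penalizes confident wrong predictions much more heavily than uncertain ones.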
- Accuracy: Measures the proportion of correct predictions.
- Precision: The ratio of true positive predictions to all positive predictions.
- Recall: The ratio of true positives to all actual positives.
- F-measure: The harmonic mean of precision and recall.
- Mean Squared Error (MSE): The average squared difference between actual and predicted values.
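The classification metrics above follow directly from the true-positive/false-positive/false-negative counts; a small reference implementation (the `evaluate` helper name is hypothetical) is:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, and F-measure from two binary label arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; zero when both are zero.
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f_measure
```

The guards against zero denominators matter in practice: a model that predicts no positives at all would otherwise divide by zero when computing precision.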
- Data Format: Ensure datasets are correctly formatted and free from missing values.
- Customization: Adjust hyperparameters such as the number of iterations and learning rate as needed.
- Error Handling: Consider adding error handling for different data types or missing values.
We welcome contributions and collaborations to enhance this project!