Rain Prediction in Australia

About the Data Set

The data set used for this project originates from the Australian Government's Bureau of Meteorology and can be accessed here. The dataset used in this project includes additional columns such as 'RainToday,' and the target variable is 'RainTomorrow,' which was obtained from Rattle.

Import the Required Libraries

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import svm
from sklearn.metrics import jaccard_score
from sklearn.metrics import f1_score
from sklearn.metrics import log_loss
from sklearn.metrics import confusion_matrix, accuracy_score
import sklearn.metrics as metrics

Importing the Dataset

df = pd.read_csv("Weather_Data.csv")
df.head()

Data Preprocessing

One-Hot Encoding

First, we perform one-hot encoding to convert categorical variables to binary variables.

df_sydney_processed = pd.get_dummies(data=df, columns=['RainToday', 'WindGustDir', 'WindDir9am', 'WindDir3pm'])

Next, we replace the values in the 'RainTomorrow' column, changing them from categorical to binary (0 for 'No' and 1 for 'Yes').

df_sydney_processed.replace(['No', 'Yes'], [0, 1], inplace=True)

Training Data and Test Data

We split the data into training and testing sets.

df_sydney_processed.drop('Date', axis=1, inplace=True)
df_sydney_processed = df_sydney_processed.astype(float)

features = df_sydney_processed.drop(columns='RainTomorrow', axis=1)
Y = df_sydney_processed['RainTomorrow']

Linear Regression

We use the Linear Regression model and evaluate its performance.

x_train, x_test, Y_train, Y_test = train_test_split(features, Y, test_size=.2, random_state=10)

LinearReg = LinearRegression()
LinearReg.fit(x_train, Y_train)

predictions = LinearReg.predict(x_test)

K-Nearest Neighbors (KNN)

We use the K-Nearest Neighbors classifier and assess its performance.

k = 4
KNN = KNeighborsClassifier(n_neighbors=k).fit(x_train, Y_train)
predictions2 = KNN.predict(x_test)

Decision Tree Classifier

We employ the Decision Tree classifier and evaluate its performance.

Tree = DecisionTreeClassifier()
Tree = Tree.fit(x_train, Y_train)
predictions3 = Tree.predict(x_test)

Logistic Regression

Lastly, we use Logistic Regression and assess its performance.

x_train2, x_test2, Y_train2, Y_test2 = train_test_split(features, Y, test_size=.2, random_state=1)

LR = LogisticRegression(C=1.0, solver='liblinear').fit(x_train2, Y_train2)
predictions4 = LR.predict(x_test2)

Model Evaluation Metrics

We calculate various evaluation metrics for each model:

Linear Regression:
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R-squared (R2)
K-Nearest Neighbors:
- Accuracy
- Jaccard Index
- F1 Score
Decision Tree Classifier:
- Accuracy
- Jaccard Index
- F1 Score
Logistic Regression:
- Coefficients
- Predictions
- Accuracy
- Jaccard Index
- F1 Score

Summary of Results

Here are the results for each model:

Linear Regression

MAE: 0.2563
MSE: 0.1157
R2: 0.3402

K-Nearest Neighbors (KNN)

Accuracy: 0.8183
Jaccard Index: 0.7901
F1 Score: 0.5966

Decision Tree Classifier

Accuracy: 0.7588
Jaccard Index: 0.7122
F1 Score: 0.5730

Logistic Regression

Coefficients: [coefficients_list]
Accuracy: [accuracy_score]
Jaccard Index: [jaccard_index]
F1 Score: [f1_score]

You can use these results to assess the performance of each model for your specific problem.

1AyaNabil1/Rain-Prediction-in-Australia

Rain Prediction in Australia

About the Data Set

Import the Required Libraries

Importing the Dataset

Data Preprocessing

One-Hot Encoding

Training Data and Test Data

Linear Regression

K-Nearest Neighbors (KNN)

Decision Tree Classifier

Logistic Regression

Model Evaluation Metrics

Summary of Results

Linear Regression

K-Nearest Neighbors (KNN)

Decision Tree Classifier

Logistic Regression