Objective: as a Data Scientist for the Autolib electric car-sharing service company, investigate a claim about blue car usage on weekdays versus weekends, using the Autolib dataset.
H0: Bluecars are mostly taken on weekdays.
H1: Bluecars are not mostly taken on weekdays.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.stats import weightstats
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.naive_bayes import GaussianNB
The dataset has 4645 rows and the following columns: 'Postal code', 'date', 'n_daily_data_points', 'dayOfWeek', 'day_type', 'BlueCars_taken_sum', 'BlueCars_returned_sum', 'Utilib_taken_sum', 'Utilib_returned_sum', 'Utilib_14_taken_sum', 'Utilib_14_returned_sum', 'Slots_freed_sum', 'Slots_taken_sum'.
Checking for outliers
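One common way to check for outliers is the interquartile range (IQR) rule. The sketch below is an illustration only: the helper name iqr_outlier_mask, the multiplier k = 1.5, and the synthetic values standing in for 'BlueCars_taken_sum' are assumptions, not the method or data used in the original analysis.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. Hypothetical helper, not from the source."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Synthetic stand-in for the real column (assumption): 200 typical daily
# counts plus two deliberately extreme values appended at the end.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {"BlueCars_taken_sum": np.append(rng.poisson(100, size=200), [900, 1200])}
)

mask = iqr_outlier_mask(demo["BlueCars_taken_sum"])
n_outliers = int(mask.sum())
print(f"Flagged {n_outliers} potential outliers")
```

Flagged rows are candidates for inspection rather than automatic removal; busy days can be legitimate observations.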
Data visualization: checking for correlated variables using scatter plots and a heatmap
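A heatmap of this kind is usually drawn from a correlation matrix. The following is a minimal sketch of computing one with pandas; the data is synthetic and only borrows three column names from the dataset, so the values say nothing about the real correlations.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): returns are built to track takes
# closely, while freed slots are generated independently.
rng = np.random.default_rng(1)
taken = rng.poisson(120, size=300)
demo = pd.DataFrame(
    {
        "BlueCars_taken_sum": taken,
        "BlueCars_returned_sum": taken + rng.integers(-5, 6, size=300),
        "Slots_freed_sum": rng.poisson(20, size=300),
    }
)

# Pairwise Pearson correlation between the numeric columns.
corr = demo.corr(method="pearson")
print(corr.round(2))
```

The resulting matrix can be passed to seaborn, for example sns.heatmap(corr, annot=True), to produce the heatmap itself.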
Checking the frequency of variables using bar graphs, pie charts, and histograms
Simple random sampling
Stratified random sampling
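The two sampling schemes can be sketched with pandas as follows. The 10% sampling fraction, the random seed, the synthetic data, and the 'weekday'/'weekend' labels assumed for 'day_type' are illustrative choices, not values taken from the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): 700 rows, 5 weekdays for every
# 2 weekend days, giving 500 weekday rows and 200 weekend rows.
rng = np.random.default_rng(2)
n = 700
demo = pd.DataFrame(
    {
        "day_type": np.where(np.arange(n) % 7 < 5, "weekday", "weekend"),
        "BlueCars_taken_sum": rng.poisson(100, size=n),
    }
)

# Simple random sampling: every row has the same chance of selection,
# so the weekday/weekend split of the sample can drift from the population.
simple_sample = demo.sample(frac=0.1, random_state=42)

# Stratified random sampling: draw the same fraction within each day_type,
# so the sample preserves the population's weekday/weekend proportions.
stratified_sample = demo.groupby("day_type", group_keys=False).sample(
    frac=0.1, random_state=42
)

strata_counts = stratified_sample["day_type"].value_counts()
print(strata_counts)
```

Stratifying on 'day_type' is a natural choice here because the hypothesis compares those two groups directly.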
Tested normality using Q-Q plots and histograms
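A Q-Q plot compares ordered sample values with the quantiles expected under a normal distribution; points close to a straight line suggest approximate normality. The sketch below uses scipy.stats.probplot to obtain the correlation of that line without drawing anything. The helper name qq_correlation and both synthetic samples are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def qq_correlation(x) -> float:
    """Correlation coefficient of the normal Q-Q line for sample x.
    Values near 1 indicate the points lie close to a straight line.
    Hypothetical helper, not from the source."""
    (_, _), (_, _, r) = stats.probplot(x, dist="norm")
    return float(r)


# Synthetic samples (assumption): one roughly normal, one right-skewed.
rng = np.random.default_rng(3)
normal_like = rng.normal(100, 15, size=500)
skewed = rng.exponential(100, size=500)

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
print(f"Q-Q correlation, normal-like sample: {r_normal:.3f}")
print(f"Q-Q correlation, skewed sample:      {r_skewed:.3f}")
```

To draw the plot itself, probplot accepts a plot argument, for example stats.probplot(x, dist="norm", plot=plt) with matplotlib.pyplot imported as plt.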
From the above analysis, we conclude that bluecar usage is higher on weekdays than on weekends.
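One possible way to quantify a weekday versus weekend comparison is a one-sided two-sample z-test on mean daily takes. This is a sketch under stated assumptions: it tests equal means against a higher weekday mean, which is a different formulation from the H0/H1 wording used in this write-up, the helper two_sample_z is hypothetical, and the numbers are synthetic rather than results from the Autolib data.

```python
import numpy as np
from scipy import stats


def two_sample_z(a, b):
    """One-sided two-sample z-test of mean(a) > mean(b).
    Returns (z statistic, p-value). Hypothetical helper, not from the source."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Standard error of the difference in means (unequal variances allowed).
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z = (a.mean() - b.mean()) / se
    # Upper-tail probability under the standard normal distribution.
    p = stats.norm.sf(z)
    return float(z), float(p)


# Synthetic stand-ins for daily 'BlueCars_taken_sum' values (assumption).
rng = np.random.default_rng(4)
weekday = rng.normal(130, 20, size=400)
weekend = rng.normal(100, 20, size=160)

z, p = two_sample_z(weekday, weekend)
print(f"z = {z:.2f}, one-sided p = {p:.3g}")
```

A small p-value would support higher mean usage on weekdays for the sampled data. The ztest function in statsmodels.stats.weightstats, which is imported earlier, offers a similar test.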