This function aims to simulate the Birthday Paradox problem, where the goal is to find the minimum number of people needed in a group such that there is a 50% chance that at least three people share the same birthday.
import random
from collections import Counter
def three_shared_BD(n, num_trials):
# ... (function code)
return success_count / num_trials
- n: Number of people in the group.
- num_trials: Number of simulation trials.
num_trials = 10000
target_probability = 0.5
num_simulations = 20
for i in range(num_simulations):
n = 1
while True:
probability = three_shared_BD(n, num_trials)
if probability >= target_probability:
print(f"Simulation {i+1}: Minimum n for P(A) > {target_probability} is {n}")
break
n += 1
This function simulates a variant of the Birthday Paradox, looking for the minimum number of people needed in a group such that there is a 90% chance that at least two people share the same birthday.
import random
def two_shared_BD(n, num_trials):
# ... (function code)
return success_count / num_trials
- n: Number of people in the group.
- num_trials: Number of simulation trials.
num_trials = 30
target_probability = 0.9
num_simulations = 10
for i in range(num_simulations):
n = 1
while True:
probability = two_shared_BD(n, num_trials)
if probability >= target_probability:
print(f"Simulation {i+1}: Minimum n for P(A) > {target_probability} is {n}")
break
n += 1
This code generates random data from various distributions and plots them.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# ... (code for plot_distribution function)
# 1) Generate random data from a Uniform distribution
uniform_data = np.random.uniform(0, 1, 10000)
plot_distribution(uniform_data, "Uniform Distribution")
# ... (similar code for other distributions)
# Generate means of each distribution over multiple iterations
# ... (code for the iteration loop and plotting means)
This code estimates the value of pi using the Monte Carlo method.
import numpy as np
import matplotlib.pyplot as plt
import math
# ... (code for estimate_pi function)
# Various number of N for different iterations
N_values = [100, 500, 1000, 5000, 10000]
# ... (code for pi estimation and plotting)
This code simulates a gambler's experience, tracking the stake over rounds.
import numpy as np
import matplotlib.pyplot as plt
# ... (code for gamble and simulate_gambling functions)
# Example simulation
simulate_gambling(initial_stake=100, bet_amount=10, win_probability=0.5, target_amount=200, num_rounds=1000, num_simulations=5)
# ... (code for multiple iterations and statistics plotting)
This code simulates a scenario where two players roll dice, and player X gains points based on the outcomes.
import numpy as np
# ... (code for expected_gains_list simulation)
# Example simulation
simulation_number = 10000
for _ in range(simulation_number):
# ... (code for X, Y, and G calculations)
print(sum(expected_gains_list) / simulation_number)
This code calculates the expected gains in a game where two players roll dice. The simulation is repeated a specified number of times, and the average expected gain is then printed.
This part loads the data from an Excel file and prints the first few rows of the dataframe. Additionally, it generates a histogram for the 'population' column.
import pandas as pd
import matplotlib.pyplot as plt
file_path = 'Problem 7\Data.xls'
df = pd.read_excel(file_path)
print("First few rows of the dataframe:")
print(df.head())
plt.hist(df['population'], bins=20, color='skyblue', edgecolor='black')
plt.title('Histogram of Population Values for Cancer Mortality')
plt.xlabel('Population')
plt.ylabel('Frequency')
plt.show()
This part calculates and prints total cancer mortality, population mean, population variance, and population standard deviation.
import pandas as pd
file_path = 'Problem 7\Data.xls'
df = pd.read_excel(file_path)
population_mean = df['population'].mean()
total_mortality = df['mortalities'].sum()
population_variance = df['population'].var()
population_std_dev = df['population'].std()
print("Total Cancer Mortality:", total_mortality)
print("Population Mean:", population_mean)
print("Population Variance:", population_variance)
print("Population Standard Deviation:", population_std_dev)
This part generates a sampling distribution of the mean with a sample size of 25 and 10,000 samples.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
file_path = 'Problem 7\Data.xls'
df = pd.read_excel(file_path)
sample_size = 25
num_samples = 10000
distribution_means = []
for _ in range(num_samples):
sample = np.random.choice(df['mortalities'], size=sample_size, replace=True)
sample_mean = np.mean(sample)
distribution_means.append(sample_mean)
plt.figure(figsize=(8, 6))
plt.hist(distribution_means, bins=100, color='skyblue', edgecolor='black')
plt.title('Sampling Distribution of the Mean (Sample Size = 25)')
plt.xlabel('Sample Mean')
plt.ylabel('Frequency')
plt.show()
This part calculates and prints 95% confidence intervals for the mean of the population.
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
np.random.seed(42)
file_path = 'Problem 7\Data.xls'
df = pd.read_excel(file_path)
sample_size = 100
sample = df.sample(sample_size)
mean_sample_population = np.mean(sample['population'])
# Confidence intervals calculations
# ... (code for confidence intervals calculation)
This part calculates and prints ratio estimates for the population mean and total cancer mortality.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
file_path = 'Problem 7\Data.xls'
df = pd.read_excel(file_path)
sample_size = 25
sample = df.sample(sample_size)
mean_estimate = (sample['population'] * sample['mortalities']).sum() / sample['population'].sum()
total_mortality_estimate = (sample['population'] * sample['mortalities']).sum()
print("Ratio Estimate for Population Mean:", mean_estimate)
print("Ratio Estimate for Total Cancer Mortality:", total_mortality_estimate)
This part performs stratified sampling with four strata, calculates the sampling fractions using proportional and optimal allocation, and computes the variances for simple random sampling, proportional allocation, and optimal allocation.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(42)
file_path = 'Problem 7\Data.xls'
df = pd.read_excel(file_path)
num_strata = 4
# ... (code for stratified sampling and variance calculations)
This repository contains Python scripts for data analysis and statistical tests. Below is a summary of the scripts and their functionality.
File: question_3_1.py
This script uses Matplotlib to create a histogram showing the age distribution of male and female fans. It uses data arrays men_ages
and women_ages
to generate the plot.
File: question_3_2.py
This script performs a Kolmogorov-Smirnov test on the ages of men using SciPy's kstest
. Additionally, it creates a Q-Q plot using the probplot
function from SciPy.
File: question_4_2.py
Here, a probability matrix is flattened, and pseudo-random numbers are generated to classify observations based on the specified conditions. The counts in each cell and an example of the first few observations are displayed.
File: question_4_4.py
This script uses the generated data from Question 4_2 to perform a chi-squared goodness-of-fit test. It calculates observed and expected counts, then checks the hypothesis at a significance level of 0.05.
File: question_8_1.py
The script performs a Kolmogorov-Smirnov test on given data to determine if it follows a uniform distribution.
File: question_8_2.py
This script calculates the maximum difference between corresponding elements of two data arrays, data1
and data2
.
File: question_3_6.py
The Mann-Whitney U test is performed on the ages of men and women to compare the distributions.
-
Ensure you have Python installed on your system.
-
Clone this repository to your local machine.
-
Run the scripts using a Python interpreter:
python question_3_1.py python question_3_2.py python question_4_2.py python question_4_4.py python question_8_1.py python question_8_2.py python question_3_6.py