data-splitting
There are 22 repositories under the data-splitting topic.
shenwanxiang/ChemBench
MoleculeNet benchmark dataset & MolMapNet dataset
sharejing/Takin
A Python toolkit for file processing, text cleaning, and data splitting.
omardbaa/Data-Splitter
Data-Splitter is a Python script designed to split a large CSV file containing data into three different formats: JSON, a database table, and another CSV file. The script ensures a random distribution of data across the three output formats based on custom-defined ratios.
aarryasutar/Credit_EDA
This project focuses on cleaning and analyzing a loan application dataset to gain insights into the factors influencing loan defaults. Through systematic data cleaning, visualization, and merging with previous application data, it provides a robust foundation for further predictive modeling.
Aravinda89/split_train_eval_test
Splits an image dataset into train, validation, and test sets.
Dimas263/Preprocessing-Data-into-Train-Test-Val-Data
Python Preprocessing for Sales Project Notebook
katiebristol/data_splitter
A basic Python script to split a .dat file into individual sample files.
MadhuBala11/DiabetesPrediction
In this project, I used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes based on features such as age, blood pressure, glucose level, and body mass index. I used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to perform model building.
Nicolas-Bolouri/Cloud-Data-Protection-Analysis
Comparative Analysis of Data Protection Mechanisms in Public Clouds
SahilSunda/ML_Project
ML model for Crop Detection
szcf-weiya/SplitClusterTest.jl
Julia package for "FDR Control via Data Splitting for Testing-after-Clustering (arXiv: 2410.06451)"
intersog-developer/nn-data-splitting-example
Source code for the article "Accelerating neural network training by data splitting".
JVTupinamba/Kennard-Stone-Mahalanobis
Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space that has unknown correlations between its variables. The adaptation may substantially improve neural network performance.
UERJ-LIVIA/Kennard-Stone-Mahalanobis
Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space that has unknown correlations between its variables. The adaptation may substantially improve neural network performance.
yuerongz/DUPLEX-data-split-function
Apply DUPLEX data split to the given dataset and return training and test datasets. REF: Snee, R. D. (1977). Validation of regression models: methods and examples. Technometrics, 19(4), 415-428.
jesschannn/datasci_9_data_prep
Focus on selecting datasets suitable for a machine learning experiment, with an emphasis on data cleaning, encoding, and transformation steps necessary to prepare the data.
krisssix/BankruptcyPrediction
Predicting company bankruptcy using various machine learning models. The dataset is sourced from Kaggle: Company Bankruptcy Prediction.
Lefteris-Souflas/Propensity-To-Lapse-Model-Building-Exercise
Analyzed customer churn using transaction data. Built ML model to predict lapses. Dataset includes customer status, collection/redemption info, and program tenure. Delivered business presentation outlining modeling approach, findings, and churn reduction strategies.
Lefteris-Souflas/Spark-Movies-Analytics
Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.
MuhireIghor/health-pro-backend
A sample model for predicting an individual's systolic level from age, cholesterol, and blood pressure.
NabilahSharfina/Ruangguru-Bootcamp
Final project for the DBA program run by partner Ruangguru x Studi Independen Bersertifikat Kampus Merdeka, batch 2.
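
Several of the repositories above split a dataset into train, validation, and test subsets according to custom ratios. A minimal sketch of that idea in Python, assuming a plain shuffle-then-slice approach. The function `split_by_ratios` and its parameters are hypothetical names chosen for illustration and do not come from any listed repository:

```python
import random


def split_by_ratios(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle items and cut them into len(ratios) consecutive parts.

    Hypothetical helper for illustration; not taken from any listed repo.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    parts, start, cumulative = [], 0, 0.0
    for ratio in ratios[:-1]:
        cumulative += ratio
        end = round(cumulative * n)  # boundary index for this part
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # last part takes whatever remains
    return parts


train, val, test = split_by_ratios(range(100))
```

Letting the last part take the remainder guarantees that every item lands in exactly one subset, even when the ratios do not divide the dataset size evenly.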