data-splitting

There are 22 repositories under the data-splitting topic.

shenwanxiang/ChemBench
MoleculeNet benchmark dataset & MolMapNet dataset
Language: HTML 61 3 618
sharejing/Takin
A Python toolkit for file processing, text cleaning, and data splitting.
Language: Python 29 2 06
omardbaa/Data-Splitter
Data-Splitter is a Python script designed to split a large CSV file containing data into three different formats: JSON, a database table, and another CSV file. The script ensures a random distribution of data across the three output formats based on custom-defined ratios.
Language: Jupyter Notebook 12 1 01
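The ratio-based random distribution described for Data-Splitter can be illustrated with a minimal sketch. This is not the repository's code: the function name `split_rows`, the `seed` parameter, and the sample ratios are assumptions made for illustration, and the step that writes each group to JSON, a database table, or CSV is left out.

```python
import random

def split_rows(rows, ratios, seed=0):
    """Randomly distribute rows into len(ratios) groups sized by the ratios.

    Hypothetical helper for illustration; not taken from Data-Splitter.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    groups, start, cumulative = [], 0, 0.0
    for i, ratio in enumerate(ratios):
        cumulative += ratio
        # The last group takes whatever is left, so no row is dropped by rounding.
        is_last = i == len(ratios) - 1
        end = len(shuffled) if is_last else round(cumulative * len(shuffled))
        groups.append(shuffled[start:end])
        start = end
    return groups

rows = [{"id": i} for i in range(10)]
to_json, to_db, to_csv = split_rows(rows, ratios=(0.5, 0.3, 0.2))
```

Each row lands in exactly one group, and the group sizes follow the ratios up to rounding.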
aarryasutar/Credit_EDA
This project focuses on cleaning and analyzing a loan application dataset to gain insights into the factors influencing loan defaults. Through systematic data cleaning, visualization, and merging with previous application data, it provides a robust foundation for further predictive modeling.
Language: Jupyter Notebook 1 1 0
Aravinda89/split_train_eval_test
Splits an image dataset into train, validation, and test sets.
Language: Python 1 1 01
Dimas263/Preprocessing-Data-into-Train-Test-Val-Data
Python Preprocessing for Sales Project Notebook
Language: Jupyter Notebook 1 1 01
katiebristol/data_splitter
A basic Python script to split a .dat file into individual sample files.
Language: Python 1 1 00
MadhuBala11/DiabetesPrediction
In this project, I used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes based on features such as age, blood pressure, glucose level, and body mass index. I used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to perform model building.
Language: Jupyter Notebook 1 1 0
Nicolas-Bolouri/Cloud-Data-Protection-Analysis
Comparative Analysis of Data Protection Mechanisms in Public Clouds
1 1 00
SahilSunda/ML_Project
ML model for Crop Detection
Language: Python 1 0 01
szcf-weiya/SplitClusterTest.jl
Julia package for "FDR Control via Data Splitting for Testing-after-Clustering (arXiv: 2410.06451)"
Language: Julia 10
intersog-developer/nn-data-splitting-example
Source code for the article "Accelerating neural network training by data splitting".
Language: Rust 0 1 00
JVTupinamba/Kennard-Stone-Mahalanobis
Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space whose variables have unknown correlations. The adaptation may substantially improve neural network performance.
Language: Jupyter Notebook 0 1 00
UERJ-LIVIA/Kennard-Stone-Mahalanobis
Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space whose variables have unknown correlations. The adaptation may substantially improve neural network performance.
Language: Jupyter Notebook 0 0 00
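To make the idea behind the two Kennard-Stone-Mahalanobis entries concrete, here is a minimal pure-Python sketch of Kennard-Stone selection with a pluggable distance function. It is not the code from either repository: `kennard_stone`, `euclidean`, and `make_mahalanobis` are hypothetical names, and the inverse covariance matrix is assumed to be computed beforehand and passed in.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def make_mahalanobis(inv_cov):
    """Build a distance function from a precomputed inverse covariance matrix."""
    def dist(a, b):
        d = [x - y for x, y in zip(a, b)]
        n = len(d)
        return math.sqrt(sum(d[i] * inv_cov[i][j] * d[j]
                             for i in range(n) for j in range(n)))
    return dist

def kennard_stone(points, k, dist=euclidean):
    """Return the indices of k points chosen by Kennard-Stone selection."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and len(points)")
    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < k:
        # Add the remaining point whose nearest selected point is farthest away.
        nxt = max(remaining,
                  key=lambda i: min(dist(points[i], points[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
train_idx = kennard_stone(points, k=3)
identity = [[1.0, 0.0], [0.0, 1.0]]
train_idx_maha = kennard_stone(points, k=3, dist=make_mahalanobis(identity))
```

With an identity inverse covariance the Mahalanobis distance reduces to the Euclidean one, so both calls select the same points. A non-identity matrix reweights directions according to the variables' variances and correlations. The all-pairs search for the starting pair is quadratic in the number of points, which is acceptable for a sketch but slow on large datasets.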
yuerongz/DUPLEX-data-split-function
Apply DUPLEX data split to the given dataset and return training and test datasets. REF: Snee, R. D. (1977). Validation of regression models: methods and examples. Technometrics, 19(4), 415-428.
Language: Python 0 0 00
jesschannn/datasci_9_data_prep
Focus on selecting datasets suitable for a machine learning experiment, with an emphasis on data cleaning, encoding, and transformation steps necessary to prepare the data.
Language: Python 1 0
krisssix/BankruptcyPrediction
Predicting company bankruptcy using various machine learning models. The dataset is sourced from Kaggle: Company Bankruptcy Prediction.
Language: Python 1 0
Lefteris-Souflas/Propensity-To-Lapse-Model-Building-Exercise
Analyzed customer churn using transaction data. Built ML model to predict lapses. Dataset includes customer status, collection/redemption info, and program tenure. Delivered business presentation outlining modeling approach, findings, and churn reduction strategies.
1 0
Lefteris-Souflas/Spark-Movies-Analytics
Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.
Language: Jupyter Notebook 1 0
MuhireIghor/health-pro-backend
A sample model for predicting an individual's systolic level from age, cholesterol, and blood pressure.
Language: JavaScript 1 0
NabilahSharfina/Ruangguru-Bootcamp
Final project for the DBA program with partner Ruangguru x Studi Independen Bersertifikat Kampus Merdeka, batch 2.
Language: Jupyter Notebook 1 2
raghav-arora-1998/Tradtional-Regression-and-Model-Selection
Language: HTML 1 01
