data-splitting

There are 22 repositories under the data-splitting topic.

  • shenwanxiang/ChemBench

    MoleculeNet benchmark dataset & MolMapNet dataset

    Language: HTML
  • sharejing/Takin

    A Python toolkit for file processing, text cleaning, and data splitting.

    Language: Python
  • omardbaa/Data-Splitter

    Data-Splitter is a Python script designed to split a large CSV file containing data into three different formats: JSON, a database table, and another CSV file. The script ensures a random distribution of data across the three output formats based on custom-defined ratios.

    Language: Jupyter Notebook
  • aarryasutar/Credit_EDA

    This project focuses on cleaning and analyzing a loan application dataset to gain insights into the factors influencing loan defaults. Through systematic data cleaning, visualization, and merging with previous application data, it provides a robust foundation for further predictive modeling.

    Language: Jupyter Notebook
  • Aravinda89/split_train_eval_test

    Splits an image dataset into train, validation, and test sets.

    Language: Python
  • Dimas263/Preprocessing-Data-into-Train-Test-Val-Data

    Python Preprocessing for Sales Project Notebook

    Language: Jupyter Notebook
  • katiebristol/data_splitter

    A basic Python script to split a .dat file into individual sample files.

    Language: Python
  • MadhuBala11/DiabetesPrediction

    In this project, I have used logistic regression, a supervised machine learning algorithm, to predict whether a person has diabetes based on features such as age, blood pressure, glucose level, and body mass index. I have used Python and popular libraries such as Pandas, Scikit-Learn, and Matplotlib to perform model building.

    Language: Jupyter Notebook
  • Nicolas-Bolouri/Cloud-Data-Protection-Analysis

    Comparative Analysis of Data Protection Mechanisms in Public Clouds

  • SahilSunda/ML_Project

    ML model for Crop Detection

    Language: Python
  • szcf-weiya/SplitClusterTest.jl

    Julia package for "FDR Control via Data Splitting for Testing-after-Clustering (arXiv: 2410.06451)"

    Language: Julia
  • intersog-developer/nn-data-splitting-example

    Source code for the article "Accelerating neural network training by data splitting".

    Language: Rust
  • JVTupinamba/Kennard-Stone-Mahalanobis

    Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space that has unknown correlations between its variables. The adaptation may substantially improve neural network performance.

    Language: Jupyter Notebook
  • UERJ-LIVIA/Kennard-Stone-Mahalanobis

    Because the TensorFlow Kennard-Stone algorithm uses Euclidean distances, an adaptation is needed when dealing with a large vector space that has unknown correlations between its variables. The adaptation may substantially improve neural network performance.

    Language: Jupyter Notebook
  • yuerongz/DUPLEX-data-split-function

    Apply DUPLEX data split to the given dataset and return training and test datasets. REF: Snee, R. D. (1977). Validation of regression models: methods and examples. Technometrics, 19(4), 415-428.

    Language: Python
  • jesschannn/datasci_9_data_prep

    Focus on selecting datasets suitable for a machine learning experiment, with an emphasis on data cleaning, encoding, and transformation steps necessary to prepare the data.

    Language: Python
  • krisssix/BankruptcyPrediction

    Predicting company bankruptcy using various machine learning models. The dataset is sourced from Kaggle: Company Bankruptcy Prediction.

    Language: Python
  • Lefteris-Souflas/Propensity-To-Lapse-Model-Building-Exercise

    Analyzed customer churn using transaction data. Built ML model to predict lapses. Dataset includes customer status, collection/redemption info, and program tenure. Delivered business presentation outlining modeling approach, findings, and churn reduction strategies.

  • Lefteris-Souflas/Spark-Movies-Analytics

    Utilizing Apache Spark & PySpark to analyze a movie dataset. Tasks include data exploration, identifying top-rated movies, training a linear regression model, and experimenting with Airflow.

    Language: Jupyter Notebook
  • MuhireIghor/health-pro-backend

    A sample model for predicting the systolic level of an individual from their age, cholesterol, and blood pressure.

    Language: JavaScript
  • NabilahSharfina/Ruangguru-Bootcamp

    Final project for the DBA program by partner Ruangguru x Studi Independen Bersertifikat Kampus Merdeka, batch 2.

    Language: Jupyter Notebook
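
Several of the repositories above describe splitting a dataset at random into parts according to custom ratios (for example train, validation, and test sets). A minimal sketch of that general idea follows, using only the Python standard library. It is not taken from any listed repository; the function name `split_by_ratios`, its parameters, and the default ratios are hypothetical and chosen for illustration.

```python
import random


def split_by_ratios(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and cut them into consecutive parts sized by ratios.

    Hypothetical helper for illustration; the ratios must sum to 1.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    parts, start, cumulative = [], 0, 0.0
    # Cut at cumulative boundaries so rounding never drops or repeats an item.
    for ratio in ratios[:-1]:
        cumulative += ratio
        end = round(cumulative * n)
        parts.append(shuffled[start:end])
        start = end
    # The last part takes whatever remains.
    parts.append(shuffled[start:])
    return parts


train, val, test = split_by_ratios(range(100))
```

Cutting at cumulative boundaries, rather than rounding each part's size separately, ensures every item lands in exactly one part. Approaches such as Kennard-Stone or DUPLEX, mentioned in the list, select samples by distance instead of at random and are not covered by this sketch.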