/wiz-craft

A CLI-based dataset preprocessing tool for machine learning tasks. Features include data exploration, null value handling, one-hot encoding, and feature scaling, plus download of the modified dataset.

Primary language: Python. License: MIT.

WizCraft - CLI-Based Dataset Preprocessing Tool

WizCraft is a Command Line Interface (CLI) tool that simplifies dataset preprocessing for machine learning tasks. It is aimed at data scientists of all levels who need to prepare data for machine-learning applications.

Try the tool online here

Check out the Contribution Guide if you want to contribute to this project.

Features

  • Load and preprocess your dataset effortlessly through a Command Line Interface (CLI).
  • View dataset statistics and null value counts, and perform data imputation.
  • Encode categorical variables using one-hot encoding.
  • Normalize and standardize numerical features for better model performance.
  • Download the preprocessed dataset with your desired modifications.

Getting Started

Installation

  1. Install the package with pip:
    pip install wiz-craft

  2. To use the module, run the following in Python:
    from wizcraft.preprocess import Preprocess
    wiz_obj = Preprocess()
    wiz_obj.start()
  3. Follow the on-screen prompts to load your dataset, select target variables, and perform preprocessing tasks.

Features Available

Data Description

  1. View statistics and properties of numeric columns.
  2. Explore unique values and statistics of categorical columns.
  3. Display a snapshot of the dataset.

Handle Null Values

  1. Show NULL value counts in each column.
  2. Remove specific columns or fill NULL values with mean, median, or mode.
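
The fill strategies above can be illustrated in plain Python. This is a minimal sketch of the underlying arithmetic, not WizCraft's own code: the function name `fill_nulls` and the sample column are hypothetical, and NULL values are represented as `None`.

```python
from statistics import mean, median, mode

def fill_nulls(values, strategy="mean"):
    """Return a copy of `values` with None entries replaced by a summary statistic."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # no non-null values to compute a statistic from
    # Pick the statistic by name and compute it over the non-null values only.
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](present)
    return [fill if v is None else v for v in values]

ages = [22, None, 30, 22, None]           # hypothetical column with two NULLs
null_count = sum(v is None for v in ages)  # NULL count for this column
print(null_count)
print(fill_nulls(ages, "median"))
print(fill_nulls(ages, "mode"))
```

The mean is sensitive to outliers, so the median is often the safer choice for skewed numeric columns, while the mode also works for categorical columns.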

Encode Categorical Values

  1. Identify and list categorical columns.
  2. Perform one-hot encoding on categorical columns.
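
For readers unfamiliar with one-hot encoding, the sketch below shows the transformation on a tiny table in plain Python. It is an illustration only and assumes nothing about how WizCraft implements the step; the function `one_hot_encode`, the `column_category` naming scheme, and the sample rows are all hypothetical.

```python
def one_hot_encode(rows, column):
    """Replace `column` in each row dict with one 0/1 column per category."""
    # Sort the distinct categories so the output column order is deterministic.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

rows = [
    {"city": "Paris", "price": 10},
    {"city": "Rome", "price": 12},
    {"city": "Paris", "price": 9},
]
encoded_rows = one_hot_encode(rows, "city")
for r in encoded_rows:
    print(r)
```

Each row ends up with exactly one `1` among the new columns, which lets models that expect numeric input use the categorical feature without implying an order between categories.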

Feature Scaling

  1. Normalize (Min-Max scaling) or standardize (Standard Scaler) numerical columns.
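
The difference between the two scaling options can be seen in a short plain-Python sketch. The helper names `min_max_scale` and `standardize` are hypothetical and this is not WizCraft's implementation; the standardization here divides by the population standard deviation, which is also what scikit-learn's `StandardScaler` does.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - mu) / sigma for v in values]

print(min_max_scale([10, 20, 30]))
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Min-Max scaling bounds the output to [0, 1] but is sensitive to outliers; standardization leaves the range unbounded and is usually preferred when a model assumes roughly centred features.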

Save Preprocessed Dataset

  1. Download the modified dataset with applied preprocessing steps.
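
As a rough picture of what saving a preprocessed table involves, the sketch below writes a list of row dictionaries to a CSV file with Python's standard `csv` module. The function `save_rows`, the file name `preprocessed.csv`, and the sample rows are assumptions for illustration; the output format and location that WizCraft itself uses are not described here.

```python
import csv

def save_rows(rows, path):
    """Write a non-empty list of row dicts to `path` as CSV, with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # The header is taken from the keys of the first row.
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

processed = [
    {"price": 0.5, "city_Paris": 1, "city_Rome": 0},
    {"price": 1.0, "city_Paris": 0, "city_Rome": 1},
]
save_rows(processed, "preprocessed.csv")  # hypothetical output file name
```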

Future Work

  • Advanced data imputation: support for techniques such as K-nearest neighbours (KNN) imputation.

  • Improved UI and UX using Rich.

  • Undo/redo option for each step.

  • Extension for NLP tasks (such as tokenization and stemming).

  • More user-friendly interface: interactive features such as progress bars, better error handling, and clear instructions.

  • Using curses for terminal manipulation.

Contributing to the Project

Check out the Contribution Guide if you want to contribute to this project.