RawToPrompt

RawToPrompt 📜➡️❓

RawToPrompt is a project aimed at converting raw, unstructured text data into structured prompt-completion pairs suitable for fine-tuning Large Language Models (LLMs). It intelligently ingests raw .txt files, processes the content, and generates contextual prompts and completions based on the content using state-of-the-art NLP techniques.

Project Objective 🎯

To create a data pipeline:

Ingestion: Accept raw .txt files.
Processing: Clean and preprocess the text data.
Interactive Q/A: Optionally ask the user for specific prompts or topics.
Prompt Completion Generation: Generate AI-driven contextual prompts and completions.

Setup and Installation ⚙️

Clone the repository:

git clone <repository-link>
cd RawToPrompt

Install the required libraries:
```
pip install -r requirements.txt
```

Usage 🚀

Place your .txt files in the project root or any directory of your choice.
Run the main.py:
```
python main.py
```
Follow the on-screen instructions. You can provide your own prompts or allow the system to generate them contextually.

Project Structure 🌳

├───data_ingestion
│   └─── Module responsible for ingesting raw .txt files
├───data_processing
│   └─── Module for preprocessing and cleaning the text
├───interactive_qa
│   └─── Handles user interactions for prompt specifications
└───prompt_completion
    └─── Generates prompt-completion pairs using AI-driven techniques

Contributing 🤝

If you'd like to contribute to RawToPrompt, please fork the repository and use a feature branch. Pull requests are warmly welcome!

License 📄

This project is licensed under the MIT License - see the LICENSE.md file for details.

saurabh98s/JSONLFineTunePrep