SciPhi is a Python framework that enables the generation of high-quality synthetic data for LLM and/or human consumption. Key features include:
- Configurable Data Generation: Produce LLM-mediated datasets tailored to your needs.
- The Library of Phi: An AI-powered initiative that creates open-source textbooks.
- Join our Discord community for discussions and collaboration.
- For specialized inquiries, email us.
The Library of Phi is an initiative to democratize access to high-quality textbooks by employing AI techniques to craft factually accurate books.
- Dry Run:
  python sciphi/examples/library_of_phi/generate_textbook.py dry_run
- Default Textbook Generation:
  python sciphi/examples/library_of_phi/generate_textbook.py run --llm-provider=openai --llm_model_name=gpt-3.5-turbo --do-rag=False --textbook=Aerodynamics_of_Viscous_Fluids --filter_existing_books=False --log-level=debug
- Using Custom Table of Contents: Draft a table of contents, save it as textbook_name.yaml, then place it in the specified directory.
- Incorporating RAG: Set --do-rag=True and set the appropriate .env variables.
Note: Ensure alignment with our specifications if using Wikipedia for RAG. Explore more examples here.
Execute runner.py with various command-line arguments for customized data generation.
python sciphi/examples/basic_data_gen/runner.py --provider_name=openai --model_name=gpt-4 --log_level=INFO --batch_size=1 --num_samples=1 --output_file_name=example_output.jsonl --example_config=textbooks_are_all_you_need_basic_split
This command generates a single sample from GPT-4 using the specified configuration.
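The command above writes its result to example_output.jsonl. A minimal sketch for inspecting that file follows, assuming it is standard JSON Lines (one JSON value per line). The record schema is not documented here, so the sketch only reports which top-level keys appear. `load_jsonl` and `summarize` are hypothetical helper names, not part of SciPhi.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Return one parsed JSON value per non-empty line of a JSONL file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def summarize(records):
    """Count how often each top-level key appears across dict records."""
    counts = {}
    for record in records:
        if isinstance(record, dict):
            for key in record:
                counts[key] = counts.get(key, 0) + 1
    return counts
```

Pass the path where your run wrote its output, for example `summarize(load_jsonl("example_output.jsonl"))`. The output directory depends on your setup and is not assumed here.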
Refer to the README for a comprehensive list of arguments and their defaults. Noteworthy ones include --provider_name, --model_name, and --temperature.
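When you want to vary these arguments across several runs, it can help to assemble the command in Python instead of editing a long shell line. The sketch below only rebuilds the example command shown earlier. `build_runner_command` is a hypothetical helper, not part of SciPhi, and it does not validate flag names, so use only flags that runner.py actually accepts.

```python
import sys

# Path of the runner script, relative to the repository root (taken from the command above).
RUNNER = "sciphi/examples/basic_data_gen/runner.py"


def build_runner_command(**options):
    """Build an argv list for runner.py; each option becomes a --name=value flag."""
    command = [sys.executable, RUNNER]
    for name, value in options.items():
        command.append(f"--{name}={value}")
    return command


# Rebuild the example command; every flag and value here comes from that example.
command = build_runner_command(
    provider_name="openai",
    model_name="gpt-4",
    log_level="INFO",
    batch_size=1,
    num_samples=1,
    output_file_name="example_output.jsonl",
    example_config="textbooks_are_all_you_need_basic_split",
)
print(" ".join(command))
# To execute it from the repository root: subprocess.run(command, check=True)
```

Building an argument list rather than a single string avoids shell quoting issues if a value ever contains spaces.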
- Clone and navigate to the repository:
  git clone https://github.com/emrgnt-cmplxty/sciphi.git
  cd sciphi
- Install dependencies:
  pip install -r requirements.txt
- Set up your environment:
  cp .env.example .env
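After copying the example file, you still need to fill in your own values. As a rough sanity check, the sketch below lists keys that appear in .env.example but are missing or empty in .env. It assumes simple KEY=VALUE lines and is not a full dotenv parser. `parse_env` and `unset_keys` are hypothetical helpers, not part of SciPhi.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unset_keys(example_text, env_text):
    """Return keys listed in the example that are missing or empty in env_text."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    # Only report when both files exist (run from the repository root).
    if example.exists() and env.exists():
        print(unset_keys(example.read_text(), env.read_text()))
```

This only catches keys that are absent or left empty. Placeholder values copied over from the example file would pass unnoticed.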
- Python: 3.11 - 3.12
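To confirm that your interpreter falls in this range before installing, a quick check like the following may help. It assumes that both 3.11 and 3.12 are included in the stated range. `is_supported` is a hypothetical helper name.

```python
import sys

# Assumption: both ends of the stated "3.11 - 3.12" range are supported.
SUPPORTED_MINORS = {(3, 11), (3, 12)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED_MINORS


if not is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside the stated range")
```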
Install enhanced features using pip install <package_name>.
Apache-2.0 License.
If using SciPhi in your research, please cite:
@software{SciPhi,
author = {Colegrove, Owen},
doi = {Pending},
month = {09},
title = {{SciPhi}},
url = {https://github.com/emrgnt-cmplxty/sciphi},
year = {2023}
}
Note: This version assumes you have a requirements.txt file that lists all the necessary dependencies for pip to install. If such a file doesn't exist, you'll need to create one.