Generates high-quality articles based on research papers.

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

GALE

Unleashing Insights with Precision, Creativity, and Scale.

Overview

Gale is a Python project that supports content creation and analysis through a series of automated agents. The agents generate, format, summarize, and critique content by interacting with large language models (LLMs). Together they produce structured outlines and detailed summaries, aimed mainly at academic and open-source communities. Gale's architecture is configured through a centralized file, config.yaml, so that the agents behave consistently, and repository ownership is managed through CODEOWNERS.


Features

| | Feature | Description |
| --- | --- | --- |
| ⚙️ | Architecture | Gale utilizes a modular architecture with separate agents for content extraction, summarization, and critique, linked through a graph-based workflow. |
| 🔩 | Code Quality | The code is structured into distinct agents with clear responsibilities, enhancing readability and maintainability. Utilizes Python extensively. |
| 📄 | Documentation | Extensive inline comments and dedicated YAML and TOML files for configuration. Lacks comprehensive external documentation. |
| 🔌 | Integrations | Integrates with external APIs and leverages the LangChain library for language model operations, supporting both local and cloud-based models. |
| 🧩 | Modularity | High modularity with clear separation into agents and utilities, supporting scalable development and potential reuse in similar projects. |
| 🧪 | Testing | No explicit mention of testing frameworks or tools, suggesting an area for improvement in reliability and maintainability. |
| ⚡️ | Performance | The use of configurable language models and dynamic content handling suggests good performance adaptability, but no specific metrics provided. |
| 🛡️ | Security | Uses environment-specific configuration for API keys and parameters, implying some level of security for operational data. |
| 📦 | Dependencies | Dependencies include py, python-dotenv, ipykernel, langgraph, yaml, python, langchain-openai, toml, lock, langchain, ipywidgets. |
| 🚀 | Scalability | Designed with scalability in mind, utilizing a graph-based flow and modular agents that can independently scale based on demand. |

---

##  Repository Structure

```sh
└── gale/
    ├── CODEOWNERS
    ├── LICENSE
    ├── Readme.md
    ├── config.yaml
    ├── poetry.lock
    ├── pyproject.toml
    ├── src
    │   ├── agents
    │   │   ├── content_extractor
    │   │   │   └── extractor.py
    │   │   ├── criticizer
    │   │   │   └── critic.py
    │   │   ├── format_checker
    │   │   │   └── formatter.py
    │   │   ├── initial_summarizer
    │   │   │   └── initial_summarizer.py
    │   │   └── outline_generator
    │   │       └── outline_gen.py
    │   ├── graphs
    │   │   └── gale_graph.py
    │   └── utils
    │       └── togetherchain.py
    └── tests
        └── test1.py
```

Modules

.

| File | Summary |
| --- | --- |
| CODEOWNERS | Assigns repository maintenance to a single user, @bhaswata08, who is responsible for reviewing and approving changes to the project. |
| config.yaml | Centralizes configuration for the components of the Gale repository. It defines the parameters for interacting with multiple large language models, including token limits and model specifications, so that content extraction, formatting, summarization, and the other agents behave consistently. |
| pyproject.toml | Defines the gale project's metadata, dependencies, and development environment using Poetry. |

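
To make the role of config.yaml more concrete, the following is a minimal sketch of how per-agent settings might be resolved once the file has been parsed into a dictionary. The key names (`defaults`, `agents`, `model`, `max_tokens`) and the model names are assumptions made for illustration; they are not taken from the project's actual configuration schema.

```python
from typing import Any

# Hypothetical structure, as it might look after parsing config.yaml
# into a dictionary. All keys and values here are illustrative.
CONFIG: dict[str, Any] = {
    "defaults": {"model": "example-cloud-model", "max_tokens": 1024},
    "agents": {
        "initial_summarizer": {"max_tokens": 2048},
        "criticizer": {"model": "example-local-model"},
    },
}


def agent_settings(config: dict[str, Any], agent: str) -> dict[str, Any]:
    """Merge the shared defaults with the overrides for one agent."""
    settings = dict(config.get("defaults", {}))
    settings.update(config.get("agents", {}).get(agent, {}))
    return settings


if __name__ == "__main__":
    # Each agent sees the defaults unless it overrides a value.
    for name in ("initial_summarizer", "criticizer", "format_checker"):
        print(name, agent_settings(CONFIG, name))
```

Keeping the defaults and the overrides in one place is what lets every agent share the same model parameters unless it explicitly needs different ones.
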
src.agents.content_extractor

| File | Summary |
| --- | --- |
| extractor.py | Uses the critic's feedback to retrieve and highlight the pertinent information from the provided context, so that the extracted content addresses the specific critique. Uses configurable language models for the extraction. |

src.agents.criticizer

| File | Summary |
| --- | --- |
| critic.py | Implements the feedback step by analyzing content with LLM-driven critiques based on predefined templates and configurable parameters. It integrates with external APIs and tailors its responses to support the content review process within the repository's architecture. |

src.agents.format_checker

| File | Summary |
| --- | --- |
| formatter.py | Checks that output from the language models adheres strictly to predefined formats and reduces extraneous content. Formatting rules are selected and applied dynamically from configuration settings and environment-specific parameters. |

src.agents.initial_summarizer

| File | Summary |
| --- | --- |
| initial_summarizer.py | Entry point for the initial summarizer agent. It generates summaries of academic papers or open-source projects, focusing on key points, significant contributions, findings, and technical approaches. The file configures a prompt template and a language model, choosing between a local and a cloud-based model according to the configuration settings. |

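
As a rough illustration of the two responsibilities described for the summarizer (a prompt template and a local-versus-cloud model choice), here is a small standard-library sketch. The `use_local` flag, the prompt wording, and the function names are hypothetical; the actual agent builds its prompt and model through LangChain, which is not reproduced here.

```python
from string import Template

# Illustrative prompt only; the wording of the real template is not known.
SUMMARY_PROMPT = Template(
    "Summarize the following $kind. Focus on key points, significant "
    "contributions, findings, and technical approaches.\n\n$content"
)


def choose_backend(settings: dict) -> str:
    """Pick a backend label from a hypothetical `use_local` setting."""
    return "local" if settings.get("use_local", False) else "cloud"


def build_prompt(content: str, kind: str = "academic paper") -> str:
    """Fill the template with the text to be summarized."""
    return SUMMARY_PROMPT.substitute(kind=kind, content=content)


if __name__ == "__main__":
    print(choose_backend({"use_local": True}))
    print(build_prompt("Abstract: ...", kind="open-source project"))
```
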
src.agents.outline_generator

| File | Summary |
| --- | --- |
| outline_gen.py | Drives the outline generator agent, creating runnables that produce Wikipedia page outlines. It defines classes for Wikipedia page sections, subsections, and outlines. The module imports components from the langchain_openai, langchain_core, and langchain libraries, and sets up a prompt template with a system message, a user message, and a language model (LLM) for generating outlines from given content. It also includes feedback logic to help the agent improve its output. |

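
The section, subsection, and outline classes mentioned above could be modeled along the following lines. This is only a sketch using standard-library dataclasses: the class names, field names, and the `as_str` helper are assumptions, and the actual module may define its classes differently (for example, as schema classes for structured LLM output).

```python
from dataclasses import dataclass, field


@dataclass
class Subsection:
    title: str
    description: str

    def as_str(self) -> str:
        # Render as a third-level heading followed by its description.
        return f"### {self.title}\n\n{self.description}"


@dataclass
class Section:
    title: str
    description: str
    subsections: list[Subsection] = field(default_factory=list)

    def as_str(self) -> str:
        parts = [f"## {self.title}\n\n{self.description}"]
        parts.extend(sub.as_str() for sub in self.subsections)
        return "\n\n".join(parts)


@dataclass
class Outline:
    page_title: str
    sections: list[Section] = field(default_factory=list)

    def as_str(self) -> str:
        # The page title becomes the top-level heading of the outline.
        parts = [f"# {self.page_title}"]
        parts.extend(section.as_str() for section in self.sections)
        return "\n\n".join(parts)


if __name__ == "__main__":
    outline = Outline(
        "Example Paper",
        [Section("Method", "How it works.", [Subsection("Data", "Inputs used.")])],
    )
    print(outline.as_str())
```

A nested structure like this gives the critic and the content extractor a stable shape to comment on, rather than free-form text.
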
src.graphs

| File | Summary |
| --- | --- |
| gale_graph.py | Defines the workflow graph with the nodes initial_summarizer, outline_gen, criticizer, and extract_content, plus the terminal END node. The graph begins with text input, then iteratively summarizes, generates outlines, critiques, and extracts content until an ideal outline is achieved or a limit is reached, and returns the final outline. |

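
The control flow described for the graph can be sketched in plain Python as a bounded refinement loop. This is not the project's graph definition: the function signatures, the convention that empty feedback means the outline is accepted, and the choice to summarize only once before the loop are all assumptions made for illustration.

```python
from typing import Callable


def run_outline_loop(
    text: str,
    summarize: Callable[[str], str],
    generate_outline: Callable[[str, str], str],
    critique: Callable[[str], str],
    extract_content: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Refine an outline until the critic accepts it or the round limit is hit."""
    summary = summarize(text)
    context = ""  # extra material pulled from the source in response to criticism
    outline = ""
    for _ in range(max_rounds):
        outline = generate_outline(summary, context)
        feedback = critique(outline)
        if not feedback:  # assumption: empty feedback means "good enough"
            break
        # Use the criticism to decide what to pull from the original text.
        context = extract_content(text, feedback)
    return outline


if __name__ == "__main__":
    # Stub agents standing in for the LLM-backed ones.
    final = run_outline_loop(
        "full paper text",
        summarize=lambda t: "summary",
        generate_outline=lambda s, c: f"outline({s}, {c or 'no context'})",
        critique=lambda o: "" if "details" in o else "add details",
        extract_content=lambda t, f: "details",
    )
    print(final)
```

The round limit matters because a critic that never returns empty feedback would otherwise keep the loop running indefinitely.
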
src.utils

| File | Summary |
| --- | --- |
| togetherchain.py | Defines the TogetherLLM class, which integrates ChatTogetherAI with LangChain by extending the LLM framework. It handles API key management and generates responses from user inputs, supporting both synchronous and asynchronous operation. |

Getting Started

System Requirements:

  • Python: version x.y.z

Installation

From source

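  1. Clone the gale repository:
$ git clone https://github.com/bhaswata08/gale
  2. Change to the project directory:
$ cd gale
  3. Install the dependencies with Poetry (the repository provides pyproject.toml and poetry.lock; no requirements.txt appears in the repository structure):
$ poetry install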

Usage

From source

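Run gale using the command below. Note that main.py does not appear in the repository structure, and the roadmap lists the main.py script as a separate item, so this entry point may not be available yet:

$ python main.py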

Tests

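Run the test suite using the command below. The path is given explicitly because the file name test1.py does not match pytest's default discovery patterns (test_*.py and *_test.py):

$ pytest tests/test1.py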

Project Roadmap

  • ► Initial working prototype
  • ► Replace the AI-generated README with a better one.
  • ► Add support for multiple LLMs.
  • ► Add frontend and main.py script.
  • ► Add batching for each section of document.
  • ► Add PDF support.
  • ► Add diagram support.
  • ► Add auto generation of a report (Markdown/PDF).

Contributing

Contributions are welcome! Here are several ways you can contribute:

Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your GitHub account.
  2. Clone Locally: Clone the forked repository to your local machine using a git client.
    git clone https://github.com/bhaswata08/gale
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to GitHub: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
  8. Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). For more details, refer to the LICENSE file.

Acknowledgments

  • List any resources, contributors, inspiration, etc. here.
